Re: cause of dmesg call traces?

2017-08-28 Thread Nikolay Borisov


On 26.08.2017 23:30, Adam Bahe wrote:
> Hello all. Recently I added another 10TB sas drive to my btrfs array
> and I have received the following messages in dmesg during the
> balance. I was hoping someone could clarify what seems to be causing
> this.
> 
> Some additional info, I did a smartctl long test and one of my brand
> new 8TB drives warned me with this:
> 
> 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136
> # 5  Extended offline    Completed: servo/seek failure 90%       474         0
> 
> Are the messages in dmesg caused by the issues with the hard drive, or
> something else entirely? A few months ago I had a total failure
> requiring a complete nuke and pave so I am trying to track down any
> potential issues aggressively and appreciate any help. Thanks!
> 
> Also, how many current_pending_sectors do you tolerate before you swap
> a drive? I am going to pull this drive as soon as this current balance
> finishes. But for future reference it would be good to keep an eye on.
> 
> 
> 
> [Sat Aug 26 03:01:53 2017] WARNING: CPU: 30 PID: 5516 at
> fs/btrfs/extent-tree.c:3197 btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017] Modules linked in: dm_mod rpcrdma ib_isert
> iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt
> target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm
> ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul
> ghash_clmulni_intel pcbc ext4 aesni_intel jbd2 crypto_simd mbcache
> glue_helper cryptd intel_cstate intel_rapl_perf ses enclosure pcspkr
> mei_me lpc_ich input_leds i2c_i801 joydev mfd_core mei sg ioatdma
> shpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
> acpi_pad nfsd auth_rpcgss nfs_acl 8021q lockd garp grace mrp sunrpc
> ip_tables btrfs xor raid6_pq mlx4_en sd_mod crc32c_intel mlx4_core ast
> i2c_algo_bit ata_generic
> 
> [Sat Aug 26 03:01:53 2017]  pata_acpi drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ata_piix mdio mpt3sas
> ptp raid_class pps_core libata scsi_transport_sas dca fjes
> 
> [Sat Aug 26 03:01:53 2017] CPU: 30 PID: 5516 Comm: kworker/u97:5
> Tainted: GW   4.10.6-1.el7.elrepo.x86_64 #1

You are not even using an upstream kernel, but some Red Hat-like derivative.
If you'd like to get support on this list, please test with an upstream
kernel; otherwise all bets are off as to what kind of code you might be running.


> 
> [Sat Aug 26 03:01:53 2017] Hardware name: Supermicro Super
> Server/X10DRi-T4+, BIOS 2.0 12/17/2015
> 
> [Sat Aug 26 03:01:53 2017] Workqueue: writeback wb_workfn (flush-btrfs-2)
> 
> [Sat Aug 26 03:01:53 2017] Call Trace:
> 
> [Sat Aug 26 03:01:53 2017]  dump_stack+0x63/0x87
> 
> [Sat Aug 26 03:01:53 2017]  __warn+0xd1/0xf0
> 
> [Sat Aug 26 03:01:53 2017]  warn_slowpath_null+0x1d/0x20
> 
> [Sat Aug 26 03:01:53 2017]  btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  run_delalloc_nocow+0x6e7/0xc00 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  ? test_range_bit+0xd0/0x160 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  run_delalloc_range+0x7d/0x3a0 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  ?
> find_lock_delalloc_range.constprop.56+0x1d1/0x200 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  writepage_delalloc.isra.48+0x10c/0x170 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  __extent_writepage+0xd6/0x2e0 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]
> extent_write_cache_pages.isra.44.constprop.59+0x2c4/0x480 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  extent_writepages+0x5c/0x90 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  ? btrfs_submit_direct+0x8b0/0x8b0 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  btrfs_writepages+0x28/0x30 [btrfs]
> 
> [Sat Aug 26 03:01:53 2017]  do_writepages+0x1e/0x30
> 
> [Sat Aug 26 03:01:53 2017]  __writeback_single_inode+0x45/0x330
> 
> [Sat Aug 26 03:01:53 2017]  writeback_sb_inodes+0x280/0x570
> 
> [Sat Aug 26 03:01:53 2017]  __writeback_inodes_wb+0x8c/0xc0
> 
> [Sat Aug 26 03:01:53 2017]  wb_writeback+0x276/0x310
> 
> [Sat Aug 26 03:01:53 2017]  wb_workfn+0x2e1/0x410
> 
> [Sat Aug 26 03:01:53 2017]  process_one_work+0x165/0x410
> 
> [Sat Aug 26 03:01:53 2017]  worker_thread+0x137/0x4c0
> 
> [Sat Aug 26 03:01:53 2017]  kthread+0x101/0x140
> 
> [Sat Aug 26 03:01:53 2017]  ? rescuer_thread+0x3b0/0x3b0
> 
> [Sat Aug 26 03:01:53 2017]  ? kthread_park+0x90/0x90
> 
> [Sat Aug 26 03:01:53 2017]  ret_from_fork+0x2c/0x40
> 
> [Sat Aug 26 03:01:53 2017] ---[ end trace 7ba8e3b5c60c322d ]---

Re: deleted subvols don't go away?

2017-08-28 Thread Nikolay Borisov


On 28.08.2017 06:43, Janos Toth F. wrote:
> ID=5 is the default, "root" or "toplevel" subvolume which can't be
> deleted anyway (at least normally, I am not sure if some debug-magic
> can achieve that).
> I just checked this (out of curiosity) and all my Btrfs filesystems
> report something very similar to yours (I thought DELETED was a made
> up example but I see it was literal...):
> 
> ~ # btrfs sub list -a /
> ID 303 gen 172881 top level 5 path /gentoo
> ~ # btrfs sub list -ad /
> ID 5 gen 172564 top level 0 path /DELETED

This seems to be coming from the userspace tools, specifically the
filter_and_sort_subvol() function. That function calls resolve_root(),
and if it returns -ENOENT, meaning the root could not be resolved, the
name DELETED is shown instead.

On a quick inspection of the code, it seems that even for deleted
subvolumes btrfs still retains the ROOT_ITEM for the subvolume, but since
all ROOT_BACKREF items are deleted, the name of the tree cannot be
resolved (it is stored in the root backref). For example, I did:

btrfs subvolume create /media/scratch/subvol1 && sync
btrfs inspect-internal dump-tree -t root /dev/vdc

item 14 key (258 ROOT_ITEM 0) itemoff 12972 itemsize 439
generation 11 root_dirid 256 bytenr 29949952 level 0 refs 1
lastsnap 0 byte_limit 0 bytes_used 16384 flags 0x0(none)
uuid 217fd861-4606-1146-b5ee-59fba8d37f8c
ctransid 11 otransid 10 stransid 0 rtransid 0
drop key (0 UNKNOWN.0 0) level 0

item 15 key (258 ROOT_BACKREF 5) itemoff 12947 itemsize 25
root backref key dirid 256 sequence 4 name subvol1

Afterwards, I deleted the subvolume:
btrfs subvolume delete -v /media/scratch/subvol1/ && sync

item 13 key (258 ROOT_ITEM 0) itemoff 12997 itemsize 439
generation 11 root_dirid 256 bytenr 29949952 level 0 refs 0
lastsnap 0 byte_limit 0 bytes_used 16384 flags 0x1(none)
uuid 217fd861-4606-1146-b5ee-59fba8d37f8c
ctransid 11 otransid 10 stransid 0 rtransid 0
drop key (0 UNKNOWN.0 0) level 0



> 
> I guess this entry is some placeholder, like a hidden "trash"
> directory on some filesystems. I don't think this means all Btrfs
> filesystems forever hold on to their last deleted subvolumes (and only
> one).


Re: status of inline deduplication in btrfs

2017-08-28 Thread shally verma
On Sat, Aug 26, 2017 at 9:45 PM, Adam Borowski  wrote:
> On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
>> The second has to do with btrfs scaling issues due to reflinking, which
>> of course is the operational mechanism for both snapshotting and dedup.
>> Snapshotting of course reflinks the entire subvolume, so it's reflinking
>> on a /massive/ scale.  While normal file operations aren't affected much,
>> btrfs maintenance operations such as balance and check scale badly enough
>> with snapshotting (due to the reflinking) that keeping the number of
>> snapshots per subvolume under 250 or so is strongly recommended, and
>> keeping them to double-digits or even single-digits is recommended if
>> possible.
>>
>> Dedup works by reflinking as well, but its effect on btrfs maintenance
>> will be far more variable, depending of course on how effective the
>> deduping, and thus the reflinking, is.  But considering that snapshotting
>> is effectively 100% effective deduping of the entire subvolume (until the
>> snapshot and active copy begin to diverge, at least), that tends to be
>> the worst case, so figuring a full two-copy dedup as equivalent to one
>> snapshot is a reasonable estimate of effect.  If dedup only catches 10%,
>> only once, than it would be 10% of a snapshot's effect.  If it's 10% but
>> there's 10 duplicated instances, that's the effect of a single snapshot.
>> Assuming of course that the dedup domain is the same as the subvolume
>> that's being snapshotted.

This looks to me like a debate between using inline dedup vs. snapshotting,
or more precisely, doing dedupe via snapshots. Did I understand that
correctly? If yes, does it mean people are still undecided whether the
current design and proposal for inline dedup is the right way to go?

>
> Nope, snapshotting is not anywhere near the worst case of dedup:
>
> [/]$ find /bin /sbin /lib /usr /var -type f -exec md5sum '{}' +|
> cut -d' ' -f1|sort|uniq -c|sort -nr|head
>
> Even on the system parts (ie, ignoring my data) of my desktop, top files
> have the following dup counts: 532 384 373 164 123 122 101.  On this small
> SSD, the system parts are reflinked by snapshots with 10 dailies, and by
> deduping with 10 regular chroots, 11 sbuild chroots and 3 full-system lxc
> containers (chroots are mostly a zoo of different architectures).
>
> This is nothing compared to the backup server, which stores backups of 46
> machines (only system/user and small data, bulky stuff is backed up
> elsewhere), 24 snapshots each (a mix of dailies, 1/11/21, monthlies and
> yearly).  This worked well enough until I made the mistake of deduping the
> whole thing.
>
> But, this is still not the worst horror imaginable.  I'd recommend using
> whole-file dedup only as this avoids this pitfall: take two VM images, run
> block dedup on them.  Identical blocks in them will be cross-reflinked.  And
> there's _many_.  The vast majority of duplicate blocks are all-zero: I just
> ran fallocate -d on a 40G win10 VM and it shrank to 19G.  AFAIK
> file_extent_same is not yet smart enough to dedupe them to a hole instead.
>

I am a bit confused here: is your description based on offline dedupe,
or is it with inline deduplication?

Thanks
Shally

>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
> ⠈⠳⣄


Re: deleted subvols don't go away?

2017-08-28 Thread Christoph Anton Mitterer
Thanks...

Still a bit strange that it displays that entry... especially with a
generation that seems newer than what I thought was the actual last
generation on the fs.

Cheers,
Chris.


Re: status of inline deduplication in btrfs

2017-08-28 Thread Adam Borowski
On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
> I am a bit confused here: is your description based on offline dedupe,
> or is it with inline deduplication?

It doesn't matter _how_ you get to excessive reflinking, the resulting
slowdown is the same.

By the way, you can try "bees", it does nearline-dedupe which is for
practical purposes as good as fully online, and unlike the latter, has no
way to damage your data in case of bugs (mistaken userland dedupe can at
most make the kernel pointlessly read and compare data).

I haven't tried it myself, but what it does is dedupe using FILE_EXTENT_SAME
asynchronously right after a write gets put into the page cache, which in
most cases is quick enough to avoid writeout.
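
Whichever tool does the deduplication, the resulting sharing can be
sanity-checked afterwards. A minimal sketch, assuming a reasonably recent
btrfs-progs and a hypothetical /mnt/data path holding the deduplicated files:

# Total vs. exclusive vs. shared usage for a subtree; the shared
# figure grows as dedupe/reflinking takes effect.
btrfs filesystem du -s /mnt/data

# For two specific files, compare extents directly: matching
# physical offsets mean the blocks are reflinked.
filefrag -v /mnt/data/vm1.img /mnt/data/vm2.img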


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄ 


Re: status of inline deduplication in btrfs

2017-08-28 Thread Austin S. Hemmelgarn

On 2017-08-28 06:32, Adam Borowski wrote:
> On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
>> I am a bit confused here: is your description based on offline dedupe,
>> or is it with inline deduplication?
>
> It doesn't matter _how_ you get to excessive reflinking, the resulting
> slowdown is the same.
>
> By the way, you can try "bees", it does nearline-dedupe which is for
> practical purposes as good as fully online, and unlike the latter, has no
> way to damage your data in case of bugs (mistaken userland dedupe can at
> most make the kernel pointlessly read and compare data).
>
> I haven't tried it myself, but what it does is dedupe using FILE_EXTENT_SAME
> asynchronously right after a write gets put into the page cache, which in
> most cases is quick enough to avoid writeout.

I would also recommend looking at 'bees'.  If you absolutely _must_ have 
online or near-online deduplication, then this is your best option 
currently from a data safety perspective.


That said, it's worth pointing out that in-line deduplication is not 
always the best answer.  In fact, it's quite often a sub-optimal answer 
compared to a combination of compression, sparse files, and batch 
deduplication.  Compression and usage of sparse files will get you about 
the same space savings most of the time as in-line deduplication (I've 
tested this on ZFS on FreeBSD using native in-line deduplication, and 
with BTRFS on Linux using bees) while using much less memory, and about 
the same amount of processor time.  In the event that you need better 
space savings than that, you're better off using batch deduplication 
because it gives you better control over when you're using more system 
resources and will often get better overall results than in-line 
deduplication.
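
For concreteness, a rough sketch of that combination, assuming a btrfs
filesystem mounted at /mnt with VM images under /mnt/images, and duperemove
standing in as the batch deduplicator (any FILE_EXTENT_SAME-based tool would
do; all paths here are hypothetical):

# Transparent compression for new writes (zlib or lzo on current kernels).
mount -o remount,compress=zlib /mnt

# Sparse files: punch out the all-zero blocks in large images.
fallocate --dig-holes /mnt/images/win10.img

# Batch deduplication, scheduled when the extra CPU/IO is acceptable;
# the hash file lets later runs skip already-scanned data.
duperemove -dr --hashfile=/root/dedupe.hash /mnt/images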



Re: deleted subvols don't go away?

2017-08-28 Thread Nikolay Borisov


On 28.08.2017 11:07, Christoph Anton Mitterer wrote:
> Thanks...
> 
> Still a bit strange that it displays that entry... especially with a
> generation that seems newer than what I thought was the actual last
> generation on the fs.

Snapshot destroy is a 2-phase process. The first phase deletes just the
root references; after that you see what you've described. Later, when
the cleaner thread runs again, the snapshot's root item is deleted for
good and you will no longer see it.
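
The two phases are easy to observe from userspace. A minimal sketch,
assuming a scratch btrfs filesystem mounted at /mnt:

btrfs subvolume create /mnt/subvol1
btrfs subvolume delete /mnt/subvol1
sync
# Phase 1: the root references are gone, but the root item is still
# there, so the subvolume still appears in the deleted list.
btrfs subvolume list -d /mnt
# Once the cleaner thread has run, the root item is dropped and the
# entry disappears.
sleep 30
btrfs subvolume list -d /mnt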

> 
> Cheers,
> Chris.
> 


Re: deleted subvols don't go away?

2017-08-28 Thread Hugo Mills
On Mon, Aug 28, 2017 at 03:03:47PM +0300, Nikolay Borisov wrote:
> 
> 
> On 28.08.2017 11:07, Christoph Anton Mitterer wrote:
> > Thanks...
> > 
> > Still a bit strange that it displays that entry... especially with a
> > generation that seems newer than what I thought was the actual last
> > generation on the fs.
> 
> Snapshot destroy is a 2-phase process. The first phase deletes just the
> root references; after that you see what you've described. Later, when
> the cleaner thread runs again, the snapshot's root item is deleted for
> good and you will no longer see it.

   It's worth noting also that if the subvol is still used in some way
(still mounted, nested subvol, processes with CWD in it, open files),
then it won't be cleaned up until the usage stops. Basically the same
behaviour as deleting a file. This could also explain the more recent
than expected generation values.

   Hugo.

-- 
Hugo Mills | "Big data" doesn't just mean increasing the font
hugo@... carfax.org.uk | size.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |



Re: deleted subvols don't go away?

2017-08-28 Thread Roman Mamedov
On Mon, 28 Aug 2017 15:03:47 +0300
Nikolay Borisov  wrote:

> when the cleaner thread runs again, the snapshot's root item is deleted for
> good and you will no longer see it.

Oh, that's pretty sweet -- it means there's actually a way to reliably wait
for cleaner work to be done on all deleted snapshots before unmounting the FS.
I was wondering about that recently for some transient filesystems (which get
mounted, synced to, snapshot-created/removed, then unmounted). Now I can just
loop with a few second sleeps until `btrfs sub list -d $PATH` comes up empty.
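
A minimal sketch of that loop, with a hypothetical mount point; recent
btrfs-progs also ship `btrfs subvolume sync`, which waits for deleted
subvolumes to be cleaned and can replace the hand-rolled version:

mnt=/mnt/transient

# Wait until the cleaner has processed every deleted subvolume,
# then it is safe to unmount.
while [ -n "$(btrfs subvolume list -d "$mnt")" ]; do
    sleep 5
done
umount "$mnt"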

-- 
With respect,
Roman


Re: [PATCH 1/1] btrfs-progs: mkfs: add subvolume support to mkfs

2017-08-28 Thread Goffredo Baroncelli
Hi All,


Unfortunately, your patch crashes on my PC:

$ truncate -s 100G /tmp/disk.img
$ sudo losetup -f /tmp/disk.img

$ # good case
$ sudo ./mkfs.btrfs -f -r /tmp/empty/  /dev/loop0
btrfs-progs v4.12.1-1-gf80d059c
See http://btrfs.wiki.kernel.org for more information.

Making image is completed.
Label:  (null)
UUID:   7cb4927c-d24a-41b3-8151-277ad9064008
Node size:          16384
Sector size:        4096
Filesystem size:    28.00MiB
Block group profiles:
  Data:             single           10.75MiB
  System:           DUP               4.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    28.00MiB  /dev/loop0

$ # bad case
$ sudo ./mkfs.btrfs -f -S prova -r /tmp/empty/  /dev/loop0
btrfs-progs v4.12.1-1-gf80d059c
See http://btrfs.wiki.kernel.org for more information.

ERROR: failed to create subvolume: -17
transaction.h:42: btrfs_start_transaction: BUG_ON `fs_info->running_transaction` triggered, value 884442943152
./mkfs.btrfs(+0x15674)[0xcdeb52c674]
./mkfs.btrfs(close_ctree_fs_info+0x313)[0xcdeb52e80f]
./mkfs.btrfs(main+0x1028)[0xcdeb52381e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f85a5d1f2e1]
./mkfs.btrfs(_start+0x2a)[0xcdeb520e9a]
Aborted


Below are some further comments.

On 08/28/2017 01:39 AM, Qu Wenruo wrote:
> 
> 
> On 2017年08月26日 07:21, Yingyi Luo wrote:
>> From: yingyil 
>>
>> Add -S/--subvol [NAME] option to configure. It enables users to create a
>> subvolume under the toplevel volume and populate the created subvolume
>> with files from the rootdir specified by -r/--rootdir option.
>>
>> Two functions link_subvol() and create_subvol() are moved from
>> convert/main.c to utils.c to enable code reuse.
> 
> What about splitting the patch, as the code move of link/create_subvol() makes
> review a little difficult?
> 
> BTW, if exporting link/create_subvol(), what about adding "btrfs_" prefix?
> 
> Thanks,
> Qu
>>
>> Signed-off-by: yingyil 
>> ---
[...]
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -365,6 +365,7 @@ static void print_usage(int ret)
>>   printf("  creation:\n");
>>   printf("\t-b|--byte-count SIZE    set filesystem size to SIZE (on the 
>> first device)\n");
>>   printf("\t-r|--rootdir DIR    copy files from DIR to the image 
>> root directory\n");
>> +    printf("\t-S|--subvol NAME    create a subvolume with NAME and copy 
>> files from ROOTDIR to the subvolume\n");
>>   printf("\t-K|--nodiscard  do not perform whole device TRIM\n");
>>   printf("\t-f|--force  force overwrite of existing 
>> filesystem\n");
>>   printf("  general:\n");
>> @@ -413,6 +414,18 @@ static char *parse_label(const char *input)
>>   return strdup(input);
>>   }
>>   +static char *parse_subvol_name(const char *input)
>> +{
>> +    int len = strlen(input);
>> +
>> +    if (len >= BTRFS_SUBVOL_NAME_MAX) {
>> +    error("subvolume name %s is too long (max %d)",
>> +    input, BTRFS_SUBVOL_NAME_MAX - 1);
>> +    exit(1);
>> +    }
>> +    return strdup(input);

Why use strdup?

>> +}
>> +
[...]

>> @@ -1517,6 +1533,10 @@ int main(int argc, char **argv)
>>   PACKAGE_STRING);
>>   exit(0);
>>   break;
>> +    case 'S':
>> +    subvol_name = parse_subvol_name(optarg);
>> +    subvol_name_set = 1;
>> +    break;
>>   case 'r':
>>   source_dir = optarg;
>>   source_dir_set = 1;
>> @@ -1537,6 +1557,11 @@ int main(int argc, char **argv)
>>   }
>>   }
>> +    if (subvol_name_set && !source_dir_set) {
>> +    error("root directory needs to be set");
>> +    exit(1);
>> +    }
>> +

To me it seems reasonable to allow creating an empty subvolume (more comments below).

>>   if (verbose) {
>>   printf("%s\n", PACKAGE_STRING);
>>   printf("See %s for more information.\n\n", PACKAGE_URL);
>> @@ -1876,10 +1901,48 @@ raid_groups:
>>   goto out;
>>   }
[...]

>>   ret = cleanup_temp_chunks(fs_info, , data_profile,
>> diff --git a/utils.c b/utils.c
>> index bb04913..c9bbbed 100644
>> --- a/utils.c
>> +++ b/utils.c
>> @@ -2574,3 +2574,164 @@ u8 rand_u8(void)
>>   void btrfs_config_init(void)
>>   {
>>   }
>> +
>> +struct btrfs_root *link_subvol(struct btrfs_root *root,
>> +    const char *base, u64 root_objectid)
>> +{
>> +    struct btrfs_trans_handle *trans;
[]

>> +
>> +    memcpy(buf, base, len);
>> +    for (i = 0; i < 1024; i++) {
>> +    ret = btrfs_insert_dir_item(trans, root, buf, len,
>> +    dirid, , BTRFS_FT_DIR, index);
>> +    if (ret != -EEXIST)
>> +    break;
>> +    len = snprintf(buf, ARRAY_SIZE(buf), "%s%d", base, i);
>> +    if (len < 1 || len > BTRFS_NAME_LEN) {
>> +    ret = -EINVAL;
>> +    break;
>> +    }

Re: slow btrfs with a single kworker process using 100% CPU

2017-08-28 Thread Stefan Priebe - Profihost AG
Hello,

a trace of the kworker looks like this:
   kworker/u24:4-13405 [003]  344186.202535: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202535: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202535:
block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202535: _raw_spin_lock
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202535:
btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202535: _raw_spin_lock
<-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202536:
tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202554: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202554: up_read <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202554: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202554: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202555: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202555:
block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202555: _raw_spin_lock
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202555:
btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202555: _raw_spin_lock
<-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202556:
tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202560: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202560: up_read <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202561: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202561: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202561: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202561:
block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202561: _raw_spin_lock
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202562: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202562: up_read <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202562: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202562: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202562: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202563:
block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202563: _raw_spin_lock
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202563: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202563: up_read <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202563: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202563: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202564: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202564:
block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202564: _raw_spin_lock
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202564:
btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202564: _raw_spin_lock
<-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202565:
tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202566: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202567: up_read <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202567: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202567: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202567: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202568:
block_group_cache_done.isra.27 <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202568: _raw_spin_lock
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202568:
btrfs_find_space_for_alloc <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202568: _raw_spin_lock
<-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202569:
tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
   kworker/u24:4-13405 [003]  344186.202576: __get_raid_index
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202576: up_read <-find_free_extent
   kworker/u24:4-13405 [003]  344186.202577: btrfs_put_block_group
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202577: _cond_resched
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202577: down_read
<-find_free_extent
   kworker/u24:4-13405 [003]  344186.202577:
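
For reference, one way such a per-task function trace can be captured,
assuming tracefs is mounted at /sys/kernel/debug/tracing and the PID of the
busy kworker (13405 above) is known:

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo function > current_tracer
echo 13405 > set_ftrace_pid          # trace only the busy kworker
echo 1 > tracing_on
cat trace_pipe > /tmp/kworker-trace.txt &
sleep 5
echo 0 > tracing_on
kill %1 2>/dev/null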

Re: status of inline deduplication in btrfs

2017-08-28 Thread Duncan
shally verma posted on Mon, 28 Aug 2017 12:49:10 +0530 as excerpted:

> On Sat, Aug 26, 2017 at 9:45 PM, Adam Borowski 
> wrote:
>> On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
>>> The second has to do with btrfs scaling issues due to reflinking,
>>> which of course is the operational mechanism for both snapshotting and
>>> dedup.
>>> Snapshotting of course reflinks the entire subvolume, so it's
>>> reflinking on a /massive/ scale.  While normal file operations aren't
>>> affected much,
>>> btrfs maintenance operations such as balance and check scale badly
>>> enough with snapshotting (due to the reflinking) that keeping the
>>> number of snapshots per subvolume under 250 or so is strongly
>>> recommended, and keeping them to double-digits or even single-digits
>>> is recommended if possible.
>>>
>>> Dedup works by reflinking as well, but its effect on btrfs maintenance
>>> will be far more variable, depending of course on how effective the
>>> deduping, and thus the reflinking, is.  But considering that
>>> snapshotting is effectively 100% effective deduping of the entire
>>> subvolume (until the snapshot and active copy begin to diverge, at
>>> least), that tends to be the worst case, so figuring a full two-copy
>>> dedup as equivalent to one snapshot is a reasonable estimate of
>>> effect.
>>>  If dedup only catches 10%, only once, than it would be 10% of a
>>> snapshot's effect.  If it's 10% but there's 10 duplicated instances,
>>> that's the effect of a single snapshot. Assuming of course that the
>>> dedup domain is the same as the subvolume that's being snapshotted.
> 
> This looks to me like a debate between using inline dedup vs. snapshotting,
> or more precisely, doing dedupe via snapshots. Did I understand that
> correctly? If yes, does it mean people are still undecided whether the
> current design and proposal for inline dedup is the right way to go?

Not that I'm aware of and it wasn't my intent to leave that impression.

What I'm saying is that btrfs uses the same underlying mechanism, 
reflinking, for both snapshotting and dedup.

A rather limited but perhaps useful analogy from an /entirely/ different 
area might be that both single-person bicycles and full-size truck/
trailer rigs use the same underlying mechanism, wheels with tires turning 
against the ground, to move, while they have vastly different uses and 
neither one can replace the other.

And just as the common to both cases tire has the limitation that it can 
be punctured and go flat, that applies to both due to the common 
mechanism used to move, so reflinking has certain limitations that apply 
to both snapshotting and dedup, due to the common mechanism used in the 
implementation.

Of course taking the analogy much further than that will likely result in 
comically absurd conclusions, but hopefully when kept within its limits 
it's useful to convey my point, two technologies with very different 
usage at the surface level, taking advantage of a common implementation 
mechanism underneath.

And because the underlying mechanism is the same, its limits become the 
limits of both overlying solutions, however they otherwise differ.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH 0/1] btrfs-progs: mkfs: add subvolume support to mkfs

2017-08-28 Thread Anand Jain




> Add -S/--subvol [NAME] option to configure. It enables users to create a
> subvolume under the toplevel volume
> and populate the created subvolume
> with files from the rootdir specified by -r/--rootdir option.

 This brings two enhancements. Those might be good ideas, but stating a
specific use case would add the required clarity.
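
For reference, the manual sequence that -S/--subvol plus -r/--rootdir is
meant to collapse into a single mkfs invocation looks roughly like this
(device, subvolume name, and paths are hypothetical):

# Today: create the filesystem, mount it, create and populate the subvolume.
mkfs.btrfs -f /dev/loop0
mount /dev/loop0 /mnt
btrfs subvolume create /mnt/prova
cp -a /tmp/rootdir/. /mnt/prova/
umount /mnt

# With the proposed option, the same result in one step:
mkfs.btrfs -f -S prova -r /tmp/rootdir /dev/loop0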


Thanks, Anand


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-28 Thread Marc MERLIN
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> > 
> > Can you look at this bug which has been happening since 2012 on apparently 
> > all kernels between at least
> > 3.4 and 4.11.
> > I didn't look in detail at each thread (took long enough to even find them 
> > all and paste here), but they seem pretty
> > similar although the reasons how they got there may be different, or at 
> > least not as benign as a race condition
> > between snapshot creation and deletion for those who do hourly snapshot 
> > rotations like me.
> 
> I just finished 2 check repairs, one with each mode, they both come back
> clean.
> Yet my FS still remounts read only with the same
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object 
> already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30 

So this still happens pseudo-randomly, maybe every 2 weeks?

Last one is below.
It did not happen during a btrfs snapshot, although I'm not entirely sure
what else was running at the time.

Any update on this problem?

[ cut here ]  
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 
btrfs_run_delayed_refs+0xbd/0x1be  
BTRFS: Transaction aborted (error -17)  
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables 
ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog 
binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 
ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 
nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 
dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat 
nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm 
irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci 
snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep 
eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis  
 snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote 
sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi 
soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus 
parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block 
mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy 
async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 
lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas 
xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas 
[last unloaded: asix]  
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G U  
4.9.36-amd64-preempt-sysrq-20170406 #1  
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 
04/27/2013  
 b7eb67affc98 ae39b00b b7eb67affce8   
 b7eb67affcd8 ae066769 0b9767affd58 974f736da960  
 9756319df000 ffef 975302da7a50   
Call Trace:  
 [] dump_stack+0x61/0x7d  
 [] __warn+0xc2/0xdd  
 [] warn_slowpath_fmt+0x5a/0x76  
 [] btrfs_run_delayed_refs+0xbd/0x1be  
 [] commit_cowonly_roots+0x10d/0x2b2  
 [] ? btrfs_qgroup_account_extents+0x131/0x181  
 [] ? btrfs_run_delayed_refs+0x1a6/0x1be  
 [] btrfs_commit_transaction+0x46b/0x8fb  
 [] transaction_kthread+0xf5/0x1a1  
 [] ? btrfs_cleanup_transaction+0x436/0x436  
 [] kthread+0xd1/0xd9  
 [] ? init_completion+0x24/0x24  
 [] ? do_fast_syscall_32+0xb7/0xfe  
 [] ret_from_fork+0x25/0x30  
---[ end trace 4c5fcb9daa07c11a ]---  
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object 
already exists  
BTRFS info (device dm-2): forced readonly  
BTRFS warning (device dm-2): Skipping commit of aborted transaction.  
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object 
already exists  
BTRFS error (device dm-2): pending csums is 131072  

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  