Re: speed up cp --reflink=always
At 10/17/2016 02:50 PM, Stefan Priebe - Profihost AG wrote:
> On 17.10.2016 at 03:50, Qu Wenruo wrote:
>> At 10/17/2016 02:54 AM, Stefan Priebe - Profihost AG wrote:
>>> [...]
>>> filefrag says:
>>> 1011508 extents found
>>
>> Too many fragments.
>> Average extent size is only about 200K.
>> Quite common for VM images, if not setting no copy-on-write (C) attr.
>>
>> Normally it's not a good idea to put VM images into btrfs without any
>> tuning.
>
> Those are backups just written sequentially once. As far as i know the
> extent size is hardcoded to 128k for compression. Isn't it?
>
> Stefan

Just as Duncan said, for compression the extent size is limited to 128K: unless you preallocate the file, sequential writes cannot create extents larger than 128K.

Thanks,
Qu

-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: speed up cp --reflink=always
Stefan Priebe - Profihost AG posted on Mon, 17 Oct 2016 08:50:37 +0200 as
excerpted:

> On 17.10.2016 at 03:50, Qu Wenruo wrote:
>> At 10/17/2016 02:54 AM, Stefan Priebe - Profihost AG wrote:
>>> On 16.10.2016 at 00:37, Hans van Kranenburg wrote:
>>>> [...]
>>>> Two quick thoughts:
>>>> 1. How many extents does this img have?
>>>
>>> filefrag says:
>>> 1011508 extents found
>>
>> Too many fragments.
>> Average extent size is only about 200K.
>> Quite common for VM images, if not setting no copy-on-write (C) attr.
>>
>> Normally it's not a good idea to put VM images into btrfs without any
>> tuning.
>
> Those are backups just written sequentially once. As far as i know the
> extent size is hardcoded to 128k for compression. Isn't it?

I flagged that as I read it, too, but...

An average extent size of about 200 KB suggests the file can't be compressed: compressed extents are at most 128 KB, so if it were compressed the average couldn't reach 200 KB. That's a difference of over half a million extents (at 128 KB the file would need ~1.5565 million extents, while just over a million are reported), far too much to be rounding error, so the evidence doesn't support btrfs compression being the culprit.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
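Duncan's numbers can be reproduced with a short Python sketch. It uses only the file size from `ls` and the extent count from `filefrag` quoted in the thread, plus the 128 KiB cap on compressed extents discussed above; the function names are illustrative.

```python
# Figures quoted in the thread.
FILE_SIZE = 204010946560            # bytes, from `ls -la vm-279-disk-1.img`
REPORTED_EXTENTS = 1011508          # from `filefrag`
COMPRESSED_EXTENT_MAX = 128 * 1024  # compressed btrfs extents are capped at 128 KiB


def average_extent_size(size: int, extents: int) -> float:
    """Mean extent size in bytes."""
    return size / extents


def min_extents_if_compressed(size: int, max_extent: int = COMPRESSED_EXTENT_MAX) -> int:
    """Fewest extents a fully compressed file of `size` bytes could have (ceiling division)."""
    return -(-size // max_extent)


if __name__ == "__main__":
    avg = average_extent_size(FILE_SIZE, REPORTED_EXTENTS)
    needed = min_extents_if_compressed(FILE_SIZE)
    print(f"average extent size: {avg / 1024:.1f} KiB")
    print(f"extents needed if fully compressed: {needed}")  # 1556480
    print(f"difference: {needed - REPORTED_EXTENTS}")       # 544972
```

Since the mean extent (about 197 KiB) is larger than the largest possible compressed extent, the file cannot consist only of compressed extents, which is the point Duncan makes.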
Re: speed up cp --reflink=always
On 17.10.2016 at 03:50, Qu Wenruo wrote:
> At 10/17/2016 02:54 AM, Stefan Priebe - Profihost AG wrote:
>> On 16.10.2016 at 00:37, Hans van Kranenburg wrote:
>>> [...]
>>> Two quick thoughts:
>>> 1. How many extents does this img have?
>>
>> filefrag says:
>> 1011508 extents found
>
> Too many fragments.
> Average extent size is only about 200K.
> Quite common for VM images, if not setting no copy-on-write (C) attr.
>
> Normally it's not a good idea to put VM images into btrfs without any
> tuning.

Those are backups, just written sequentially once. As far as I know, the extent size is hardcoded to 128k for compression, isn't it?

Stefan
Re: speed up cp --reflink=always
At 10/17/2016 02:54 AM, Stefan Priebe - Profihost AG wrote:
> On 16.10.2016 at 00:37, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>> [...]
>> Two quick thoughts:
>> 1. How many extents does this img have?
>
> filefrag says:
> 1011508 extents found

Too many fragments.
Average extent size is only about 200K.
That is quite common for VM images if the no-copy-on-write (C) attribute is not set.

Normally it's not a good idea to put VM images on btrfs without any tuning.
Several default features of btrfs are not suitable for that use case:

1) Copy-on-Write
   A VM image sees a lot of random writes. Under CoW this creates a lot of small extents, just as you see here.
   On traditional non-CoW filesystems, like ext4 and (current) XFS, an overwrite is just an overwrite and is not written to a new place. So on those filesystems, no matter how many writes happen, the extent count doesn't change much (it mostly stays the same).

2) Extent booking
   Another result of CoW: a data extent is not freed until all of its referencers are removed, which leads to quite some wasted space.

3) Slow metadata operations
   Btrfs tree CoW and its locking mechanism make metadata operations quite slow compared to other filesystems.
   Normal read/write is not a metadata-heavy operation, while reflinking is.
   (IIRC, XFS with reflink support, not mainlined yet, is faster than btrfs at reflinking.)

Normally, the no-CoW (C) attribute is recommended for VM images. This flag makes btrfs act much like a traditional filesystem until a snapshot containing the file is created.
However, it has the limitation that it prohibits reflinking, so you can't use cp --reflink=always then.
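The (C) attribute Qu recommends is normally set with `chattr +C`. As a rough Python sketch of what that does, the code below reads and writes the inode flags through the `FS_IOC_GETFLAGS`/`FS_IOC_SETFLAGS` ioctls and ORs in `FS_NOCOW_FL` from `<linux/fs.h>`. The ioctl numbers are computed with the asm-generic encoding, which is an assumption (it holds on x86 and arm, not on every architecture), and `set_nocow`/`with_nocow` are hypothetical helper names.

```python
import array
import fcntl
import os
import struct

# asm-generic ioctl encoding (x86, arm, ...); powerpc, mips and sparc differ.
_IOC_WRITE = 1
_IOC_READ = 2


def _ioc(direction: int, type_char: str, nr: int, size: int) -> int:
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


# <linux/fs.h> declares these with `long`, although the kernel transfers an int.
_LONG_SIZE = struct.calcsize("l")
FS_IOC_GETFLAGS = _ioc(_IOC_READ, "f", 1, _LONG_SIZE)
FS_IOC_SETFLAGS = _ioc(_IOC_WRITE, "f", 2, _LONG_SIZE)
FS_NOCOW_FL = 0x00800000  # shown as "C" by lsattr


def with_nocow(flags: int) -> int:
    """Return the inode flags with the no-CoW bit added."""
    return flags | FS_NOCOW_FL


def set_nocow(path: str) -> None:
    """Rough equivalent of `chattr +C path` (hypothetical helper).

    btrfs only applies the bit to empty regular files; on a directory it is
    inherited by files created there afterwards.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        flags = array.array("I", [0])                  # unsigned int buffer
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, flags, True)  # kernel fills it in
        flags[0] = with_nocow(flags[0])
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, flags)        # kernel reads it back
    finally:
        os.close(fd)
```

As far as I know btrfs silently ignores the bit on a regular file that already holds data, so the usual approach is to call `set_nocow()` on the empty directory that will hold the images, so that files created there later inherit the flag.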
If the no-CoW flag is not what you want, and no other snapshot, subvolume or reflinked file shares the file, defragmenting it before the reflink is highly recommended. That will hugely reduce the number of extents (fragments) and the time the reflink takes.
However, I suspect the defrag itself may take even longer than the reflink.

Thanks,
Qu
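Whether a defrag pays off can be checked by counting extents before and after, with the same `filefrag` Stefan used. The Python sketch below assumes `filefrag` (e2fsprogs) and `btrfs-progs` are installed; the helper names and the 32M target extent size are illustrative choices, not values from the thread.

```python
import re
import subprocess
from typing import Tuple

# filefrag prints e.g. "vm-279-disk-1.img: 1011508 extents found"
_EXTENTS_RE = re.compile(r"(\d+) extents? found\s*$", re.MULTILINE)


def parse_filefrag(output: str) -> int:
    """Extract the extent count from filefrag's summary line."""
    match = _EXTENTS_RE.search(output)
    if match is None:
        raise ValueError(f"unexpected filefrag output: {output!r}")
    return int(match.group(1))


def count_extents(path: str) -> int:
    result = subprocess.run(["filefrag", path], check=True,
                            capture_output=True, text=True)
    return parse_filefrag(result.stdout)


def defrag_and_compare(path: str, target_extent: str = "32M") -> Tuple[int, int]:
    """Count extents, defragment the file, and count again.

    -t sets the target extent size; -f flushes the data so that filefrag
    sees the new layout. Defragmenting unshares extents that snapshots or
    reflinked copies still reference, hence Qu's precondition.
    """
    before = count_extents(path)
    subprocess.run(["btrfs", "filesystem", "defragment", "-f", "-t", target_extent, path],
                   check=True)
    after = count_extents(path)
    return before, after
```

`defrag_and_compare()` rewrites the data, so on a 190 GiB image expect it to take a while, as Qu warns.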
Re: speed up cp --reflink=always
On 10/16/2016 09:48 PM, Hans van Kranenburg wrote:
> On 10/16/2016 08:54 PM, Stefan Priebe - Profihost AG wrote:
>> On 16.10.2016 at 00:37, Hans van Kranenburg wrote:
>>> [...]
>>> 1. How many extents does this img have?
>>
>> filefrag says:
>> 1011508 extents found
>
> To cp --reflink this, the filesystem needs to create a million new
> EXTENT_DATA objects for the new file, which point all parts of the new
> file to all the little same parts of the old file, and probably also
> needs to update a million EXTENT_DATA objects in the btrees to add a
> second backreference back to the new file.

Ehm, the second one is EXTENT_ITEM, not EXTENT_DATA.

> [...]

-- 
Hans van Kranenburg
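Hans's description implies the reflink cost grows with the extent count rather than the file size. Below is a back-of-the-envelope Python model that combines his two items per extent (with the EXTENT_ITEM correction) with the sizes from Stefan's first mail. The linear-progress and even-spread assumptions belong to this sketch, not to the thread.

```python
# Figures from Stefan's first mail; the model below is a rough approximation.
SOURCE_SIZE = 204010946560   # bytes, vm-279-disk-1.img
CLONED_SO_FAR = 65022328832  # bytes, size of the .tmp file after ~10 minutes
EXTENTS = 1011508            # from filefrag
ELAPSED_S = 10 * 60          # "after around 10 minutes"


def reflink_items(extents: float) -> float:
    """Metadata items per Hans's description: one new EXTENT_DATA item in the
    new file's tree plus one extra backreference in the EXTENT_ITEM, per
    extent. Ignores tree splits, delayed refs, locking, etc."""
    return 2 * extents


def progress_estimate(cloned: int, total: int, extents: int, elapsed_s: float):
    """Assumes the target's size grows linearly with cloning progress and
    that extents are spread evenly over the file."""
    fraction = cloned / total
    extents_done = extents * fraction
    items_per_s = reflink_items(extents_done) / elapsed_s
    projected_total_s = elapsed_s / fraction
    return fraction, items_per_s, projected_total_s


if __name__ == "__main__":
    frac, rate, total_s = progress_estimate(CLONED_SO_FAR, SOURCE_SIZE, EXTENTS, ELAPSED_S)
    print(f"items to write in total: {reflink_items(EXTENTS)}")
    print(f"progress after 10 min: {frac:.1%}")
    print(f"approx. items per second: {rate:.0f}")
    print(f"projected total time: {total_s / 60:.0f} min")
```

With these figures the projection comes out at roughly half an hour, consistent with the 25-35 minutes Stefan reported.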
Re: speed up cp --reflink=always
On 16.10.2016 at 21:48, Hans van Kranenburg wrote:
> On 10/16/2016 08:54 PM, Stefan Priebe - Profihost AG wrote:
>> [...]
>> filefrag says:
>> 1011508 extents found
>
> To cp --reflink this, the filesystem needs to create a million new
> EXTENT_DATA objects for the new file, which point all parts of the new
> file to all the little same parts of the old file, and probably also
> needs to update a million EXTENT_DATA objects in the btrees to add a
> second backreference back to the new file.

Thanks for this explanation.

>> Sorry what's XY problem?
>
> It means that I suspected that your actual goal is not spending time to
> work on optimizing how cp --reflink works, but that you just want to use
> the quickest way to have a clone of the file.
> [...]

Ah ;-) makes sense.

>> Implementing cp reflink was easier - as the original code was based on
>> XFS. But shouldn't be cp reflink / clone a file be nearly identical to a
>> snapshot? Just creating refs to the extents?
>
> Snapshotting a subvolume only has to write a cowed copy of the top-level
> information of the subvolume filesystem tree, and leaves the extent tree
> alone. It doesn't have to do 2 million different things. \o/

Thanks for this explanation. I will look into switching to subvolumes. I wasn't able to do this before, as I was always running into ENOSPC issues, which were solved last week.

Greets,
Stefan
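The subvolume approach Stefan is switching to could look like the following Python sketch, which only builds `btrfs-progs` command lines (`btrfs subvolume create` and `btrfs subvolume snapshot -r`). The one-subvolume-per-image layout, the paths and the read-only snapshot are assumptions suited to backups, not details from the thread.

```python
import subprocess
from typing import List


def create_subvolume_cmd(path: str) -> List[str]:
    """Command to create a subvolume that will hold one VM image."""
    return ["btrfs", "subvolume", "create", path]


def snapshot_cmd(source: str, dest: str, readonly: bool = True) -> List[str]:
    """Command to snapshot `source` to `dest`; -r makes the snapshot read-only."""
    cmd = ["btrfs", "subvolume", "snapshot"]
    if readonly:
        cmd.append("-r")
    return cmd + [source, dest]


def run(cmd: List[str]) -> None:
    """Execute a command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(create_subvolume_cmd("/backup/vm-279"))` once, with the image placed inside that subvolume, then `run(snapshot_cmd("/backup/vm-279", "/backup/vm-279@2016-10-17"))` per backup (hypothetical paths). Per Hans's explanation, the cost of that snapshot does not depend on how many extents the image has.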
Re: speed up cp --reflink=always
On 10/16/2016 08:54 PM, Stefan Priebe - Profihost AG wrote:
> On 16.10.2016 at 00:37, Hans van Kranenburg wrote:
>> On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>>> [...]
>> Two quick thoughts:
>> 1. How many extents does this img have?
>
> filefrag says:
> 1011508 extents found

To cp --reflink this, the filesystem needs to create a million new EXTENT_DATA objects for the new file, which point all parts of the new file to all the little same parts of the old file, and probably also needs to update a million EXTENT_DATA objects in the btrees to add a second backreference back to the new file.

>> 2. Is this an XY problem? Why not just put the img in a subvolume and
>> snapshot that?
>
> Sorry what's XY problem?

It means I suspected that your actual goal is not to spend time optimizing how cp --reflink works, but simply to get a clone of the file the quickest way.

An XY problem is when someone has problem X, thinks of solution Y to solve it, runs into a problem/limitation/whatever when trying Y, and asks for help with that problem in Y, while in the end there might be a better solution to get X done.

> Implementing cp reflink was easier - as the original code was based on
> XFS. But shouldn't be cp reflink / clone a file be nearly identical to a
> snapshot? Just creating refs to the extents?

Snapshotting a subvolume only has to write a CoWed copy of the top-level information of the subvolume's filesystem tree, and leaves the extent tree alone. It doesn't have to do 2 million different things. \o/

-- 
Hans van Kranenburg
Re: speed up cp --reflink=always
On 16.10.2016 at 00:37, Hans van Kranenburg wrote:
> Hi,
>
> On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>> [...]
>
> Two quick thoughts:
> 1. How many extents does this img have?

filefrag says:
1011508 extents found

> 2. Is this an XY problem? Why not just put the img in a subvolume and
> snapshot that?

Sorry, what's an XY problem?

Implementing cp --reflink was easier, as the original code was based on XFS. But shouldn't cp --reflink (cloning a file) be nearly identical to a snapshot? Just creating refs to the extents?

Greets,
Stefan
Re: speed up cp --reflink=always
Hi,

On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>
> cp --reflink=always takes sometimes very long. (i.e. 25-35 minutes)
>
> An example:
>
> source file:
> # ls -la vm-279-disk-1.img
> -rw-r--r-- 1 root root 204010946560 Oct 14 12:15 vm-279-disk-1.img
>
> target file after around 10 minutes:
> # ls -la vm-279-disk-1.img.tmp
> -rw-r--r-- 1 root root 65022328832 Oct 15 22:13 vm-279-disk-1.img.tmp

Two quick thoughts:
1. How many extents does this img have?
2. Is this an XY problem? Why not just put the img in a subvolume and
snapshot that?

K
speed up cp --reflink=always
Hello,

cp --reflink=always sometimes takes very long (e.g. 25-35 minutes).

An example:

source file:
# ls -la vm-279-disk-1.img
-rw-r--r-- 1 root root 204010946560 Oct 14 12:15 vm-279-disk-1.img

target file after around 10 minutes:
# ls -la vm-279-disk-1.img.tmp
-rw-r--r-- 1 root root 65022328832 Oct 15 22:13 vm-279-disk-1.img.tmp

I/O waits are at around 6%, but disk usage is at around 100%. The process using most of the disk I/O is a kworker process. A function trace of this kworker covering 30 seconds is already 44 MB, so I have no idea where to upload it.

This volume uses space_cache=v2.

While digging through it, I see a lot of these calls:

kworker/u65:4-20679 [007] 46021.641882: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641882: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641883: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641884: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641885: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641885: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641885: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641885: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641885: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641885: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641886: btrfs_set_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641886: btrfs_get_token_32 <-btrfs_del_items
kworker/u65:4-20679 [007] 46021.641886: btrfs_set_token_32 <-btrfs_del_items

Sorting the calls shows:

4892 _raw_spin_lock <-free_extent_buffer
4894 release_extent_buffer <-free_extent_buffer
6803 map_private_extent_buffer <-generic_bin_search.constprop.36
6839 __set_page_dirty_nobuffers <-btree_set_page_dirty
6840 btree_set_page_dirty <-set_page_dirty
6840 mem_cgroup_begin_page_stat <-__set_page_dirty_nobuffers
6840 page_mapping <-set_page_dirty
6840 set_page_dirty <-set_extent_buffer_dirty
6841 mem_cgroup_end_page_stat <-__set_page_dirty_nobuffers
7521 btrfs_clear_lock_blocking_rw <-btrfs_clear_path_blocking
7967 btrfs_get_token_64 <-read_block_for_search.isra.33
8018 btrfs_set_token_32 <-btrfs_del_items
8235 btrfs_get_token_32 <-btrfs_del_items
8813 btrfs_set_lock_blocking_rw <-btrfs_set_path_blocking
9235 map_private_extent_buffer <-btrfs_get_token_32
11824 btrfs_set_token_32 <-btrfs_extend_item
12090 map_private_extent_buffer <-btrfs_get_token_64
12367 mark_page_accessed <-mark_extent_buffer_accessed
12621 btrfs_get_token_32 <-btrfs_extend_item
16267
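The exact command behind "Sorting the calls shows" isn't given; something like `sort | uniq -c | sort -n` over the trace would produce that shape. An equivalent aggregation in Python, assuming each trace line ends in `function <-caller` as in the excerpt above (`count_calls` is a hypothetical helper):

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Function-tracer lines look like:
# kworker/u65:4-20679 [007] 46021.641882: btrfs_set_token_32 <-btrfs_del_items
_TRACE_RE = re.compile(r":\s+(\S+)\s+<-(\S+)\s*$")


def count_calls(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Count identical "function <-caller" pairs, sorted by ascending count
    (ties broken by name), similar to `sort | uniq -c | sort -n`."""
    counts: Counter = Counter()
    for line in lines:
        match = _TRACE_RE.search(line)
        if match:  # skips headers such as "# tracer: function"
            counts[f"{match.group(1)} <-{match.group(2)}"] += 1
    return sorted((n, call) for call, n in counts.items())
```

For example, `count_calls(open("trace.txt"))` on a saved trace (hypothetical file name) returns `(count, call)` pairs in ascending order, the same order as the listing above.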