Re: [PATCH v2] btrfs-progs: lowmem check: Fix false alert about file extent interrupt
On Mon, Jun 26, 2017 at 04:55:04PM +0200, David Sterba wrote:
>On Thu, Jun 22, 2017 at 04:12:56PM +0800, Lu Fengqi wrote:
>> As Qu mentioned in this thread
>> (https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
>> can cause regular extent to co-exist with inlined extent. This coexistence
>> makes things confusing. Since it was permitted currently, so fix
>> btrfsck to prevent a bunch of error logs that will make user feel
>> panic.
>>
>> When check file extent, record the extent_end of regular extent to check
>> if there is a gap between the regular extents. Normally there is only one
>> inlined extent, so the extent_end of inlined extent is useless. However,
>> if regular extent can co-exist with inlined extent, the extent_end of
>> inlined extent also need to record.
>>
>> Reported-by: Marc MERLIN
>> Signed-off-by: Lu Fengqi
>
>Applied, thanks.
>
>Do you have a test for that?

Yes, I already posted this testcase
(https://www.spinics.net/lists/linux-btrfs/msg66802.html) yesterday.

In addition, this patch has an updated version
(https://www.spinics.net/lists/linux-btrfs/msg66803.html) which makes
lowmem mode output more detailed information on a file extent interrupt.
Since patch v2 has already been applied, I will send a separate patch for
this modification alone.

--
Thanks,
Lu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lib/zstd: use div_u64() to let it build on 32-bit
Adam, I’ve applied the same patch in my tree. I’ll send out the update [1]
once it's reviewed, since I also reduced the stack usage of functions
using over 1 KB of stack space.

You’re right that div_u64() will work, since the FSE functions are only
called on blocks of at most 128 KB at a time. Perhaps a u32 would be
clearer, but I would prefer to leave the signatures as is, to stay closer
to upstream. Upstream FSE should work with sizes larger than 4 GB, but
since it can't happen in zstd, it isn't a priority.

I have userland tests set up mocking the linux kernel headers, and tested
32-bit mode there, but neglected to test the kernel on a 32-bit VM, which
I’ve now corrected. Thanks for testing the patch on your ARM machine!

[1] https://github.com/facebook/zstd/pull/738/files

On 6/26/17, 9:18 PM, "Adam Borowski" wrote:

    David Sterba wrote:
    > > Thus, you want do_div() instead of /; do check widths and signedness of
    > > arguments.
    >
    > No do_div please, div_u64 or div64_u64.

    Good to know, the interface of do_div() is indeed weird.

    I guess Nick has found and fixed the offending divisions in his tree
    already, but this patch I'm sending is what I'm testing.

    One thing to note is that it divides u64 by size_t, so the actual
    operation differs on 32 vs 64-bit. Yet the code fails to handle
    compressing pieces bigger than 4GB in other places -- so use of size_t is
    misleading. Perhaps u32 would better convey this limitation?

    Anyway, that this code didn't even compile on 32-bit also means it hasn't
    been tested. I just happen to have such an ARM machine doing Debian
    archive rebuilds; I've rewritten the chroots with compress=zstd; this
    should be a nice non-artificial test. The load consists of
    snapshot+dpkg+gcc/etc+ assorted testsuites, two sbuild instances.

    Seems to work fine for a whole hour (yay!) already, let's see if there'll
    be any explosions.

    -- >8 >8 >8 >8 >8 >8 >8 >8 >8 --
    Note that "total" is limited to 2³²-1 elsewhere despite being declared
    as size_t, so it's ok to use 64/32 -- it's much faster on eg. x86-32
    than 64/64.

    Signed-off-by: Adam Borowski
    ---
     lib/zstd/fse_compress.c | 5 +++--
     1 file changed, 3 insertions(+), 2 deletions(-)

    diff --git a/lib/zstd/fse_compress.c b/lib/zstd/fse_compress.c
    index e016bb177833..f59f9ebfe9c0 100644
    --- a/lib/zstd/fse_compress.c
    +++ b/lib/zstd/fse_compress.c
    @@ -49,6 +49,7 @@
     #include "fse.h"
     #include
     #include /* memcpy, memset */
    +#include

     /* **
     * Error Management
    @@ -575,7 +576,7 @@ static size_t FSE_normalizeM2(short *norm, U32 tableLog, const unsigned *count,
     	{
     		U64 const vStepLog = 62 - tableLog;
     		U64 const mid = (1ULL << (vStepLog - 1)) - 1;
    -		U64 const rStep = ((((U64)1 << vStepLog) * ToDistribute) + mid) / total; /* scale on remaining */
    +		U64 const rStep = div_u64((((U64)1 << vStepLog) * ToDistribute) + mid, total); /* scale on remaining */
     		U64 tmpTotal = mid;
     		for (s = 0; s <= maxSymbolValue; s++) {
     			if (norm[s] == NOT_YET_ASSIGNED) {
    @@ -609,7 +610,7 @@ size_t FSE_normalizeCount(short *normalizedCounter, unsigned tableLog, const uns
     	{
     		U32 const rtbTable[] = {0, 473195, 504333, 520860, 55, 70, 75, 83};
     		U64 const scale = 62 - tableLog;
    -		U64 const step = ((U64)1 << 62) / total; /* <== here, one division ! */
    +		U64 const step = div_u64((U64)1 << 62, total); /* <== here, one division ! */
     		U64 const vStep = 1ULL << (scale - 20);
     		int stillToDistribute = 1 << tableLog;
     		unsigned s;
    --
    2.13.1
[PATCH] lib/zstd: use div_u64() to let it build on 32-bit
David Sterba wrote:
> > Thus, you want do_div() instead of /; do check widths and signedness of
> > arguments.
>
> No do_div please, div_u64 or div64_u64.

Good to know, the interface of do_div() is indeed weird.

I guess Nick has found and fixed the offending divisions in his tree
already, but this patch I'm sending is what I'm testing.

One thing to note is that it divides u64 by size_t, so the actual
operation differs on 32 vs 64-bit. Yet the code fails to handle
compressing pieces bigger than 4GB in other places -- so use of size_t is
misleading. Perhaps u32 would better convey this limitation?

Anyway, that this code didn't even compile on 32-bit also means it hasn't
been tested. I just happen to have such an ARM machine doing Debian archive
rebuilds; I've rewritten the chroots with compress=zstd; this should be a
nice non-artificial test. The load consists of snapshot+dpkg+gcc/etc+
assorted testsuites, two sbuild instances.

Seems to work fine for a whole hour (yay!) already, let's see if there'll
be any explosions.

-- >8 >8 >8 >8 >8 >8 >8 >8 >8 --
Note that "total" is limited to 2³²-1 elsewhere despite being declared
as size_t, so it's ok to use 64/32 -- it's much faster on eg. x86-32
than 64/64.

Signed-off-by: Adam Borowski
---
 lib/zstd/fse_compress.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/zstd/fse_compress.c b/lib/zstd/fse_compress.c
index e016bb177833..f59f9ebfe9c0 100644
--- a/lib/zstd/fse_compress.c
+++ b/lib/zstd/fse_compress.c
@@ -49,6 +49,7 @@
 #include "fse.h"
 #include
 #include /* memcpy, memset */
+#include

 /* **
 * Error Management
@@ -575,7 +576,7 @@ static size_t FSE_normalizeM2(short *norm, U32 tableLog, const unsigned *count,
 	{
 		U64 const vStepLog = 62 - tableLog;
 		U64 const mid = (1ULL << (vStepLog - 1)) - 1;
-		U64 const rStep = ((((U64)1 << vStepLog) * ToDistribute) + mid) / total; /* scale on remaining */
+		U64 const rStep = div_u64((((U64)1 << vStepLog) * ToDistribute) + mid, total); /* scale on remaining */
 		U64 tmpTotal = mid;
 		for (s = 0; s <= maxSymbolValue; s++) {
 			if (norm[s] == NOT_YET_ASSIGNED) {
@@ -609,7 +610,7 @@ size_t FSE_normalizeCount(short *normalizedCounter, unsigned tableLog, const uns
 	{
 		U32 const rtbTable[] = {0, 473195, 504333, 520860, 55, 70, 75, 83};
 		U64 const scale = 62 - tableLog;
-		U64 const step = ((U64)1 << 62) / total; /* <== here, one division ! */
+		U64 const step = div_u64((U64)1 << 62, total); /* <== here, one division ! */
 		U64 const vStep = 1ULL << (scale - 20);
 		int stillToDistribute = 1 << tableLog;
 		unsigned s;
--
2.13.1
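For context on why a plain "/" is the problem here: on 32-bit targets the
compiler typically lowers a 64-bit division to a runtime helper that the
kernel does not provide, which is why the kernel offers div_u64() (64-bit
dividend, 32-bit divisor) and div64_u64(). Below is a minimal userspace
sketch of a 64-by-32 division that uses no division operator at all. The
names sketch_div_u64() and sketch_fse_step() are hypothetical and this is
not how the kernel implements div_u64(); it only illustrates the contract
the patch relies on, assuming the divisor ("total") fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64/32 division helper.
 * Classic shift-and-subtract long division, one quotient bit per step,
 * so no 64-bit "/" is ever emitted.
 */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;
	int bit;

	assert(divisor != 0);
	for (bit = 63; bit >= 0; bit--) {
		/* Bring down the next dividend bit; rem stays below 2^33. */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}

/* Same shape as the "step" computation touched by the patch. */
static uint64_t sketch_fse_step(uint32_t total)
{
	return sketch_div_u64((uint64_t)1 << 62, total);
}
```

With total = 131072 (a 128 KB block, the largest size mentioned in the
thread), sketch_fse_step() yields 2^45, the same value the original
expression would produce on a 64-bit build.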
Re: [PATCH v4] btrfs-progs: btrfs-convert: Add larger device support
> > -	u32 free_inodes_count;
> > +	u64 first_data_block;
> > +	u64 block_count;
> > +	u64 inodes_count;
> > +	u64 free_inodes_count;
>
> I've split this change from the patch as it does not logically belong to
> the same patch, although the change is simple.

Okay sure, thanks.

Cheers.
Lakshmipathi.G
Re: [PATCH v3.1 0/7] Chunk level degradable check
At 06/27/2017 09:59 AM, Anand Jain wrote:
> On 06/27/2017 09:05 AM, Qu Wenruo wrote:
>> At 06/27/2017 02:59 AM, David Sterba wrote:
>>> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>>>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>>>> check for tolerated missing device.
>>>>
>>>> Although the one-size-fit-all solution is quite safe, it's too strict
>>>> if data and metadata has different duplication level.
>>>>
>>>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>>>> means any missing device will make the fs unable to be degraded
>>>> mounted.
>>>>
>>>> But in fact, some times all single chunks may be in the existing
>>>> device and in that case, we should allow it to be rw degraded mounted.
>>>>
>>>> Such case can be easily reproduced using the following script:
>>>>  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
>>>>  # wipefs -f /dev/sdc
>>>>  # mount /dev/sdb -o degraded,rw
>>>>
>>>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>>>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>>>
>>>> This patchset will introduce a new per-chunk degradable check for
>>>> btrfs, allow above case to succeed, and it's quite small anyway.
>>>>
>>>> And enhance kernel error message for missing device, at least kernel
>>>> can know what's making mount failed, other than meaningless
>>>> "failed to read system chunk/chunk tree -5".
>>>
>>> I'd like to get this merged to 4.14. The flush bio changes are now done,
>>> so the base code should be stable. I've read the previous iterations of
>>> this patchset, the comments and user feedback. The usecase coverage
>>> seems to be good and what users expect.
>>
>> Thank you for the kind reminder.
>>
>>> There are some bits in the implementation that I do not like, eg.
>>> reintroducing memory allocation failure to the barrier check, but IIRC
>>> no fundamental problems.
>>>
>>> Please refresh the patchset on top of current code that's going to 4.13
>>> (equivalent to the current for-next), I'll review that and comment. One
>>> or more iterations might be needed, but 4.14 target is within reach.
>>
>> I'll check the new flush infrastructure and figure out if we can avoid
>> re-introducing such memory allocation failure with the new
>> infrastructure.
>
> As this is going to address the raid1 availability issue, it's better
> to mark this for the stable. IMO. But I wonder if there is any
> objection ?

Not sure if stable maintainers (even normal subsystem maintainers) will
like it, as it's quite a large modification, including dev flush
infrastructure.

But since v4.14 will be an LTS kernel, we don't need to rush too much to
push this feature to stable, as long as the feature is planned to reach
v4.14.

Thanks,
Qu

> Thanks, -Anand
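To make the difference between the global check and the proposed per-chunk
check concrete, here is a small userspace sketch. It is not the kernel
code from the patchset: struct sketch_chunk, the function names and the
device numbering (devid 0 for /dev/sdb, devid 1 for the wiped /dev/sdc)
are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_STRIPES 4

/* Hypothetical, simplified view of a chunk (not the kernel's structures). */
struct sketch_chunk {
	int num_stripes;
	int devid[SKETCH_MAX_STRIPES];	/* device holding each stripe */
	int tolerated_failures;		/* 0 for single, 1 for raid1 */
};

/* Global check: every profile must tolerate the total number of missing devices. */
static bool sketch_global_check(const struct sketch_chunk *chunks, size_t n,
				int num_missing)
{
	for (size_t i = 0; i < n; i++)
		if (num_missing > chunks[i].tolerated_failures)
			return false;
	return true;
}

/* Per-chunk check: only count missing devices that this chunk actually uses. */
static bool sketch_per_chunk_check(const struct sketch_chunk *chunks, size_t n,
				   const bool *missing)
{
	for (size_t i = 0; i < n; i++) {
		int lost = 0;

		assert(chunks[i].num_stripes <= SKETCH_MAX_STRIPES);
		for (int s = 0; s < chunks[i].num_stripes; s++)
			if (missing[chunks[i].devid[s]])
				lost++;
		if (lost > chunks[i].tolerated_failures)
			return false;
	}
	return true;
}

/* Scenario from the cover letter: RAID1 metadata, one single data chunk, devid 1 missing. */
static bool sketch_scenario(int data_devid, bool per_chunk)
{
	const struct sketch_chunk chunks[] = {
		{ 2, { 0, 1 }, 1 },		/* RAID1 metadata on both devices */
		{ 1, { data_devid }, 0 },	/* single data chunk on one device */
	};
	const bool missing[2] = { false, true };

	if (per_chunk)
		return sketch_per_chunk_check(chunks, 2, missing);
	return sketch_global_check(chunks, 2, 1);
}
```

In this model, sketch_scenario(0, false) refuses the mount while
sketch_scenario(0, true) allows it, which is the case the cover letter
describes; if the single data chunk lived on the missing device
(sketch_scenario(1, true)), the per-chunk check would still refuse.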
Re: [PATCH] Btrfs-progs: fix infinite loop in find_free_extent
At 06/27/2017 02:02 AM, Liu Bo wrote:
> On Mon, Jun 26, 2017 at 04:09:53PM +0200, David Sterba wrote:
>> On Fri, Jun 23, 2017 at 10:28:31PM -0600, Liu Bo wrote:
>>> From: Liu Bo
>
> Ah, my From was broken again.
>
>>> %search_start is calculated in a wrong way, and if %ins is a cross-stripe
>>> one, it'll search the same block group forever.
>>
>> That's a bit terse description, so please check if my understanding is
>> right: search_start advances by at least one stripe len, but the math
>> would be wrong as using bg_offset would not move us to the next stripe.
>> bg_cache->key.objectid is the full length so this will reach the next
>> stripe and will not loop forever.
>
> Yes, it's correct, the code's logic is like, now that the returned
> %ins is a cross-stripe one, it then calculates a BTRFS_STRIPE_LEN
> aligned one as the new %search_start and see if there is any free
> block matching %search_start.
>
> The current code is using a wrong offset, the offset really should be
> the start position of a block group.
>
>> Do you happen to have a test for that?
>
> Unfortunately it's not a test with vanilla progs. I found this when
> mkfs.btrfs with a 12K nodesize, but now kernel has a power_of_2
> limitation for nodesize and progs code is using a weird IS_ALIGNED()

Yes, btrfs_check_nodesize() is using (nodesize & (sectorsize - 1)) to
check if it's aligned, but it's only correct if sectorsize is power of 2.

It should also be fixed for btrfs-progs.

Thanks,
Qu

> which has the same effect with power_of_2(), mkfs.btrfs -n 12K is not
> allowed. I changed IS_ALIGNED() to (blocksize % nodesize != 0) and
> got the above loop.
>
>>> Signed-off-by: Liu Bo
>>> ---
>>>  extent-tree.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/extent-tree.c b/extent-tree.c
>>> index b12ee29..5e09274 100644
>>> --- a/extent-tree.c
>>> +++ b/extent-tree.c
>>> @@ -2614,8 +2614,9 @@ check_failed:
>>>  			goto no_bg_cache;
>>>  		bg_offset = ins->objectid - bg_cache->key.objectid;
>>>
>>> -		search_start = round_up(bg_offset + num_bytes,
>>> -					BTRFS_STRIPE_LEN) + bg_offset;
>>> +		search_start = round_up(
>>> +			bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
>>> +			bg_cache->key.object;
>>
>> extent-tree.c: In function ‘find_free_extent’:
>> extent-tree.c:2617:18: error: ‘struct btrfs_key’ has no member named ‘object’; did you mean ‘objectid’?
>>     bg_cache->key.object;
>>     ^
>
> Ouch, that's right, it's %objectid.
>
> I'll send a updated one, thanks for the comments.
>
> -liubo
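The arithmetic being discussed can be tried in isolation. The sketch
below is a userspace model only: the function names are hypothetical, the
64K stripe length matches BTRFS_STRIPE_LEN, and the "new" variant assumes
the corrected field name (objectid, i.e. the block group start) that the
thread agrees on.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STRIPE_LEN (64 * 1024ULL)	/* stands in for BTRFS_STRIPE_LEN */

static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return (x + align - 1) / align * align;
}

/* Old formula: adds the in-group offset back instead of the group start. */
static uint64_t sketch_search_start_old(uint64_t bg_start, uint64_t ins_start,
					uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return sketch_round_up(bg_offset + num_bytes, SKETCH_STRIPE_LEN) + bg_offset;
}

/* Fixed formula: stripe-aligned offset relative to the block group start. */
static uint64_t sketch_search_start_new(uint64_t bg_start, uint64_t ins_start,
					uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return sketch_round_up(bg_offset + num_bytes, SKETCH_STRIPE_LEN) + bg_start;
}
```

For a block group starting at 1 GiB and a 12K extent found 60K into it
(so it crosses the first 64K stripe boundary), the old formula returns
192512, an address far below the block group, while the fixed formula
returns 1 GiB + 128K. In this model the old result never advances past the
extent that was just rejected, which is consistent with the endless loop
described above.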
Re: [PATCH v3.1 0/7] Chunk level degradable check
On 06/27/2017 09:05 AM, Qu Wenruo wrote:
> At 06/27/2017 02:59 AM, David Sterba wrote:
>> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>>> check for tolerated missing device.
>>>
>>> Although the one-size-fit-all solution is quite safe, it's too strict
>>> if data and metadata has different duplication level.
>>>
>>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>>> means any missing device will make the fs unable to be degraded
>>> mounted.
>>>
>>> But in fact, some times all single chunks may be in the existing
>>> device and in that case, we should allow it to be rw degraded mounted.
>>>
>>> Such case can be easily reproduced using the following script:
>>>  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
>>>  # wipefs -f /dev/sdc
>>>  # mount /dev/sdb -o degraded,rw
>>>
>>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>>
>>> This patchset will introduce a new per-chunk degradable check for
>>> btrfs, allow above case to succeed, and it's quite small anyway.
>>>
>>> And enhance kernel error message for missing device, at least kernel
>>> can know what's making mount failed, other than meaningless
>>> "failed to read system chunk/chunk tree -5".
>>
>> I'd like to get this merged to 4.14. The flush bio changes are now done,
>> so the base code should be stable. I've read the previous iterations of
>> this patchset, the comments and user feedback. The usecase coverage
>> seems to be good and what users expect.
>
> Thank you for the kind reminder.
>
>> There are some bits in the implementation that I do not like, eg.
>> reintroducing memory allocation failure to the barrier check, but IIRC
>> no fundamental problems.
>>
>> Please refresh the patchset on top of current code that's going to 4.13
>> (equivalent to the current for-next), I'll review that and comment. One
>> or more iterations might be needed, but 4.14 target is within reach.
>
> I'll check the new flush infrastructure and figure out if we can avoid
> re-introducing such memory allocation failure with the new
> infrastructure.

As this is going to address the raid1 availability issue, it's better
to mark this for the stable. IMO. But I wonder if there is any
objection ?

Thanks, -Anand
Re: [PATCH] Btrfs-progs: convert: do not clear header rev
At 06/27/2017 07:55 AM, Liu Bo wrote:
> So btrfs_set_header_flags() vs btrfs_set_header_flag, the difference is
> sort of similar to "=" vs "|=", when creating and initialising a new
> extent buffer, convert uses the former one which clears header_rev by
> accident.

Thanks for catching this one.

Reviewed-by: Qu Wenruo

Thanks,
Qu

> Signed-off-by: Liu Bo
> ---
>  convert/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/convert/common.c b/convert/common.c
> index 40bf32c..f0dd2cf 100644
> --- a/convert/common.c
> +++ b/convert/common.c
> @@ -167,7 +167,7 @@ static int setup_temp_extent_buffer(struct extent_buffer *buf,
>  	btrfs_set_header_generation(buf, 1);
>  	btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
>  	btrfs_set_header_owner(buf, owner);
> -	btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
> +	btrfs_set_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN);
>  	write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
>  			    BTRFS_UUID_SIZE);
>  	write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
Re: [PATCH v3.1 0/7] Chunk level degradable check
At 06/27/2017 02:59 AM, David Sterba wrote:
> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>> check for tolerated missing device.
>>
>> Although the one-size-fit-all solution is quite safe, it's too strict
>> if data and metadata has different duplication level.
>>
>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>> means any missing device will make the fs unable to be degraded
>> mounted.
>>
>> But in fact, some times all single chunks may be in the existing
>> device and in that case, we should allow it to be rw degraded mounted.
>>
>> Such case can be easily reproduced using the following script:
>>  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
>>  # wipefs -f /dev/sdc
>>  # mount /dev/sdb -o degraded,rw
>>
>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>
>> This patchset will introduce a new per-chunk degradable check for
>> btrfs, allow above case to succeed, and it's quite small anyway.
>>
>> And enhance kernel error message for missing device, at least kernel
>> can know what's making mount failed, other than meaningless
>> "failed to read system chunk/chunk tree -5".
>
> I'd like to get this merged to 4.14. The flush bio changes are now done,
> so the base code should be stable. I've read the previous iterations of
> this patchset, the comments and user feedback. The usecase coverage
> seems to be good and what users expect.

Thank you for the kind reminder.

> There are some bits in the implementation that I do not like, eg.
> reintroducing memory allocation failure to the barrier check, but IIRC
> no fundamental problems.
>
> Please refresh the patchset on top of current code that's going to 4.13
> (equivalent to the current for-next), I'll review that and comment. One
> or more iterations might be needed, but 4.14 target is within reach.

I'll check the new flush infrastructure and figure out if we can avoid
re-introducing such memory allocation failure with the new infrastructure.

Thanks,
Qu
[PATCH] Btrfs-progs: convert: do not clear header rev
So btrfs_set_header_flags() vs btrfs_set_header_flag, the difference is
sort of similar to "=" vs "|=", when creating and initialising a new
extent buffer, convert uses the former one which clears header_rev by
accident.

Signed-off-by: Liu Bo
---
 convert/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert/common.c b/convert/common.c
index 40bf32c..f0dd2cf 100644
--- a/convert/common.c
+++ b/convert/common.c
@@ -167,7 +167,7 @@ static int setup_temp_extent_buffer(struct extent_buffer *buf,
 	btrfs_set_header_generation(buf, 1);
 	btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
 	btrfs_set_header_owner(buf, owner);
-	btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
+	btrfs_set_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN);
 	write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
 			    BTRFS_UUID_SIZE);
 	write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
--
2.5.0
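The "=" versus "|=" point can be shown with a few lines of userspace C.
This is a simplified model, not the btrfs-progs accessors: the sketch_*
names are hypothetical, and the layout (backref revision kept in the top
byte of the 64-bit flags word, WRITTEN in bit 0) is an assumption modelled
on the on-disk header format.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FLAG_WRITTEN		(1ULL << 0)
#define SKETCH_BACKREF_REV_SHIFT	56
#define SKETCH_MIXED_BACKREF_REV	1ULL

/* Hypothetical stand-in for the header; only the flags word matters here. */
struct sketch_header {
	uint64_t flags;
};

static void sketch_set_backref_rev(struct sketch_header *h, uint64_t rev)
{
	assert(rev < 256);
	h->flags &= ~(0xffULL << SKETCH_BACKREF_REV_SHIFT);
	h->flags |= rev << SKETCH_BACKREF_REV_SHIFT;
}

/* "=" semantics: replaces the whole word, wiping the revision bits. */
static void sketch_set_flags(struct sketch_header *h, uint64_t flags)
{
	h->flags = flags;
}

/* "|=" semantics: sets one flag and leaves everything else alone. */
static void sketch_set_flag(struct sketch_header *h, uint64_t flag)
{
	h->flags |= flag;
}

/* Replays the init order from the patch context and returns the revision. */
static uint64_t sketch_rev_after_init(int use_or)
{
	struct sketch_header h = { 0 };

	sketch_set_backref_rev(&h, SKETCH_MIXED_BACKREF_REV);
	if (use_or)
		sketch_set_flag(&h, SKETCH_FLAG_WRITTEN);
	else
		sketch_set_flags(&h, SKETCH_FLAG_WRITTEN);
	return h.flags >> SKETCH_BACKREF_REV_SHIFT;
}
```

With the assignment variant the revision reads back as 0 after
initialisation; with the OR variant it stays at 1, which is the behaviour
the patch restores.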
[RFC PATCH v3 0/2] Btrfs: add compression heuristic
Today btrfs uses simple logic to decide whether to compress data or not:
the selected compression algorithm tries to compress the data, and if that
saves some space, the extent is stored compressed.

This is a reliable way to detect incompressible data, but it wastes CPU
time on incompressible data and adds latency.

It also puts additional pressure on the memory subsystem, as for every
compressed write btrfs needs to allocate some buffered pages and reuse a
compression workspace.

This is quite efficient, but not free.

So, try to create a basic heuristic framework. This heuristic code
analyzes data on the fly before the compression code is called; it can
detect incompressible data and advise skipping it.

I left comments with a description in the code, but I will also try to
describe the logic briefly.

The heuristic has several internal layers:
1. Get sample data - it is CPU-expensive to analyze the whole stream,
   so take a big enough sample from the input data.
   Scaling:
   In data: 128K  64K   32K   4K
   Sample:  4096b 3072b 2048b 1024b
2. For performance reasons, and to reuse it in the 7th level,
   copy the selected data to a sample buffer.
3. Count every byte type in the sample buffer.
4. Count how many types of bytes we find.
   If there are not many - the data will be easily compressible.
5. Count the character core set size, i.e. which characters use 90% of
   the input stream.
   If the core set is small (1-50 different types) - data easily compressible
   If big (200-256) - data probably can't be compressed
6. If the above methods fail to make a decision, try to compute the
   Shannon entropy.
   If the entropy is small - data will be easily compressible
   If not - go to the 7th
7. Entropy can't detect repeated strings of bytes,
   so look at the data to detect repeated bytes.
   Compute the difference between the frequency of bytes from the core set
   and the frequency of pairs of those bytes.
   If that sum differs from zero and the entropy is not big,
   give the compression code a try.
   If the entropy is high, 7.2/8 - 8/8 (> 90%), and we find a BIG enough
   difference between the frequency of pairs and characters,
   give the compression code a try.

The 7th level is needed to decrease false negatives, where data can be
compressed (like ~131072b -> ~87000b ~ 0.66), but not so easily.

That code, as far as I can see, forbids compression like:
- 131072b -> ~11b
If the compression ratio is better, it allows that.

Shannon entropy uses the log2(a/b) function. I tried to replace that with
int_log2(a)-int_log2(b), but the integer realization of log2 shows a lack
of accuracy (+-7-10%) in our case.
So I precalculated some input/output values (1/131072 - 1/1) and created
log2_lshift16().

I already decreased the number of lines of that function from 1200 -> 200
to save memory (and lose some accuracy), so with the precomputed function
I get +- 0.5-2% accuracy (compared to normal "true" float log2 Shannon).

Thanks.

Patches based on latest mainline: v4.12-rc7

P.S.
I have only made stability tests so far; everything works stably.

About performance:
A userspace realization of that algorithm, which iterates over the data in
128kb blocks and does mmap() of the file, shows ~4GiB/s over in-memory
(cached) data in one stream, for i5-4200M && DDR3.
So I expect it not to hurt compression performance.

I've also duplicated the patch set to:
https://github.com/Nefelim4ag/linux

log2_lshift() - tested by log2_generator
https://github.com/Nefelim4ag/Entropy_Calculation

P.S.S.
Sorry for my bad English and maybe for ugly code.
I do my best, thanks.

Changes since v1:
- Fixes of checkpatch.pl warnings/errors
- Use div64_u64() instead of "/"
- Make log2_lshift16() more like binary tree as suggested by:
  Adam Borowski

Changes since v2:
- Fix page read address overflow in heuristic.c
- Make "bucket" dynamically allocated, for fix warnings about big stack.
- Small cleanups

Timofey Titovets (2):
  Btrfs: add precomputed log2()
  Btrfs: add heuristic method for make decision compress or not compress

 fs/btrfs/Makefile        |   2 +-
 fs/btrfs/heuristic.c     | 275 ++
 fs/btrfs/heuristic.h     |  13 +++
 fs/btrfs/inode.c         |  37 ---
 fs/btrfs/log2_lshift16.c | 278 +++
 fs/btrfs/log2_lshift16.h |  11 ++
 6 files changed, 601 insertions(+), 15 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c
 create mode 100644 fs/btrfs/heuristic.h
 create mode 100644 fs/btrfs/log2_lshift16.c
 create mode 100644 fs/btrfs/log2_lshift16.h

--
2.13.1
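Layers 3 to 5 of the heuristic described above (byte counts, symbol set
size, core set size) can be sketched in userspace as follows. This is an
illustration under stated assumptions, not the code from the patch set:
the sketch_* names are hypothetical, and the 90% threshold is the one
quoted in the cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BUCKETS 256

/* Layer 3: count how often each byte value occurs in the sample. */
static void sketch_count_bytes(const uint8_t *sample, size_t len, uint32_t *counts)
{
	assert(sample != NULL);
	for (size_t i = 0; i < SKETCH_BUCKETS; i++)
		counts[i] = 0;
	for (size_t i = 0; i < len; i++)
		counts[sample[i]]++;
}

/* Layer 4: number of distinct byte values seen. */
static unsigned sketch_symbol_set_size(const uint32_t *counts)
{
	unsigned n = 0;

	for (size_t i = 0; i < SKETCH_BUCKETS; i++)
		if (counts[i])
			n++;
	return n;
}

/* qsort comparator: larger counts first. */
static int sketch_cmp_desc(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x < y) - (x > y);
}

/* Layer 5: how many of the most frequent byte values cover 90% of the sample. */
static unsigned sketch_core_set_size(const uint32_t *counts, size_t len)
{
	uint32_t sorted[SKETCH_BUCKETS];
	uint64_t threshold = (uint64_t)len * 90 / 100;
	uint64_t sum = 0;
	unsigned n = 0;

	for (size_t i = 0; i < SKETCH_BUCKETS; i++)
		sorted[i] = counts[i];
	qsort(sorted, SKETCH_BUCKETS, sizeof(sorted[0]), sketch_cmp_desc);

	while (n < SKETCH_BUCKETS && sorted[n] != 0 && sum < threshold) {
		sum += sorted[n];
		n++;
	}
	return n;
}
```

A sample made of a single repeated byte gives a symbol set and core set of
1 (easily compressible by the cover letter's rule), while a sample that
cycles evenly through all 256 byte values gives a core set above 200,
which the cover letter treats as probably incompressible.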
[RFC PATCH v3 1/2] Btrfs: add precomputed log2()
The heuristic code computes Shannon entropy in cases where the other
methods can't make a clear decision.
That calculation needs floating point, but as it isn't possible to use
floating point here, let's just precalculate all our input/output values.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/log2_lshift16.c | 278 +++
 fs/btrfs/log2_lshift16.h |  11 ++
 2 files changed, 289 insertions(+)
 create mode 100644 fs/btrfs/log2_lshift16.c
 create mode 100644 fs/btrfs/log2_lshift16.h

diff --git a/fs/btrfs/log2_lshift16.c b/fs/btrfs/log2_lshift16.c
new file mode 100644
index ..0d5d414b2adf
--- /dev/null
+++ b/fs/btrfs/log2_lshift16.c
@@ -0,0 +1,278 @@
+#include
+#include "log2_lshift16.h"
+
+/*
+ * Precalculated log2 values
+ * Shifting used for avoiding floating point
+ * Fraction must be left shifted by 16
+ * Return of log are left shifted by 3
+ */
+int log2_lshift16(u64 lshift16)
+{
+	if (lshift16 < 558) {
+		if (lshift16 < 54) {
+			if (lshift16 < 13) {
+				if (lshift16 < 7) {
+					if (lshift16 < 1)
+						return -136;
+					if (lshift16 < 2)
+						return -123;
+					if (lshift16 < 3)
+						return -117;
+					if (lshift16 < 4)
+						return -113;
+					if (lshift16 < 5)
+						return -110;
+					if (lshift16 < 6)
+						return -108;
+					if (lshift16 < 7)
+						return -106;
+				} else {
+					if (lshift16 < 8)
+						return -104;
+					if (lshift16 < 9)
+						return -103;
+					if (lshift16 < 10)
+						return -102;
+					if (lshift16 < 11)
+						return -100;
+					if (lshift16 < 12)
+						return -99;
+					if (lshift16 < 13)
+						return -98;
+				}
+			} else {
+				if (lshift16 < 29) {
+					if (lshift16 < 15)
+						return -97;
+					if (lshift16 < 16)
+						return -96;
+					if (lshift16 < 17)
+						return -95;
+					if (lshift16 < 19)
+						return -94;
+					if (lshift16 < 21)
+						return -93;
+					if (lshift16 < 23)
+						return -92;
+					if (lshift16 < 25)
+						return -91;
+					if (lshift16 < 27)
+						return -90;
+					if (lshift16 < 29)
+						return -89;
+				} else {
+					if (lshift16 < 32)
+						return -88;
+					if (lshift16 < 35)
+						return -87;
+					if (lshift16 < 38)
+						return -86;
+					if (lshift16 < 41)
+						return -85;
+					if (lshift16 < 45)
+						return -84;
+					if (lshift16 < 49)
+						return -83;
+					if (lshift16 < 54)
+						return -82;
+				}
+			}
+		} else {
+			if (lshift16 < 181) {
+				if (lshift16 < 99) {
+
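The nested if-tree in this patch is in effect a hand-unrolled binary
search over a sorted list of thresholds. The sketch below expresses the
same lookup with two tables and a loop. It only covers the thresholds
visible in the excerpt above (inputs below 54); the sketch_* names and the
SKETCH_OUT_OF_RANGE value returned for larger inputs are assumptions made
for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bounds (exclusive) and results copied from the visible part of the patch. */
static const uint16_t sketch_limit[] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
	15, 16, 17, 19, 21, 23, 25, 27, 29,
	32, 35, 38, 41, 45, 49, 54,
};
static const int16_t sketch_value[] = {
	-136, -123, -117, -113, -110, -108, -106, -104, -103, -102, -100, -99, -98,
	-97, -96, -95, -94, -93, -92, -91, -90, -89,
	-88, -87, -86, -85, -84, -83, -82,
};

#define SKETCH_N (sizeof(sketch_limit) / sizeof(sketch_limit[0]))
#define SKETCH_OUT_OF_RANGE 0	/* placeholder: the excerpt stops at 54 */

static int sketch_log2_lookup(uint64_t x)
{
	size_t lo = 0;
	size_t hi = SKETCH_N;

	assert(SKETCH_N == sizeof(sketch_value) / sizeof(sketch_value[0]));

	/* Find the first threshold that is strictly greater than x. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (x < sketch_limit[mid])
			hi = mid;
		else
			lo = mid + 1;
	}
	if (lo == SKETCH_N)
		return SKETCH_OUT_OF_RANGE;
	return sketch_value[lo];
}
```

A table-driven form like this trades the branch tree for a small amount of
static data; whether that is preferable in the kernel is a separate
question this sketch does not try to answer.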
[RFC PATCH v3 2/2] Btrfs: add heuristic method for make decision compress or not compress
Add a heuristic computation before compression, to avoid loading the
resource-heavy compression workspace if the data probably can't be
compressed.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/Makefile    |   2 +-
 fs/btrfs/heuristic.c | 275 +++
 fs/btrfs/heuristic.h |  13 +++
 fs/btrfs/inode.c     |  37 ---
 4 files changed, 312 insertions(+), 15 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c
 create mode 100644 fs/btrfs/heuristic.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17a80b0..8386095c9032 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o hash.o free-space-tree.o
+	   uuid-tree.o props.o hash.o free-space-tree.o heuristic.o log2_lshift16.o

 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
new file mode 100644
index ..cac6f0917b59
--- /dev/null
+++ b/fs/btrfs/heuristic.c
@@ -0,0 +1,275 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "heuristic.h"
+/* Precalculated log2 realization */
+#include "log2_lshift16.h"
+
+/* For shannon full integer entropy calculation */
+#define BUCKET_SIZE (1 << 8)
+
+struct _backet_item {
+	u8 padding;
+	u8 symbol;
+	u16 count;
+};
+
+
+/* For sorting */
+static int compare(const void *lhs, const void *rhs)
+{
+	struct _backet_item *l = (struct _backet_item *)(lhs);
+	struct _backet_item *r = (struct _backet_item *)(rhs);
+
+	return r->count - l->count;
+}
+
+/*
+ * For good compressible data
+ * symbol set size over sample
+ * will be small <= 64
+ */
+static u32 _symbset_calc(const struct _backet_item *bucket)
+{
+	u32 a = 0;
+	u32 symbset_size = 0;
+
+	for (; a < BUCKET_SIZE && symbset_size <= 64; a++) {
+		if (bucket[a].count)
+			symbset_size++;
+	}
+	return symbset_size;
+}
+
+
+/*
+ * Try calculate coreset size
+ * i.e. how many symbols use 90% of input data
+ * < 50 - good compressible data
+ * > 200 - bad compressible data
+ * For right & fast calculation bucket must be reverse sorted
+ */
+static u32 _coreset_calc(const struct _backet_item *bucket,
+	const u32 sum_threshold)
+{
+	u32 a = 0;
+	u32 coreset_sum = 0;
+
+	for (a = 0; a < 201 && bucket[a].count; a++) {
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > sum_threshold)
+			break;
+	}
+	return a;
+}
+
+static u64 _entropy_perc(const struct _backet_item *bucket,
+	const u32 sample_size)
+{
+	u64 a, p;
+	u64 entropy_sum = 0;
+	u64 entropy_max = LOG2_RET_SHIFT*8;
+
+	for (a = 0; a < BUCKET_SIZE && bucket[a].count > 0; a++) {
+		p = bucket[a].count;
+		p = div64_u64(p*LOG2_ARG_SHIFT, sample_size);
+		entropy_sum += -p*log2_lshift16(p);
+	}
+
+	entropy_sum = div64_u64(entropy_sum, LOG2_ARG_SHIFT);
+	return div64_u64(entropy_sum*100, entropy_max);
+}
+
+/* Pair distance from random distribution */
+static u64 _random_pairs_distribution(const struct _backet_item *bucket,
+	const u32 coreset_size, const u8 *sample, u32 sample_size)
+{
+	u32 a, b;
+	u8 pair_a[2], pair_b[2];
+	u32 pairs_count;
+	u64 sum = 0;
+	u64 buf1, buf2;
+
+	for (a = 0; a < coreset_size-1; a++) {
+		pairs_count = 0;
+		pair_a[0] = bucket[a].symbol;
+		pair_a[1] = bucket[a+1].symbol;
+		pair_b[1] = bucket[a].symbol;
+		pair_b[0] = bucket[a+1].symbol;
+		for (b = 0; b < sample_size-1; b++) {
+			u16 *pair_c = (u16 *) [b];
+
+			if (pair_c == (u16 *) pair_a)
+				pairs_count++;
+			else if (pair_c == (u16 *) pair_b)
+				pairs_count++;
+		}
+		buf1 = bucket[a].count*bucket[a+1].count;
+		buf1 = div64_u64(buf1*10, (sample_size*sample_size));
+		buf2 = pairs_count*2*10;
+		buf2 = div64_u64(pairs_count, sample_size);
+		sum += (buf1 - buf2)*(buf1 - buf2);
+	}
+
+	return div64_u64(sum, 2048);
+}
+
+/*
+ * Algorithm description
+ * 1. Get subset of data for fast computation
+ * 2. Scan bucket for symbol set
+ *    - symbol set < 64 - data will be easy compressible, return
+ * 3. Try compute coreset size
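As posted, the inner loop of _random_pairs_distribution() compares
pointers (pair_c against pair_a and pair_b) rather than the two bytes they
point to, so the pair counter may not behave as the description suggests.
A byte-wise count of adjacent pairs, in either order, could look like the
sketch below. sketch_count_pairs() is a hypothetical userspace helper and
only one possible reading of the intended behaviour, not a replacement for
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the positions where bytes a and b are adjacent in the sample,
 * in either order (a,b) or (b,a), comparing values instead of pointers.
 */
static uint32_t sketch_count_pairs(const uint8_t *sample, size_t len,
				   uint8_t a, uint8_t b)
{
	uint32_t pairs = 0;

	assert(sample != NULL);
	for (size_t i = 0; i + 1 < len; i++) {
		if ((sample[i] == a && sample[i + 1] == b) ||
		    (sample[i] == b && sample[i + 1] == a))
			pairs++;
	}
	return pairs;
}
```

Comparing byte by byte also avoids reading the sample through a u16
pointer at odd offsets, which sidesteps alignment concerns on
architectures that care about them.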
Re: [PATCH v3.1 0/7] Chunk level degradable check
On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> check for tolerated missing device.
>
> Although the one-size-fit-all solution is quite safe, it's too strict
> if data and metadata has different duplication level.
>
> For example, if one use Single data and RAID1 metadata for 2 disks, it
> means any missing device will make the fs unable to be degraded
> mounted.
>
> But in fact, some times all single chunks may be in the existing
> device and in that case, we should allow it to be rw degraded mounted.
>
> Such case can be easily reproduced using the following script:
> # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
> # wipefs -f /dev/sdc
> # mount /dev/sdb -o degraded,rw
>
> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> data chunk is only in sdb, so in fact it should allow degraded mount.
>
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.
>
> And enhance kernel error message for missing device, at least kernel
> can know what's making mount failed, other than meaningless
> "failed to read system chunk/chunk tree -5".

I'd like to get this merged to 4.14. The flush bio changes are now done, so the base code should be stable. I've read the previous iterations of this patchset, the comments and user feedback. The usecase coverage seems to be good and what users expect. There are some bits in the implementation that I do not like, eg. reintroducing memory allocation failure to the barrier check, but IIRC no fundamental problems.

Please refresh the patchset on top of current code that's going to 4.13 (equivalent to the current for-next), I'll review that and comment. One or more iterations might be needed, but 4.14 target is within reach.
[Patch v2] Btrfs-progs: fix infinite loop in find_free_extent
If the found %ins is crossing a stripe len, ie. BTRFS_STRIPE_LEN, we'd search again with a stripe-aligned %search_start.

The current code calculates %search_start by adding the wrong offset. To fix it, the start position of the block group should be added instead; otherwise it ends up looking at the same block group forever.

Cc: David Sterba
Signed-off-by: Liu Bo
---
v2: - enhance commit log with more details.
    - fix typo on bg_cache->key.objectid.

 extent-tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 3e32e43..2c73d46 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2614,8 +2614,9 @@ check_failed:
 			goto no_bg_cache;
 		bg_offset = ins->objectid - bg_cache->key.objectid;
 
-		search_start = round_up(bg_offset + num_bytes,
-					BTRFS_STRIPE_LEN) + bg_offset;
+		search_start = round_up(
+			bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
+			bg_cache->key.objectid;
 		goto new_group;
 	}
 no_bg_cache:
-- 
2.5.0
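
The arithmetic in the hunk above can be checked with a small userspace sketch. The function names, the `SKETCH_STRIPE_LEN` value of 64 KiB and the sample numbers are assumptions for illustration; only the two formulas are taken from the diff.

```c
#include <stdint.h>

/* Assumed stand-in for BTRFS_STRIPE_LEN (64 KiB). */
#define SKETCH_STRIPE_LEN (64 * 1024ULL)

/* Round x up to the next multiple of align (align must be non-zero). */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Formula before the patch: adds the offset inside the block group again. */
static uint64_t search_start_old(uint64_t bg_start, uint64_t ins_start,
				 uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return sketch_round_up(bg_offset + num_bytes, SKETCH_STRIPE_LEN) +
	       bg_offset;
}

/* Formula after the patch: adds the start of the block group. */
static uint64_t search_start_new(uint64_t bg_start, uint64_t ins_start,
				 uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return sketch_round_up(bg_offset + num_bytes, SKETCH_STRIPE_LEN) +
	       bg_start;
}
```

With a hypothetical block group starting at 1 GiB, an extent found 60 KiB into it and a 12 KiB allocation, the old formula yields an address far below the block group's start, while the new one yields the next stripe boundary inside the block group.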
[PATCH] Btrfs: incremental send, fix invalid path for link commands
From: Filipe Manana

In some scenarios an incremental send stream can contain link commands with an invalid target path. Such scenarios happen after moving some directory inode A, renaming a regular file inode B into the old name of inode A and finally creating a new hard link for inode B at directory inode A.

Consider the following example scenario where this issue happens.

Parent snapshot:

  .                                 (ino 256)
  |
  |--- dir1/                        (ino 257)
  |      |--- dir2/                 (ino 258)
  |             |--- dir3/          (ino 259)
  |                    |--- file1   (ino 261)
  |                    |--- dir4/   (ino 262)
  |
  |--- dir5/                        (ino 260)

Send snapshot:

  .                                     (ino 256)
  |
  |--- dir1/                            (ino 257)
         |--- dir2/                     (ino 258)
         |      |--- dir3/              (ino 259)
         |             |--- dir4        (ino 261)
         |
         |--- dir6/                     (ino 263)
                |--- dir44/             (ino 262)
                       |--- file11      (ino 261)
                       |--- dir55/      (ino 260)

When attempting to apply the corresponding incremental send stream, a link command contains an invalid target path which makes the receiver fail. The following is the verbose output of the btrfs receive command:

  receiving snapshot mysnap2 uuid=90076fe6-5ba6-e64a-9321-9279670ed16b (...)
  utimes
  utimes dir1
  utimes dir1/dir2/dir3
  utimes
  rename dir1/dir2/dir3/dir4 -> o262-7-0
  link dir1/dir2/dir3/dir4 -> dir1/dir2/dir3/file1
  link dir1/dir2/dir3/dir4/file11 -> dir1/dir2/dir3/file1
  ERROR: link dir1/dir2/dir3/dir4/file11 -> dir1/dir2/dir3/file1 failed: Not a directory

The following steps happen during the computation of the incremental send stream that lead to this issue:

1) When processing inode 261, we orphanize inode 262 due to a name/location collision with one of the new hard links for inode 261 (created in the second step below).

2) We create one of the 2 new hard links for inode 261, the one whose location is at "dir1/dir2/dir3/dir4".

3) We then attempt to create the other new hard link for inode 261, which has inode 262 as its parent directory.
Because the path for this new hard link was computed before we started processing the new references (hard links), it reflects the old name/location of inode 262, that is, it does not account for the orphanization step that happened when we started processing the new references for inode 261, whence it is no longer valid, causing the receiver to fail. So fix this issue by recomputing the full path of new references if we ended up orphanizing other inodes which are directories. A test case for fstests follows soon. Signed-off-by: Filipe Manana --- Applies on top of previous patches: Btrfs: send, fix invalid path after renaming and linking file Btrfs: incremental send, fix invalid path for unlink commands fs/btrfs/send.c | 81 - 1 file changed, 51 insertions(+), 30 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e937c10b8287..7eaccfb72b47 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1856,7 +1856,7 @@ static int is_first_ref(struct btrfs_root *root, */ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen, const char *name, int name_len, - u64 *who_ino, u64 *who_gen) + u64 *who_ino, u64 *who_gen, u64 *who_mode) { int ret = 0; u64 gen; @@ -1905,7 +1905,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen, if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, other_inode)) { ret = get_inode_info(sctx->parent_root, other_inode, NULL, - who_gen, NULL, NULL, NULL, NULL); + who_gen, who_mode, NULL, NULL, NULL); if (ret < 0) goto out; @@ -3683,6 +3683,36 @@ static int wait_for_parent_move(struct send_ctx *sctx, return ret; } +static int update_ref_path(struct send_ctx *sctx, struct recorded_ref *ref) +{ + int ret; + struct fs_path *new_path; + + /* +* Our reference's name member points to its full_path member string, so +* we use here a new path. +*/ + new_path = fs_path_alloc(); + if (!new_path) + return -ENOMEM; + + ret =
[PATCH] btrfs: test incremental send after replacing directory with a file
From: Filipe MananaTest that an incremental send/receive operation works correctly after moving some directory inode A, renaming a regular file inode B into the old name of inode A and finally creating a new hard link for inode B at directory inode A. This issue is fixed by the following patch for the linux kernel: "Btrfs: incremental send, fix invalid path for link commands" Signed-off-by: Filipe Manana --- tests/btrfs/147 | 130 tests/btrfs/147.out | 6 +++ tests/btrfs/group | 1 + 3 files changed, 137 insertions(+) create mode 100755 tests/btrfs/147 create mode 100644 tests/btrfs/147.out diff --git a/tests/btrfs/147 b/tests/btrfs/147 new file mode 100755 index ..15517b0c --- /dev/null +++ b/tests/btrfs/147 @@ -0,0 +1,130 @@ +#! /bin/bash +# FS QA Test No. btrfs/147 +# +# Test that an incremental send/receive operation works correctly after moving +# some directory inode A, renaming a regular file inode B into the old name of +# inode A and finally creating a new hard link for inode B at directory inode A. +# +#--- +# +# Copyright (C) 2017 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! 
+trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -fr $send_files_dir + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_test +_require_scratch +_require_fssum + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +mkdir $SCRATCH_MNT/dir1 +mkdir $SCRATCH_MNT/dir1/dir2 +mkdir $SCRATCH_MNT/dir1/dir2/dir3 +mkdir $SCRATCH_MNT/dir5 +touch $SCRATCH_MNT/dir1/dir2/dir3/file1 +mkdir $SCRATCH_MNT/dir1/dir2/dir3/dir4 + +# Filesystem looks like: +# +# . (ino 256) +# | +# |--- dir1/ (ino 257) +# | |--- dir2/ (ino 258) +# | |--- dir3/ (ino 259) +# | |--- file1 (ino 261) +# | |--- dir4/ (ino 262) +# | +# |--- dir5/ (ino 260) +# +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ + $SCRATCH_MNT/mysnap1 > /dev/null + +$BTRFS_UTIL_PROG send -f $send_files_dir/1.snap \ + $SCRATCH_MNT/mysnap1 2>&1 1>/dev/null | _filter_scratch + +mkdir $SCRATCH_MNT/dir1/dir6 +mv $SCRATCH_MNT/dir5 $SCRATCH_MNT/dir1/dir2/dir3/dir4/dir55 +ln $SCRATCH_MNT/dir1/dir2/dir3/file1 $SCRATCH_MNT/dir1/dir2/dir3/dir4/file11 +mv $SCRATCH_MNT/dir1/dir2/dir3/dir4 $SCRATCH_MNT/dir1/dir6/dir44 +mv $SCRATCH_MNT/dir1/dir2/dir3/file1 $SCRATCH_MNT/dir1/dir2/dir3/dir4 + +# Filesystem now looks like: +# +# . (ino 256) +# | +# |--- dir1/ (ino 257) +#|--- dir2/ (ino 258) +#| |--- dir3/ (ino 259) +#||--- dir4 (ino 261) +#| +#|--- dir6/ (ino 263) +# |--- dir44/ (ino 262) +# |--- file11 (ino 261) +# |--- dir55/ (ino 260) +# +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ +$SCRATCH_MNT/mysnap2 > /dev/null + +$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 -f $send_files_dir/2.snap \ +$SCRATCH_MNT/mysnap2 2>&1 1>/dev/null | _filter_scratch + +$FSSUM_PROG -A -f -w $send_files_dir/1.fssum $SCRATCH_MNT/mysnap1 +$FSSUM_PROG -A -f -w $send_files_dir/2.fssum \ + -x
Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting
On Mon, Jun 26, 2017 at 01:58:32PM -0400, jlay...@redhat.com wrote: > On Mon, 2017-06-26 at 08:22 -0700, Darrick J. Wong wrote: > > On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote: > > > Just check and advance the data errseq_t in struct file before > > > before returning from fsync on normal files. Internal filemap_* > > > callers are left as-is. > > > > > > Signed-off-by: Jeff Layton> > > --- > > > fs/xfs/xfs_file.c | 15 +++ > > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > > index 5fb5a0958a14..bc3b1575e8db 100644 > > > --- a/fs/xfs/xfs_file.c > > > +++ b/fs/xfs/xfs_file.c > > > @@ -134,7 +134,7 @@ xfs_file_fsync( > > > struct inode*inode = file->f_mapping- > > > >host; > > > struct xfs_inode*ip = XFS_I(inode); > > > struct xfs_mount*mp = ip->i_mount; > > > - int error = 0; > > > + int error = 0, err2; > > > int log_flushed = 0; > > > xfs_lsn_t lsn = 0; > > > > > > @@ -142,10 +142,12 @@ xfs_file_fsync( > > > > > > error = filemap_write_and_wait_range(inode->i_mapping, > > > start, end); > > > if (error) > > > - return error; > > > + goto out; > > > > > > - if (XFS_FORCED_SHUTDOWN(mp)) > > > - return -EIO; > > > + if (XFS_FORCED_SHUTDOWN(mp)) { > > > + error = -EIO; > > > + goto out; > > > + } > > > > > > xfs_iflags_clear(ip, XFS_ITRUNCATED); > > > > > > @@ -197,6 +199,11 @@ xfs_file_fsync( > > > mp->m_logdev_targp == mp->m_ddev_targp) > > > xfs_blkdev_issue_flush(mp->m_ddev_targp); > > > > > > +out: > > > + err2 = filemap_report_wb_err(file); > > > > Could we have a comment here to remind anyone reading the code a year > > from now that filemap_report_wb_err has side effects? Pre-coffee me > > was > > wondering why we'd bother calling filemap_report_wb_err in the > > XFS_FORCED_SHUTDOWN case, then remembered that it touches data > > structures. 
> > > > The first sentence of the commit message (really, the word 'advance') > > added as a comment was adequate to remind me of the side effects. > > > > Once that's added, > > Reviewed-by: Darrick J. Wong > > > > --D > > > > Yeah, definitely. I'm working on a respin of the series now to > incorporate HCH's suggestion too. I'll add that in as well. > > Maybe I should rename that function to file_check_and_advance_wb_err() > ? It would be good to make it clear that it does advance the errseq_t > cursor. Seems like a good idea. --D > > > > + if (!error) > > > + error = err2; > > > + > > > return error; > > > } > > > > > > -- > > > 2.13.0 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux- > > > xfs" in > > > the body of a message to majord...@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux- > > btrfs" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 14/51] btrfs: avoid to access bvec table directly for a cloned bio
On Mon, Jun 26, 2017 at 08:09:57PM +0800, Ming Lei wrote:
> Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio)
> mentioned that for dio the submitted bio may be fast cloned, we
> can't access the bvec table directly for a cloned bio, so use
> bio_get_first_bvec() to retrieve the 1st bvec.
>

Looks good to me.

Reviewed-by: Liu Bo

-liubo

> Cc: Chris Mason
> Cc: Josef Bacik
> Cc: David Sterba
> Cc: linux-btrfs@vger.kernel.org
> Cc: Liu Bo
> Signed-off-by: Ming Lei
> ---
>  fs/btrfs/inode.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 06dea7c89bbd..4ab02b34f029 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
>  	int read_mode = 0;
>  	int segs;
>  	int ret;
> +	struct bio_vec bvec;
>
>  	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
>
> @@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
>  	}
>
>  	segs = bio_segments(failed_bio);
> +	bio_get_first_bvec(failed_bio, &bvec);
>  	if (segs > 1 ||
> -	    (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
> +	    (bvec.bv_len > btrfs_inode_sectorsize(inode)))
>  		read_mode |= REQ_FAILFAST_DEV;
>
>  	isector = start - btrfs_io_bio(failed_bio)->logical;
> --
> 2.9.4
>
Re: [PATCH] Btrfs-progs: fix infinite loop in find_free_extent
On Mon, Jun 26, 2017 at 04:09:53PM +0200, David Sterba wrote:
> On Fri, Jun 23, 2017 at 10:28:31PM -0600, Liu Bo wrote:
> > From: Liu Bo

Ah, my From was broken again.

> >
> > %search_start is calculated in a wrong way, and if %ins is a cross-stripe
> > one, it'll search the same block group forever.
>
> That's a bit terse description, so please check if my understanding is right:
> search_start advances by at least one stripe len, but the math would be wrong
> as using bg_offset would not move us to the next stripe.
> bg_cache->key.objectid
> is the full length so this will reach the next stripe and will not loop
> forever.

Yes, that's correct. The logic is: when the returned %ins crosses a stripe boundary, the code calculates a BTRFS_STRIPE_LEN-aligned position as the new %search_start and checks whether there is any free block matching it. The current code uses the wrong offset; the offset should really be the start position of the block group.

>
> Do you happen to have a test for that?

Unfortunately there is no test that works with vanilla progs. I found this when running mkfs.btrfs with a 12K nodesize, but the kernel now has a power_of_2 limitation for nodesize and the progs code uses a weird IS_ALIGNED() which has the same effect as power_of_2(), so mkfs.btrfs -n 12K is not allowed. I changed IS_ALIGNED() to (blocksize % nodesize != 0) and got the above loop.

>
> > Signed-off-by: Liu Bo
> > ---
> >  extent-tree.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/extent-tree.c b/extent-tree.c
> > index b12ee29..5e09274 100644
> > --- a/extent-tree.c
> > +++ b/extent-tree.c
> > @@ -2614,8 +2614,9 @@ check_failed:
> >  			goto no_bg_cache;
> >  		bg_offset = ins->objectid - bg_cache->key.objectid;
> >
> > -		search_start = round_up(bg_offset + num_bytes,
> > -					BTRFS_STRIPE_LEN) + bg_offset;
> > +		search_start = round_up(
> > +			bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
> > +			bg_cache->key.object;
>
> extent-tree.c: In function ‘find_free_extent’:
> extent-tree.c:2617:18: error: ‘struct btrfs_key’ has no member named ‘object’; did you mean ‘objectid’?
>     bg_cache->key.object;
>                  ^

Ouch, that's right, it's %objectid. I'll send an updated one, thanks for the comments.

-liubo
Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting
On Mon, 2017-06-26 at 08:22 -0700, Darrick J. Wong wrote: > On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote: > > Just check and advance the data errseq_t in struct file before > > before returning from fsync on normal files. Internal filemap_* > > callers are left as-is. > > > > Signed-off-by: Jeff Layton> > --- > > fs/xfs/xfs_file.c | 15 +++ > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 5fb5a0958a14..bc3b1575e8db 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -134,7 +134,7 @@ xfs_file_fsync( > > struct inode*inode = file->f_mapping- > > >host; > > struct xfs_inode*ip = XFS_I(inode); > > struct xfs_mount*mp = ip->i_mount; > > - int error = 0; > > + int error = 0, err2; > > int log_flushed = 0; > > xfs_lsn_t lsn = 0; > > > > @@ -142,10 +142,12 @@ xfs_file_fsync( > > > > error = filemap_write_and_wait_range(inode->i_mapping, > > start, end); > > if (error) > > - return error; > > + goto out; > > > > - if (XFS_FORCED_SHUTDOWN(mp)) > > - return -EIO; > > + if (XFS_FORCED_SHUTDOWN(mp)) { > > + error = -EIO; > > + goto out; > > + } > > > > xfs_iflags_clear(ip, XFS_ITRUNCATED); > > > > @@ -197,6 +199,11 @@ xfs_file_fsync( > > mp->m_logdev_targp == mp->m_ddev_targp) > > xfs_blkdev_issue_flush(mp->m_ddev_targp); > > > > +out: > > + err2 = filemap_report_wb_err(file); > > Could we have a comment here to remind anyone reading the code a year > from now that filemap_report_wb_err has side effects? Pre-coffee me > was > wondering why we'd bother calling filemap_report_wb_err in the > XFS_FORCED_SHUTDOWN case, then remembered that it touches data > structures. > > The first sentence of the commit message (really, the word 'advance') > added as a comment was adequate to remind me of the side effects. > > Once that's added, > Reviewed-by: Darrick J. Wong > > --D > Yeah, definitely. I'm working on a respin of the series now to incorporate HCH's suggestion too. 
I'll add that in as well. Maybe I should rename that function to file_check_and_advance_wb_err() ? It would be good to make it clear that it does advance the errseq_t cursor. > > + if (!error) > > + error = err2; > > + > > return error; > > } > > > > -- > > 2.13.0 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux- > > xfs" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux- > btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
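
The side effect discussed in this thread (the check also advances a per-file cursor) can be illustrated with a minimal userspace sketch. This is not the kernel's errseq_t implementation: `struct wb_err_state`, `struct file_cursor`, `record_wb_error()`, `check_and_advance()` and `sketch_fsync()` are all hypothetical names, and the sequence counter is a simplified stand-in.

```c
#include <errno.h>

/* Hypothetical stand-in for the mapping-wide writeback error state. */
struct wb_err_state {
	unsigned int seq;	/* bumped every time an error is recorded */
	int err;		/* most recent error, as a negative errno */
};

/* Hypothetical per-open-file cursor: the last sequence this file has seen. */
struct file_cursor {
	unsigned int seen_seq;
};

/* Record a new writeback error. */
static void record_wb_error(struct wb_err_state *s, int err)
{
	s->err = err;
	s->seq++;
}

/*
 * Report an error recorded since this file last checked, and advance the
 * cursor. The advance is the side effect: a second call returns 0.
 */
static int check_and_advance(const struct wb_err_state *s, struct file_cursor *f)
{
	if (f->seen_seq == s->seq)
		return 0;
	f->seen_seq = s->seq;
	return s->err;
}

/*
 * Shape of the "out:" path in the quoted patch: always check and advance,
 * but keep an earlier error if there was one.
 */
static int sketch_fsync(const struct wb_err_state *s, struct file_cursor *f,
			int flush_error)
{
	int error = flush_error;
	int err2 = check_and_advance(s, f);

	if (!error)
		error = err2;
	return error;
}
```

In this sketch the cursor advances even when an earlier error such as -EIO is what gets returned, which is why the call sits on the common exit path rather than only on the success path.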
Re: [PATCH 08/13] btrfs: convert prelimary reference tracking to use rbtrees
On 6/20/17 12:06 PM, Edmund Nadolski wrote:
> It's been known for a while that the use of multiple lists
> that are periodically merged was an algorithmic problem within
> btrfs. There are several workloads that don't complete in any
> reasonable amount of time (e.g. btrfs/130) and others that cause
> soft lockups.
>
> The solution is to use a pair of rbtrees that do insertion merging
> for both indirect and direct refs, with the former converting
> refs into the latter. The result is a btrfs/130 workload that
> used to take several hours now takes about half of that. This
> runtime still isn't acceptable and a future patch will address that
> by moving the rbtrees higher in the stack so the lookups can be
> shared across multiple calls to find_parent_nodes.
>
> Signed-off-by: Edmund Nadolski
> Signed-off-by: Jeff Mahoney

[...]

> @@ -504,37 +665,22 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
>  	return ret;
>  }
>
> -static inline int ref_for_same_block(struct prelim_ref *ref1,
> -				     struct prelim_ref *ref2)
> -{
> -	if (ref1->level != ref2->level)
> -		return 0;
> -	if (ref1->root_id != ref2->root_id)
> -		return 0;
> -	if (ref1->key_for_search.type != ref2->key_for_search.type)
> -		return 0;
> -	if (ref1->key_for_search.objectid != ref2->key_for_search.objectid)
> -		return 0;
> -	if (ref1->key_for_search.offset != ref2->key_for_search.offset)
> -		return 0;
> -	if (ref1->parent != ref2->parent)
> -		return 0;
> -
> -	return 1;
> -}
> -
>  /*
>   * read tree blocks and add keys where required.
>   */
>  static int add_missing_keys(struct btrfs_fs_info *fs_info,
> -			    struct list_head *head)
> +			    struct preftrees *preftrees)
>  {
>  	struct prelim_ref *ref;
>  	struct extent_buffer *eb;
> +	struct rb_node *node = rb_first(&preftrees->indirect.root);
> +
> +	while (node) {
> +		ref = rb_entry(node, struct prelim_ref, rbnode);
> +		node = rb_next(&ref->rbnode);
> +		if (WARN(ref->parent, "BUG: direct ref found in indirect tree"))
> +			return -EINVAL;
>
> -	list_for_each_entry(ref, head, list) {
> -		if (ref->parent)
> -			continue;
>  		if (ref->key_for_search.type)
>  			continue;
>  		BUG_ON(!ref->wanted_disk_byte);

Hi Ed -

I missed this in earlier review, but this can't work. We're modifying the ref in a way that the comparator will care about -- so the node would move in the tree.

It's not a fatal flaw and, in fact, leaves us an opening to fix a separate locking issue.

-Jeff

--
Jeff Mahoney
SUSE Labs
Btrfs progs pre-release 4.11.1-rc1
Hi,

a pre-release has been tagged. A bugfix release.

Changes:
  * image: restoring from multiple devices
  * dev stats: make --check option work
  * check: fix false alert with extent hole on a NO_HOLE filesystem
  * check: lowmem mode, fix false alert in case of mixed inline and compressed extent
  * convert: work with large filesystems (many TB)
  * docs updates
  * build: sync Android.mk with Makefile
  * tests:
    * new tests
    * fix 008 and 009, shell quotation mistake

ETA for 4.11.1 is in +4 days (2017-06-30).

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

David Sterba (7):
      btrfs-progs: docs: update formatting of btrfs-rescue
      btrfs-progs: docs: update formatting of btrfs-property
      btrfs-progs: docs: fix sentence for no-dump file attribute
      btrfs-progs: docs: update note about device deletion
      btrfs-progs: build: sync recent makefile changes to android.mk
      btrfs-progs: update CHANGES for v4.11.1
      Btrfs progs v4.11.1-rc1

Filipe Manana (2):
      btrfs-progs: Fix restoring image from multi devices fs into single device
      btrfs-progs: test for restoring multiple devices fs into a single device

Hans van Kranenburg (1):
      btrfs-progs: send operates on ro snapshots only

Kasijjuf (3):
      btrfs-progs: docs: Expand confusing abbreviation in documentation
      btrfs-progs: docs: Wrong section in ref to manpage
      btrfs-progs: docs: replace with

Lakshmipathi.G (3):
      btrfs-progs: Fix 'btrfs device stats --check' cli option
      btrfs-progs: convert: widen int types in convert context
      btrfs-progs: convert: Add larger device support

Lu Fengqi (1):
      btrfs-progs: lowmem check: Fix false alert about file extent interrupt

Qu Wenruo (2):
      btrfs-progs: check: Fix false alert about EXTENT_DATA that shouldn't be a hole
      btrfs-progs: tests: Add test case to check file hole extents with NO_HOLES flag

Tsutomu Itoh (1):
      btrfs-progs: tests: remove variable quotation from convert-tests
Re: [PULL] Btrfs for 4.13, part 1
On 06/23/2017 11:16 AM, David Sterba wrote:
> Hi,
>
> this is the main batch for 4.13. There are some user visible changes, see
> below. The core updates improve error handling (mostly related to bios), with
> the usual incremental work on the GFP_NOFS (mis)use removal. All patches have
> been in for-next for an extensive amount of time. There will be followups but I
> want to push the series (111 patches) forward. There are also some updates to
> adjacent subsystems (writeback and blocklayer), so I want to give some stable
> point for merging in the upcoming weeks.

Thanks Dave, I ran this (along with the updates we added) through a long stress and the usual xfstests.

For everyone else on the list, since I'm heading off to vacation until ~July 9th, Dave is sending this off to Linus once the merge window starts.

-chris
Re: [PATCH 3/4] btrfs: Add zstd support
Thanks for the clarification! I will fix the divisions.

On 6/26/17, 5:12 AM, "David Sterba" wrote:

On Sun, Jun 25, 2017 at 11:30:22PM +0200, Adam Borowski wrote:
> On Mon, Jun 26, 2017 at 03:03:17AM +0800, kbuild test robot wrote:
> > Hi Nick,
> >
> > url: https://github.com/0day-ci/linux/commits/Nick-Terrell/lib-Add-xxhash-module/20170625-214344
> > config: i386-allmodconfig (attached as .config)
> > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> > reproduce:
> >         # save the attached .config to linux build tree
> >         make ARCH=i386
> >
> > All errors (new ones prefixed by >>):
> >
> > >> ERROR: "__udivdi3" [lib/zstd/zstd_compress.ko] undefined!
> >    ERROR: "__udivdi3" [fs/ufs/ufs.ko] undefined!
>
> Just to save you time to figure it out:
> for division when one or both arguments are longer than the architecture's
> word, gcc uses helper functions that are included when compiling in a hosted
> environment -- but not in freestanding.
>
> Thus, you want do_div() instead of /; do check widths and signedness of
> arguments.

No do_div please, div_u64 or div64_u64.
Re: [PATCH 03/11] btrfs: Don't clear SGID when inheriting ACLs
On Thu, Jun 22, 2017 at 03:31:07PM +0200, Jan Kara wrote:
> When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
> set, DIR1 is expected to have SGID bit set (and owning group equal to
> the owning group of 'DIR0'). However when 'DIR0' also has some default
> ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
> 'DIR1' to get cleared if user is not member of the owning group.
>
> Fix the problem by moving posix_acl_update_mode() out of
> __btrfs_set_acl() into btrfs_set_acl(). That way the function will not be
> called when inheriting ACLs which is what we want as it prevents SGID
> bit clearing and the mode has been properly set by posix_acl_create()
> anyway.
>
> Fixes: 073931017b49d9458aa351605b43a7e34598caef
> CC: sta...@vger.kernel.org
> CC: linux-btrfs@vger.kernel.org
> CC: David Sterba
> Signed-off-by: Jan Kara

Added to btrfs patch queue, thanks.
Next btrfs development cycle open - 4.14
Hi,

a friendly reminder of the timetable and what's expected at this phase.

4.11 - current
4.12 - upcoming, urgent regression fixes only
4.13 - development closed, pull request pending, fixes or regressions only
4.14 - development open, until 4.13-rc5

(https://btrfs.wiki.kernel.org/index.php/Developer%27s_FAQ#Development_schedule)

Besides the usual cleanups and fixes, you can now start sending any patches that could be more intrusive and would benefit from a longer period of testing, or development revisions.

The base of the patches should be the last pull request, which is 'for-4.13-part1' in my k.org tree.

Reviewed patches will be collected in a branch that's usually named 'misc-next' in my devel git repos and is part of the for-next at k.org git repo.

k.org: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
devel1: http://repo.or.cz/linux-2.6/btrfs-unstable.git
devel2: https://github.com/kdave/btrfs-devel

d.
Re: [PATCH 2/2] btrfs: Optimise layout of btrfs_block_group_cache
On 26.06.2017 17:42, Nikolay Borisov wrote:
> With this patch applied pahole stats look like:
>
> /* size: 840, cachelines: 14, members: 40 */
> /* sum members: 833, holes: 1, sum holes: 7 */
> /* bit holes: 1, sum bit holes: 28 bits */
> /* last cacheline: 8 bytes */
>
> No functional changes.
>
> Signed-off-by: Nikolay Borisov
> ---
>  fs/btrfs/ctree.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index cdd3775e930b..bdd06bbeb9aa 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -586,6 +586,11 @@ struct btrfs_block_group_cache {
>  	unsigned int iref:1;
>  	unsigned int has_caching_ctl:1;
>  	unsigned int removed:1;
> +	/*
> +	 * Does the block group need to be added to the free space tree?
> +	 * Protected by free_space_lock.
> +	 */
> +	unsigned int needs_free_space:1;

Upon closer inspection of memory-barriers.txt I'm not confident in this change. It puts fields protected by different locks in the same bitfield, which can lead to corrupted values.

>
>  	int disk_cache_state;
>
> @@ -608,6 +613,8 @@ struct btrfs_block_group_cache {
>  	/* usage count */
>  	atomic_t count;
>
> +	atomic_t trimming;

This one will likely eliminate 1 hole in the struct, so I might end up sending v2 of this patch.

> +
>  	/* List of struct btrfs_free_clusters for this block group.
>  	 * Today it will only have one thing on it, but that may change
>  	 */
> @@ -619,8 +626,6 @@ struct btrfs_block_group_cache {
>  	/* For read-only block groups */
>  	struct list_head ro_list;
>
> -	atomic_t trimming;
> -
>  	/* For dirty block groups */
>  	struct list_head dirty_list;
>  	struct list_head io_list;
> @@ -651,11 +656,6 @@ struct btrfs_block_group_cache {
>  	/* Lock for free space tree operations. */
>  	struct mutex free_space_lock;
>
> -	/*
> -	 * Does the block group need to be added to the free space tree?
> -	 * Protected by free_space_lock.
> -	 */
> -	int needs_free_space;
>
>  	/* Record locked full stripes for RAID5/6 block group */
>  	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
>
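The bitfield concern raised in the reply above can be illustrated in userspace. This is a minimal sketch, assuming a typical GCC/Clang ABI in which adjacent `unsigned int` bitfields share one storage unit; `struct demo_flags`, `raw_word()` and `flags_share_storage()` are hypothetical names and are not taken from ctree.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a struct whose flags are guarded by different locks. */
struct demo_flags {
	unsigned int removed:1;          /* imagine: protected by lock A */
	unsigned int needs_free_space:1; /* imagine: protected by lock B */
};

/* Copy the raw storage that holds the bitfields into an integer. */
unsigned int raw_word(const struct demo_flags *f)
{
	unsigned int w = 0;

	assert(sizeof(*f) <= sizeof(w));
	memcpy(&w, f, sizeof(*f));
	return w;
}

/*
 * Returns 1 when both bitfields live in the same word, i.e. when updating
 * either one touches storage that also holds the other.
 */
int flags_share_storage(void)
{
	struct demo_flags f;
	unsigned int only_first, both;

	memset(&f, 0, sizeof(f));
	f.removed = 1;
	only_first = raw_word(&f);
	f.needs_free_space = 1;
	both = raw_word(&f);

	return sizeof(f) == sizeof(unsigned int) && only_first != both;
}
```

When the two flags share a word, a store to one of them is usually compiled as a read-modify-write of the whole word, so two writers that each hold only their own lock could overwrite each other's update. Keeping `needs_free_space` as a separate `int` avoids that at the cost of a few bytes.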
Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting
On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote:
> Just check and advance the data errseq_t in struct file before
> before returning from fsync on normal files. Internal filemap_*
> callers are left as-is.
>
> Signed-off-by: Jeff Layton
> ---
>  fs/xfs/xfs_file.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5fb5a0958a14..bc3b1575e8db 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -134,7 +134,7 @@ xfs_file_fsync(
>  	struct inode		*inode = file->f_mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	int			error = 0;
> +	int			error = 0, err2;
>  	int			log_flushed = 0;
>  	xfs_lsn_t		lsn = 0;
>
> @@ -142,10 +142,12 @@ xfs_file_fsync(
>
>  	error = filemap_write_and_wait_range(inode->i_mapping, start, end);
>  	if (error)
> -		return error;
> +		goto out;
>
> -	if (XFS_FORCED_SHUTDOWN(mp))
> -		return -EIO;
> +	if (XFS_FORCED_SHUTDOWN(mp)) {
> +		error = -EIO;
> +		goto out;
> +	}
>
>  	xfs_iflags_clear(ip, XFS_ITRUNCATED);
>
> @@ -197,6 +199,11 @@ xfs_file_fsync(
>  	    mp->m_logdev_targp == mp->m_ddev_targp)
>  		xfs_blkdev_issue_flush(mp->m_ddev_targp);
>
> +out:
> +	err2 = filemap_report_wb_err(file);

Could we have a comment here to remind anyone reading the code a year from now that filemap_report_wb_err has side effects? Pre-coffee me was wondering why we'd bother calling filemap_report_wb_err in the XFS_FORCED_SHUTDOWN case, then remembered that it touches data structures.

The first sentence of the commit message (really, the word 'advance'), added as a comment, would be adequate as a reminder of the side effects.

Once that's added,
Reviewed-by: Darrick J. Wong

--D

> +	if (!error)
> +		error = err2;
> +
>  	return error;
>  }
>
> --
> 2.13.0
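The control flow under review can be reduced to a small userspace sketch. Everything here is a hypothetical stand-in: `struct demo_wb_file`, `demo_report_wb_err()` and `demo_fsync()` only mimic the shape of the patch and are not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for per-file writeback error state. */
struct demo_wb_file {
	int pending_wb_err;	/* unreported writeback error, 0 if none */
};

/*
 * Hypothetical analogue of a "report and advance" call: it returns the
 * pending error AND clears it, so calling it has a side effect.
 */
int demo_report_wb_err(struct demo_wb_file *file)
{
	int err = file->pending_wb_err;

	file->pending_wb_err = 0;	/* advance: the error is now reported */
	return err;
}

/*
 * Shape of the fsync path in the patch: every exit goes through "out" so
 * the file's error state is always advanced, and the first error wins.
 */
int demo_fsync(struct demo_wb_file *file, int flush_err, int shut_down)
{
	int error = flush_err;
	int err2;

	assert(file != NULL);
	if (error)
		goto out;
	if (shut_down) {
		error = -EIO;
		goto out;
	}
	/* ... the remaining flush work would go here ... */
out:
	err2 = demo_report_wb_err(file);
	if (!error)
		error = err2;
	return error;
}
```

In this sketch the shutdown path returns -EIO but still consumes the pending error, which is the side effect the requested comment is meant to document.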
Re: [PATCH v2] btrfs-progs: lowmem check: Fix false alert about file extent interrupt
On Thu, Jun 22, 2017 at 04:12:56PM +0800, Lu Fengqi wrote:
> As Qu mentioned in this thread
> (https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
> can cause regular extent to co-exist with inlined extent. This coexistence
> makes things confusing. Since it was permitted currently, so fix
> btrfsck to prevent a bunch of error logs that will make user feel
> panic.
>
> When check file extent, record the extent_end of regular extent to check
> if there is a gap between the regular extents. Normally there is only one
> inlined extent, so the extent_end of inlined extent is useless. However,
> if regular extent can co-exist with inlined extent, the extent_end of
> inlined extent also need to record.
>
> Reported-by: Marc MERLIN
> Signed-off-by: Lu Fengqi

Applied, thanks.

Do you have a test for that?
[PATCH 2/2] btrfs: Optimise layout of btrfs_block_group_cache
With this patch applied pahole stats look like: /* size: 840, cachelines: 14, members: 40 */ /* sum members: 833, holes: 1, sum holes: 7 */ /* bit holes: 1, sum bit holes: 28 bits */ /* last cacheline: 8 bytes */ No functional changes. Signed-off-by: Nikolay Borisov--- fs/btrfs/ctree.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index cdd3775e930b..bdd06bbeb9aa 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -586,6 +586,11 @@ struct btrfs_block_group_cache { unsigned int iref:1; unsigned int has_caching_ctl:1; unsigned int removed:1; + /* +* Does the block group need to be added to the free space tree? +* Protected by free_space_lock. +*/ + unsigned int needs_free_space:1; int disk_cache_state; @@ -608,6 +613,8 @@ struct btrfs_block_group_cache { /* usage count */ atomic_t count; + atomic_t trimming; + /* List of struct btrfs_free_clusters for this block group. * Today it will only have one thing on it, but that may change */ @@ -619,8 +626,6 @@ struct btrfs_block_group_cache { /* For read-only block groups */ struct list_head ro_list; - atomic_t trimming; - /* For dirty block groups */ struct list_head dirty_list; struct list_head io_list; @@ -651,11 +656,6 @@ struct btrfs_block_group_cache { /* Lock for free space tree operations. */ struct mutex free_space_lock; - /* -* Does the block group need to be added to the free space tree? -* Protected by free_space_lock. -*/ - int needs_free_space; /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs: remove unused sectorsize member
The sectorsize member of btrfs_block_group_cache is unused. Remove it; this reduces the number of holes in the struct.

With patch:
/* size: 856, cachelines: 14, members: 40 */
/* sum members: 837, holes: 4, sum holes: 19 */
/* bit holes: 1, sum bit holes: 29 bits */
/* last cacheline: 24 bytes */

Without patch:
/* size: 864, cachelines: 14, members: 41 */
/* sum members: 841, holes: 5, sum holes: 23 */
/* bit holes: 1, sum bit holes: 29 bits */
/* last cacheline: 32 bytes */

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ctree.h       | 1 -
 fs/btrfs/extent-tree.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a75a23f9d68e..cdd3775e930b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -559,7 +559,6 @@ struct btrfs_block_group_cache {
 	u64 bytes_super;
 	u64 flags;
 	u64 cache_generation;
-	u32 sectorsize;

 	/*
 	 * If the free space extent count exceeds this number, convert the block
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a08a743a8e09..2a0d300c7d1a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9904,7 +9904,6 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
 	cache->key.offset = size;
 	cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;

-	cache->sectorsize = fs_info->sectorsize;
 	cache->fs_info = fs_info;
 	cache->full_stripe_len = btrfs_full_stripe_len(fs_info,
 						       &fs_info->mapping_tree,
--
2.7.4
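Why dropping a 4-byte member can shrink a struct by 8 bytes (864 to 856 in the pahole output above, with "sum members" and "sum holes" each dropping by 4) can be shown with a reduced layout. This is only a sketch: the two structs are hypothetical and merely echo member names from the patch, and the exact padding depends on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the member names only echo the patch. */
struct with_u32 {
	uint64_t cache_generation;
	uint32_t sectorsize;	/* 4 bytes, may be followed by padding */
	uint64_t next_member;
};

struct without_u32 {
	uint64_t cache_generation;
	uint64_t next_member;
};

/* Bytes of padding the compiler inserted right after `sectorsize`. */
size_t hole_after_sectorsize(void)
{
	size_t end = offsetof(struct with_u32, sectorsize) + sizeof(uint32_t);

	assert(offsetof(struct with_u32, next_member) >= end);
	return offsetof(struct with_u32, next_member) - end;
}

/* Bytes saved by dropping the member, including any hole it caused. */
size_t bytes_saved(void)
{
	return sizeof(struct with_u32) - sizeof(struct without_u32);
}
```

On an ABI that aligns 64-bit members to 8 bytes, the saving is the member itself plus the hole behind it, which is the kind of gap pahole reports.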
Re: [PATCH 1/2] btrfs-progs: Fix false alert about EXTENT_DATA shouldn't be hole
On Mon, Jun 19, 2017 at 01:26:20PM +0200, Henk Slager wrote:
> On 16-06-17 03:43, Qu Wenruo wrote:
> > Since incompat feature NO_HOLES still allow us to have explicit hole
> > file extent, current check is too restrict and will cause false alert
> > like:
> >
> > root 5 EXTENT_DATA[257, 0] shouldn't be hole
> >
> > Fix it by removing the restrict hole file extent check.
> >
> > Reported-by: Henk Slager
> > Signed-off-by: Qu Wenruo
> > ---
> >  cmds-check.c | 6 +-
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/cmds-check.c b/cmds-check.c
> > index c052f66e..7bd57677 100644
> > --- a/cmds-check.c
> > +++ b/cmds-check.c
> > @@ -4841,11 +4841,7 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
> >  	}
> >
> >  	/* Check EXTENT_DATA hole */
> > -	if (no_holes && is_hole) {
> > -		err |= FILE_EXTENT_ERROR;
> > -		error("root %llu EXTENT_DATA[%llu %llu] shouldn't be hole",
> > -		      root->objectid, fkey->objectid, fkey->offset);
> > -	} else if (!no_holes && *end != fkey->offset) {
> > +	if (!no_holes && *end != fkey->offset) {
> >  		err |= FILE_EXTENT_ERROR;
> >  		error("root %llu EXTENT_DATA[%llu %llu] interrupt",
> >  		      root->objectid, fkey->objectid, fkey->offset);
> >
> Thanks for the patch, I applied it on v4.11 btrfs-progs and re-ran the check:
> # btrfs check -p --readonly /dev/mapper/smr
>
> on filesystem mentioned in:
> https://www.spinics.net/lists/linux-btrfs/msg66374.html
>
> and now the "shouldn't be hole" errors don't show up anymore.
>
> Tested-by: Henk Slager

Thank you both, patch applied. I might also release a 4.11.x release with this fix included.
Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking
On Sat, 2017-06-24 at 09:16 -0400, Jeff Layton wrote:
> On Sat, 2017-06-24 at 04:59 -0700, Christoph Hellwig wrote:
> > On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> > > In order to query for errors with errseq_t, you need a previously-
> > > sampled point from which to check. When you call
> > > filemap_write_and_wait_range though you don't have a struct file and so
> > > no previously-sampled value.
> >
> > So can we simply introduce variants of them that take a struct file?
> > That would be:
> >
> > a) less churn
> > b) less code
> > c) less chance to get data integrity wrong
>
> Yeah, I had that thought after I sent the reply to you earlier.
>
> The main reason I didn't do that before was that I had myself convinced
> that we needed to do the check_and_advance as late as possible in the
> fsync process, after the metadata had been written.
>
> Now that I think about it more, I think you're probably correct. As long
> as we do the check and advance at some point after doing the
> write_and_wait, we're fine here and shouldn't violate exactly once
> semantics on the fsync return.

So I have a file_write_and_wait_range now that should DTRT for this patch.

The bigger question is -- what about more complex filesystems like ext4? There are a couple of cases where we can return -EIO or -EROFS on fsync before filemap_write_and_wait_range is ever called. Like this one for instance:

	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
		return -EIO;

...and the EXT4_MF_FS_ABORTED case.

Are those conditions ever recoverable, such that a later fsync could succeed? IOW, could I do a remount or something such that the existing fds are left open and become usable again?

If so, then we really ought to advance the errseq_t in the file when we catch those cases as well. If we have to do that, then it probably makes sense to leave the ext4 patch as-is.

--
Jeff Layton
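The "previously-sampled point" and "exactly once" wording in this thread can be modelled with a per-file cursor. The sketch below is deliberately simplified and is not the kernel's errseq_t encoding; all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model: NOT the kernel's errseq_t encoding. */
struct demo_mapping {
	unsigned int seq;	/* bumped every time an error is recorded */
	int last_err;		/* most recent writeback error */
};

struct demo_fd {
	unsigned int sampled;	/* value of seq this file has already seen */
};

/* Sample the current point so later checks only report newer errors. */
void demo_open(struct demo_fd *f, const struct demo_mapping *m)
{
	f->sampled = m->seq;
}

/* Record a writeback error on the mapping shared by all open files. */
void demo_record_error(struct demo_mapping *m, int err)
{
	m->last_err = err;
	m->seq++;
}

/* Report an error recorded since the last sample, then advance the sample. */
int demo_check_and_advance(struct demo_fd *f, const struct demo_mapping *m)
{
	assert(f != NULL && m != NULL);
	if (f->sampled == m->seq)
		return 0;
	f->sampled = m->seq;
	return m->last_err;
}
```

Each open file reports a given error once, and a second file sampled before the error still gets its own report. Under this model, an early return that skips the check-and-advance step would leave the cursor behind, which is the question raised about the ext4 shutdown paths.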
Re: [PATCH v2] btrfs-progs: mkfs: Replace number with a macro
On Mon, Jun 26, 2017 at 06:18:29PM +0800, Gu Jinxiang wrote:
> For code maintainability and scalability,
> replace number with a macro of member blocks in btrfs_mkfs_config.
>
> Signed-off-by: Gu Jinxiang
> ---
> Changes since v1:
> Missing a using place. And modify it.
>
>  mkfs/common.c | 4 ++--
>  mkfs/common.h | 5 -
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mkfs/common.c b/mkfs/common.c
> index e4785c5..0d79650 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>  	uuid_generate(chunk_tree_uuid);
>
>  	cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
> -	for (i = 1; i < 7; i++) {
> +	for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {

I'm not sure this is the best way to make the code more readable. "NR" is the count of the roots and if it were used as " < NR" then it's clear that we're iterating over a given number of items, but here the count is also going to be used as an index to an array.

While this is correct, it's still necessary to keep in mind that some +1 or <= is needed while dealing with the blocks.

make_btrfs could use some heavy cleanup so we don't rely on the hardcoded constants, in a similar way to reference_root_table so we can use symbolic tree names.
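One way to get the plain "< count" idiom together with symbolic names is an enum whose last entry is the number of slots. This is a hedged sketch of that idea only: the `DEMO_BLOCK_*` names and `demo_layout()` are hypothetical and do not come from mkfs/common.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block slots; the names are illustrative, not from mkfs/common.h. */
enum demo_block {
	DEMO_BLOCK_SUPER = 0,
	DEMO_BLOCK_ROOT_TREE,
	DEMO_BLOCK_EXTENT_TREE,
	DEMO_BLOCK_CHUNK_TREE,
	DEMO_BLOCK_COUNT	/* number of slots, always last */
};

/*
 * Fill blocks[] with consecutive offsets. The loop bound is the element
 * count, so the usual "i < COUNT" idiom works and no "+ 1" or "<=" is needed.
 */
void demo_layout(uint64_t blocks[DEMO_BLOCK_COUNT], uint64_t super_offset,
		 uint64_t nodesize)
{
	int i;

	assert(nodesize > 0);
	blocks[DEMO_BLOCK_SUPER] = super_offset;
	for (i = DEMO_BLOCK_SUPER + 1; i < DEMO_BLOCK_COUNT; i++)
		blocks[i] = blocks[i - 1] + nodesize;
}
```

Because the superblock slot is part of the enum, the count covers every array element and individual slots can be addressed by name instead of by a bare number.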
Re: [PATCH] Btrfs-progs: fix infinite loop in find_free_extent
On Fri, Jun 23, 2017 at 10:28:31PM -0600, Liu Bo wrote:
> From: Liu Bo
>
> %search_start is calculated in a wrong way, and if %ins is a cross-stripe
> one, it'll search the same block group forever.

That's a rather terse description, so please check if my understanding is right: search_start advances by at least one stripe len, but the math would be wrong, as using bg_offset would not move us to the next stripe. bg_cache->key.objectid is the full length, so this will reach the next stripe and will not loop forever.

Do you happen to have a test for that?

> Signed-off-by: Liu Bo
> ---
>  extent-tree.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/extent-tree.c b/extent-tree.c
> index b12ee29..5e09274 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -2614,8 +2614,9 @@ check_failed:
>  			goto no_bg_cache;
>  		bg_offset = ins->objectid - bg_cache->key.objectid;
>
> -		search_start = round_up(bg_offset + num_bytes,
> -					BTRFS_STRIPE_LEN) + bg_offset;
> +		search_start = round_up(
> +			bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
> +			bg_cache->key.object;

extent-tree.c: In function ‘find_free_extent’:
extent-tree.c:2617:18: error: ‘struct btrfs_key’ has no member named ‘object’; did you mean ‘objectid’?
   bg_cache->key.object;
                 ^
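The difference between the two expressions is easier to see with numbers. The sketch below reproduces both calculations in userspace, assuming a 64 KiB stripe length and reading the intended fix as adding the block group start (`key.objectid`); the function names and `DEMO_STRIPE_LEN` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STRIPE_LEN (64 * 1024ULL)	/* assumed stripe length */

/* Round x up to the next multiple of align (align must be non-zero). */
uint64_t demo_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return (x + align - 1) / align * align;
}

/* Calculation before the fix: adds the in-group offset a second time. */
uint64_t search_start_old(uint64_t bg_start, uint64_t ins_start,
			  uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return demo_round_up(bg_offset + num_bytes, DEMO_STRIPE_LEN) + bg_offset;
}

/* Calculation after the fix: converts back to an absolute address. */
uint64_t search_start_new(uint64_t bg_start, uint64_t ins_start,
			  uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return demo_round_up(bg_offset + num_bytes, DEMO_STRIPE_LEN) + bg_start;
}
```

For a block group starting at 1 GiB and a 16 KiB allocation that begins 60 KiB into it, the old expression yields 192512, an address far below the block group, while the new one yields the start of the next stripe inside the block group.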
Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting
On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote: > Just check and advance the data errseq_t in struct file before > before returning from fsync on normal files. Internal filemap_* > callers are left as-is. > Looks good. Reviewed-by: Carlos Maiolino> Signed-off-by: Jeff Layton > --- > fs/xfs/xfs_file.c | 15 +++ > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 5fb5a0958a14..bc3b1575e8db 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -134,7 +134,7 @@ xfs_file_fsync( > struct inode*inode = file->f_mapping->host; > struct xfs_inode*ip = XFS_I(inode); > struct xfs_mount*mp = ip->i_mount; > - int error = 0; > + int error = 0, err2; > int log_flushed = 0; > xfs_lsn_t lsn = 0; > > @@ -142,10 +142,12 @@ xfs_file_fsync( > > error = filemap_write_and_wait_range(inode->i_mapping, start, end); > if (error) > - return error; > + goto out; > > - if (XFS_FORCED_SHUTDOWN(mp)) > - return -EIO; > + if (XFS_FORCED_SHUTDOWN(mp)) { > + error = -EIO; > + goto out; > + } > > xfs_iflags_clear(ip, XFS_ITRUNCATED); > > @@ -197,6 +199,11 @@ xfs_file_fsync( > mp->m_logdev_targp == mp->m_ddev_targp) > xfs_blkdev_issue_flush(mp->m_ddev_targp); > > +out: > + err2 = filemap_report_wb_err(file); > + if (!error) > + error = err2; > + > return error; > } > > -- > 2.13.0 > -- Carlos -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] Btrfs for 4.13, part 1 (update 1)
On Fri, Jun 23, 2017 at 05:16:46PM +0200, David Sterba wrote: Two more patches added to the branch Chris Mason (1): btrfs: fix integer overflow in calc_reclaim_items_nr David Sterba (1): btrfs: scrub: fix target device intialization while setting up scrub context Updated branch and tag: The following changes since commit 41f1830f5a7af77cf5c86359aba3cbd706687e52: Linux 4.12-rc6 (2017-06-19 22:19:37 +0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.13-part1 for you to fetch changes up to 8399f53f0c7450ab050b1b0ffee4e2c1ddd2a3e0: btrfs: fix integer overflow in calc_reclaim_items_nr (2017-06-26 15:33:42 +0200) Previous: > > The following changes since commit 41f1830f5a7af77cf5c86359aba3cbd706687e52: > > Linux 4.12-rc6 (2017-06-19 22:19:37 +0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.13-part1 > > for you to fetch changes up to f3f000297be88b1b75fde5027d660a8d8a44de14: > > btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved > ranges (2017-06-21 20:56:14 +0200) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: scrub: fix target device intialization while setting up scrub context
The commit "btrfs: scrub: inline helper scrub_setup_wr_ctx" inlined a helper but wrongly sets up the target device. Incidentally there's a local variable with the same name as a parameter in the previous function, so this got caught during runtime as a crash in test btrfs/027.

Reported-by: Chris Mason
Signed-off-by: David Sterba
---
 fs/btrfs/scrub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 58a249cd5adc..738e784ba20d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -714,9 +714,9 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
 	mutex_init(&sctx->wr_lock);
 	sctx->wr_curr_bio = NULL;
 	if (is_dev_replace) {
-		WARN_ON(!dev->bdev);
+		WARN_ON(!fs_info->dev_replace.tgtdev);
 		sctx->pages_per_wr_bio = SCRUB_PAGES_PER_WR_BIO;
-		sctx->wr_tgtdev = dev;
+		sctx->wr_tgtdev = fs_info->dev_replace.tgtdev;
 		atomic_set(&sctx->flush_all_writes, 0);
 	}

--
2.13.0
Re: [PATCH v4] btrfs-progs: btrfs-convert: Add larger device support
On Sat, Jun 03, 2017 at 03:27:45PM +0530, Lakshmipathi.G wrote:
> With larger file system (in this case its 22TB), ext2fs_open() returns
> EXT2_ET_CANT_USE_LEGACY_BITMAPS error message with ext2fs_read_block_bitmap().
>
> To overcome this issue, (a) we need pass EXT2_FLAG_64BITS flag with
> ext2fs_open.
> (b) use 64-bit functions like ext2fs_get_block_bitmap_range2,
> ext2fs_inode_data_blocks2,ext2fs_read_ext_attr2. (c) use 64bit types with
> btrfs_convert_context fields.
>
> bug: https://bugzilla.kernel.org/show_bug.cgi?id=194795
> Signed-off-by: Lakshmipathi.G

Applied, thanks.

> --- a/convert/common.h
> +++ b/convert/common.h
> @@ -30,10 +30,10 @@ struct btrfs_mkfs_config;
>
>  struct btrfs_convert_context {
>  	u32 blocksize;
> -	u32 first_data_block;
> -	u32 block_count;
> -	u32 inodes_count;
> -	u32 free_inodes_count;
> +	u64 first_data_block;
> +	u64 block_count;
> +	u64 inodes_count;
> +	u64 free_inodes_count;

I've split this change from the patch as it does not logically belong to the same patch, although the change is simple.
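The need for 64-bit counters follows from simple arithmetic. As a sketch, assuming the 22TB in the report means 22 TiB and that the filesystem uses 4 KiB blocks (neither is stated in the thread), the block count no longer fits in 32 bits; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks in a filesystem of fs_bytes with the given block size. */
uint64_t demo_block_count(uint64_t fs_bytes, uint32_t blocksize)
{
	assert(blocksize != 0);
	return fs_bytes / blocksize;
}

/* What a 32-bit field would hold after storing the same count. */
uint32_t demo_block_count_u32(uint64_t fs_bytes, uint32_t blocksize)
{
	return (uint32_t)demo_block_count(fs_bytes, blocksize);
}
```

Under those assumptions the real count is 22 * 2^28 = 5905580032 blocks, above the u32 maximum of 4294967295, so a u32 `block_count` would silently wrap to 1610612736.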
Re: [PATCH v2] btrfs-progs: Fix 'btrfs device stats --check' cli option
On Thu, Jun 22, 2017 at 01:27:53PM +0530, Lakshmipathi.G wrote:
> Bug 194961 - btrfs device stats --check does not work
> https://bugzilla.kernel.org/show_bug.cgi?id=194961
>
> Reported-by: Tomas Thiemel
> Signed-off-by: Lakshmipathi.G

Applied, thanks.
Re: [PATCH 3/4] btrfs: Add zstd support
On Sun, Jun 25, 2017 at 11:30:22PM +0200, Adam Borowski wrote:
> On Mon, Jun 26, 2017 at 03:03:17AM +0800, kbuild test robot wrote:
> > Hi Nick,
> >
> > url: https://github.com/0day-ci/linux/commits/Nick-Terrell/lib-Add-xxhash-module/20170625-214344
> > config: i386-allmodconfig (attached as .config)
> > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> > reproduce:
> >         # save the attached .config to linux build tree
> >         make ARCH=i386
> >
> > All errors (new ones prefixed by >>):
> >
> > >> ERROR: "__udivdi3" [lib/zstd/zstd_compress.ko] undefined!
> >    ERROR: "__udivdi3" [fs/ufs/ufs.ko] undefined!
>
> Just to save you time to figure it out:
> for division when one or both arguments are longer than the architecture's
> word, gcc uses helper functions that are included when compiling in a hosted
> environment -- but not in freestanding.
>
> Thus, you want do_div() instead of /; do check widths and signedness of
> arguments.

No do_div please, div_u64 or div64_u64.
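To make the `__udivdi3` point concrete: a plain `/` on a 64-bit value lets a 32-bit gcc call a libgcc helper that the kernel does not link, which is why kernel code goes through helpers such as `div_u64()`. The userspace sketch below shows one way to divide a 64-bit value by a 32-bit one without any 64-bit `/` operator. It is purely illustrative; `demo_div_u64()` is a hypothetical name and this is not how the kernel implements `div_u64()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift-and-subtract long division: 64-bit dividend, 32-bit divisor.
 * Only shifts, compares and subtractions are applied to 64-bit values,
 * so no 64-bit division helper is required.
 */
uint64_t demo_div_u64(uint64_t dividend, uint32_t divisor)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;
	int bit;

	assert(divisor != 0);
	for (bit = 63; bit >= 0; bit--) {
		/* Bring down the next bit of the dividend. */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= 1ULL << bit;
		}
	}
	return quotient;
}
```

The remainder always stays below the 32-bit divisor before each shift, so `rem << 1` cannot overflow 64 bits.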
[PATCH v2 15/51] btrfs: comment on direct access bvec table
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei
---
 fs/btrfs/compression.c | 4
 fs/btrfs/inode.c | 12
 2 files changed, 16 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 2c0b7b57fcd5..5972f74354ca 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -541,6 +541,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,

 	/* we need the actual starting offset of this extent in the file */
 	read_lock(&em_tree->lock);
+	/*
+	 * It is still safe to retrieve the 1st page of the bio
+	 * in this way after supporting multipage bvec.
+	 */
 	em = lookup_extent_mapping(em_tree,
 				   page_offset(bio->bi_io_vec->bv_page),
 				   PAGE_SIZE);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4ab02b34f029..7e725d84917b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8055,6 +8055,12 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
 	if (bio->bi_status)
 		goto end;

+	/*
+	 * WARNING:
+	 *
+	 * With multipage bvec, the following way of direct access to
+	 * bvec table is only safe if the bio includes single page.
+	 */
 	ASSERT(bio->bi_vcnt == 1);
 	io_tree = &BTRFS_I(inode)->io_tree;
 	failure_tree = &BTRFS_I(inode)->io_failure_tree;
@@ -8146,6 +8152,12 @@ static void btrfs_retry_endio(struct bio *bio)

 	uptodate = 1;

+	/*
+	 * WARNING:
+	 *
+	 * With multipage bvec, the following way of direct access to
+	 * bvec table is only safe if the bio includes single page.
+	 */
 	ASSERT(bio->bi_vcnt == 1);
 	ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(done->inode));

--
2.9.4
[PATCH v2 14/51] btrfs: avoid to access bvec table directly for a cloned bio
Commit 17347cec15f919901c90 (Btrfs: change how we iterate bios in endio) mentioned that for dio the submitted bio may be fast cloned. We can't access the bvec table directly for a cloned bio, so use bio_get_first_bvec() to retrieve the 1st bvec.

Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: linux-btrfs@vger.kernel.org
Cc: Liu Bo
Signed-off-by: Ming Lei
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 06dea7c89bbd..4ab02b34f029 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
 	int read_mode = 0;
 	int segs;
 	int ret;
+	struct bio_vec bvec;

 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);

@@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
 	}

 	segs = bio_segments(failed_bio);
+	bio_get_first_bvec(failed_bio, &bvec);
 	if (segs > 1 ||
-	    (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
+	    (bvec.bv_len > btrfs_inode_sectorsize(inode)))
 		read_mode |= REQ_FAILFAST_DEV;

 	isector = start - btrfs_io_bio(failed_bio)->logical;
--
2.9.4
[PATCH v2 13/51] btrfs: avoid access to .bi_vcnt directly
BTRFS uses bio->bi_vcnt to figure out page numbers, this way becomes not correct once we start to enable multipage bvec. So use bio_for_each_segment_all() to do that instead. Cc: Chris MasonCc: Josef Bacik Cc: David Sterba Cc: linux-btrfs@vger.kernel.org Signed-off-by: Ming Lei --- fs/btrfs/extent_io.c | 21 + fs/btrfs/extent_io.h | 2 +- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 0863164d97d2..5b453cada1ea 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2258,7 +2258,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, return 0; } -int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio, +int btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, struct io_failure_record *failrec, int failed_mirror) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); @@ -2282,7 +2282,7 @@ int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio, * a) deliver good data to the caller * b) correct the bad sectors on disk */ - if (failed_bio->bi_vcnt > 1) { + if (failed_bio_pages > 1) { /* * to fulfill b), we need to know the exact failing sectors, as * we don't want to rewrite any more than the failed ones. thus, @@ -2355,6 +2355,17 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, return bio; } +static unsigned int get_bio_pages(struct bio *bio) +{ + unsigned i; + struct bio_vec *bv; + + bio_for_each_segment_all(bv, bio, i) + ; + + return i; +} + /* * this is a generic handler for readpage errors (default * readpage_io_failed_hook). 
if other copies exist, read those and write back @@ -2375,6 +2386,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, int read_mode = 0; blk_status_t status; int ret; + unsigned failed_bio_pages = get_bio_pages(failed_bio); BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); @@ -2382,13 +2394,14 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, if (ret) return ret; - ret = btrfs_check_repairable(inode, failed_bio, failrec, failed_mirror); + ret = btrfs_check_repairable(inode, failed_bio_pages, failrec, +failed_mirror); if (!ret) { free_io_failure(failure_tree, tree, failrec); return -EIO; } - if (failed_bio->bi_vcnt > 1) + if (failed_bio_pages > 1) read_mode |= REQ_FAILFAST_DEV; phy_offset >>= inode->i_sb->s_blocksize_bits; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d4942d94a16b..90681d1f0786 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -539,7 +539,7 @@ void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end); int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, struct io_failure_record **failrec_ret); -int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio, +int btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, struct io_failure_record *failrec, int fail_mirror); struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, struct io_failure_record *failrec, -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 32/51] btrfs: use bvec_get_last_page to get bio's last page
Preparing for supporting multipage bvec.

Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei
---
 fs/btrfs/compression.c | 5 -
 fs/btrfs/extent_io.c | 8 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5972f74354ca..fdab5b821aa8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -391,8 +391,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 static u64 bio_end_offset(struct bio *bio)
 {
 	struct bio_vec *last = &bio->bi_io_vec[bio->bi_vcnt - 1];
+	struct bio_vec bv;

-	return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+	bvec_get_last_page(last, &bv);
+
+	return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
 }

 static noinline int add_ra_bio_pages(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5b453cada1ea..7cc6c8a52e49 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2741,11 +2741,15 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 {
 	blk_status_t ret = 0;
 	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
-	struct page *page = bvec->bv_page;
 	struct extent_io_tree *tree = bio->bi_private;
+	struct bio_vec bv;
+	struct page *page;
 	u64 start;

-	start = page_offset(page) + bvec->bv_offset;
+	bvec_get_last_page(bvec, &bv);
+	page = bv.bv_page;
+
+	start = page_offset(page) + bv.bv_offset;

 	bio->bi_private = NULL;
 	bio_get(bio);
--
2.9.4
[PATCH v2 48/51] fs/btrfs: convert to bio_for_each_segment_all_sp()
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei
---
 fs/btrfs/compression.c |  3 ++-
 fs/btrfs/disk-io.c     |  3 ++-
 fs/btrfs/extent_io.c   | 12 ++++++++----
 fs/btrfs/inode.c       |  6 ++++--
 fs/btrfs/raid56.c      |  6 ++++--
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index fdab5b821aa8..9d1693ecf468 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -147,12 +147,13 @@ static void end_compressed_bio_read(struct bio *bio)
 	} else {
 		int i;
 		struct bio_vec *bvec;
+		struct bvec_iter_all bia;
 
 		/*
 		 * we have verified the checksum already, set page
 		 * checked so the end_io handlers know about it
 		 */
-		bio_for_each_segment_all(bvec, cb->orig_bio, i)
+		bio_for_each_segment_all_sp(bvec, cb->orig_bio, i, bia)
 			SetPageChecked(bvec->bv_page);
 
 		bio_endio(cb->orig_bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f4f54d13db6d..e7efbaa3566c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -963,8 +963,9 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 	struct bio_vec *bvec;
 	struct btrfs_root *root;
 	int i, ret = 0;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_sp(bvec, bio, i, bia) {
 		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
 		ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
 		if (ret)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7cc6c8a52e49..8e51452894ba 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2359,8 +2359,9 @@ static unsigned int get_bio_pages(struct bio *bio)
 {
 	unsigned i;
 	struct bio_vec *bv;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bv, bio, i)
+	bio_for_each_segment_all_sp(bv, bio, i, bia)
 		;
 
 	return i;
@@ -2468,8 +2469,9 @@ static void end_bio_extent_writepage(struct bio *bio)
 	u64 start;
 	u64 end;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_sp(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2538,8 +2540,9 @@ static void end_bio_extent_readpage(struct bio *bio)
 	int mirror;
 	int ret;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_sp(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -3695,8 +3698,9 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	struct bio_vec *bvec;
 	struct extent_buffer *eb;
 	int i, done;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_sp(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 
 		eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7e725d84917b..61cc6d899ae5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8051,6 +8051,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
 	struct bio_vec *bvec;
 	struct extent_io_tree *io_tree, *failure_tree;
 	int i;
+	struct bvec_iter_all bia;
 
 	if (bio->bi_status)
 		goto end;
@@ -8067,7 +8068,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
 	ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(inode));
 
 	done->uptodate = 1;
-	bio_for_each_segment_all(bvec, bio, i)
+	bio_for_each_segment_all_sp(bvec, bio, i, bia)
 		clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
 				 io_tree, done->start, bvec->bv_page,
 				 btrfs_ino(BTRFS_I(inode)), 0);
@@ -8146,6 +8147,7 @@ static void btrfs_retry_endio(struct bio *bio)
 	int uptodate;
 	int ret;
 	int i;
+	struct bvec_iter_all bia;
 
 	if (bio->bi_status)
 		goto end;
@@ -8164,7 +8166,7 @@ static void btrfs_retry_endio(struct bio *bio)
 	io_tree = &BTRFS_I(inode)->io_tree;
 	failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_sp(bvec, bio, i, bia) {
 		ret = __readpage_endio_check(inode, io_bio, i,
Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?
On 2017-06-24 10:34, Marc MERLIN wrote:
> On Fri, Jun 23, 2017 at 09:17:50AM -0700, Marc MERLIN wrote:
>> Thanks for looking at this. I have applied your patch and I'm still
>> re-running check in lowmem. It takes about 24H so I'll post the full
>> results when it's done.
>
> Ok, here is the output of the check with btrfs-progs freshly synced from
> git, including Lu's just added patch.
> Obviously while I'm happy to give further debug info on why my filesystem
> is in that state and while check --repair sees nothing to repair,
> suggestions on how to clean those warnings up, unless they are not going
> to affect filesystem operation, would be greatly appreciated :)
>
> Thanks,
> Marc

Thanks for the updated information. I'm sorry that the false alert made
you nervous.

> ERROR: root 3862 EXTENT_DATA[18170706 4096] interrupt
> ERROR: root 3862 EXTENT_DATA[18170706 16384] interrupt
> ERROR: root 3862 EXTENT_DATA[18170706 20480] interrupt
> ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt
> ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt
> ERROR: errors found in fs roots

However, this looks like a different problem. Could you dump this file
tree with the following command?

# btrfs-debug-tree -t 3862 | grep -C 10 18170706

-- 
Thanks,
Lu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/4] btrfs-progs: test: Add test image for lowmem mode referencer count mismatch false alert
Add an image which can reproduce the extent item referencer count mismatch
false alert for lowmem mode.

Reported-by: Marc MERLIN
Signed-off-by: Lu Fengqi
---
 .../ref_count_mismatch_false_alert.img | Bin 0 -> 4096 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img

diff --git a/tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img b/tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img
new file mode 100644
index ..85110a813b5d00cb35d23babc70d57510cae19b0
GIT binary patch
literal 4096
[base85 payload garbled and truncated in the archive]
[PATCH v3 3/4] btrfs-progs: test: Add test image for lowmem mode file extent interrupt
Add an image in which an inlined extent coexists with a regular extent.

Reported-by: Marc MERLIN
Signed-off-by: Lu Fengqi
---
 .../020-extent-ref-cases/inline_regular_coexist.img | Bin 0 -> 4096 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/fsck-tests/020-extent-ref-cases/inline_regular_coexist.img

diff --git a/tests/fsck-tests/020-extent-ref-cases/inline_regular_coexist.img b/tests/fsck-tests/020-extent-ref-cases/inline_regular_coexist.img
new file mode 100644
index ..cf15cc14539f8759d18457d66b1f604244375b73
GIT binary patch
literal 4096
[base85 payload garbled and truncated in the archive]
[PATCH v3 1/4] btrfs-progs: lowmem check: Fix false alert about file extent interrupt
As Qu mentioned in this thread
(https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
can cause a regular extent to co-exist with an inlined extent. This
coexistence is confusing, but since it is currently permitted, fix
btrfsck so that it no longer prints a bunch of error logs that would
make users panic.

When checking file extents, record the extent_end of each regular extent
to check whether there is a gap between regular extents. Normally there is
only one inlined extent, so its extent_end is useless. However, since a
regular extent can co-exist with an inlined extent, the extent_end of the
inlined extent also needs to be recorded.

Reported-by: Marc MERLIN
Signed-off-by: Lu Fengqi
---
Changelog:
v2: Just fix reported-by
v3: Output verbose information when file extent interrupt

 cmds-check.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index c052f66e..70d2b7f2 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4782,6 +4782,7 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
 				extent_num_bytes, item_inline_len);
 			err |= FILE_EXTENT_ERROR;
 		}
+		*end += extent_num_bytes;
 		*size += extent_num_bytes;
 		return err;
 	}
@@ -4847,8 +4848,8 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
 			root->objectid, fkey->objectid, fkey->offset);
 	} else if (!no_holes && *end != fkey->offset) {
 		err |= FILE_EXTENT_ERROR;
-		error("root %llu EXTENT_DATA[%llu %llu] interrupt",
-		      root->objectid, fkey->objectid, fkey->offset);
+		error("root %llu EXTENT_DATA[%llu %llu] interrupt, should start at %llu",
+		      root->objectid, fkey->objectid, fkey->offset, *end);
 	}
 
 	*end += extent_num_bytes;
-- 
2.13.1
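The gap check that this patch adjusts can be illustrated with a small
standalone sketch. This is not the cmds-check.c code: struct demo_extent,
demo_count_interrupts() and the track_inline_end switch are hypothetical
names made up for illustration, and the sketch assumes the inlined extent
covers exactly the range in front of the first regular extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a file extent item. */
struct demo_extent {
	uint64_t offset;    /* file offset where the extent starts */
	uint64_t num_bytes; /* bytes of file data the extent covers */
	bool is_inline;     /* true for an inlined extent */
};

/*
 * Count regular extents that do not start where the previous extent ended.
 * With track_inline_end == false an inlined extent does not advance `end`,
 * which models the behaviour before the patch.
 */
size_t demo_count_interrupts(const struct demo_extent *ext, size_t n,
			     bool track_inline_end)
{
	uint64_t end = 0;
	size_t interrupts = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ext[i].is_inline) {
			if (track_inline_end)
				end += ext[i].num_bytes;
			continue;
		}
		if (end != ext[i].offset)
			interrupts++;
		end += ext[i].num_bytes;
	}
	return interrupts;
}
```

For an inlined extent of 4096 bytes followed by regular extents at offsets
4096 and 16384, the untracked variant flags every regular extent, because
`end` stays 4096 bytes behind; with tracking enabled no interrupt is counted.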
[PATCH v3 2/4] btrfs-progs: lowmem check: Fix false alert about referencer count mismatch
The normal back reference counting doesn't care about extents referred to
by the extent data in a shared leaf. The check_extent_data_backref function
needs to skip any leaf whose owner does not match root_id.

Reported-by: Marc MERLIN
Signed-off-by: Lu Fengqi
---
 cmds-check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 70d2b7f2..f42968cd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10692,7 +10692,8 @@ static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
 		leaf = path.nodes[0];
 		slot = path.slots[0];
 
-		if (slot >= btrfs_header_nritems(leaf))
+		if (slot >= btrfs_header_nritems(leaf) ||
+		    btrfs_header_owner(leaf) != root_id)
 			goto next;
 		btrfs_item_key_to_cpu(leaf, &key, slot);
 		if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY)
-- 
2.13.1
[PATCH v2] btrfs-progs: mkfs: Replace number with a macro
For code maintainability and scalability, replace the hard-coded number
with a macro for the size of the blocks member in btrfs_mkfs_config.

Signed-off-by: Gu Jinxiang
---
Changes since v1:
  v1 missed one place where the number is used; convert it as well.

 mkfs/common.c | 4 ++--
 mkfs/common.h | 5 ++++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index e4785c5..0d79650 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	uuid_generate(chunk_tree_uuid);
 
 	cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
-	for (i = 1; i < 7; i++) {
+	for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {
 		cfg->blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
 			cfg->nodesize * i;
 	}
@@ -210,7 +210,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 			cfg->nodesize - sizeof(struct btrfs_header));
 	nritems = 0;
 	itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize);
-	for (i = 1; i < 7; i++) {
+	for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {
 		item_size = sizeof(struct btrfs_extent_item);
 		if (!skinny_metadata)
 			item_size += sizeof(struct btrfs_tree_block_info);
diff --git a/mkfs/common.h b/mkfs/common.h
index 666a75b..e23e79b 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -28,6 +28,9 @@
 #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
 #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
 
+/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
+#define BTRFS_MKFS_ROOTS_NR 6
+
 struct btrfs_mkfs_config {
 	/* Label of the new filesystem */
 	const char *label;
@@ -43,7 +46,7 @@ struct btrfs_mkfs_config {
 	/* Output fields, set during creation */
 
 	/* Logical addresses of superblock [0] and other tree roots */
-	u64 blocks[8];
+	u64 blocks[BTRFS_MKFS_ROOTS_NR + 1];
 	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
 	char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
 
-- 
2.9.4
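As a rough illustration of why one macro should drive both the array size
and every loop bound, here is a standalone sketch. The DEMO_* names and
struct demo_mkfs_config are hypothetical stand-ins, not the btrfs-progs
definitions, and the 64 KiB superblock offset is only an assumed demo value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mkfs constants. */
#define DEMO_ROOTS_NR          6
#define DEMO_SUPER_INFO_OFFSET (64 * 1024)

struct demo_mkfs_config {
	uint32_t nodesize;
	/* slot 0 is the superblock, slots 1..DEMO_ROOTS_NR are tree roots */
	uint64_t blocks[DEMO_ROOTS_NR + 1];
};

/* Fill in block addresses; the loop bound and array size share one macro. */
void demo_layout_blocks(struct demo_mkfs_config *cfg)
{
	assert(cfg != NULL);
	cfg->blocks[0] = DEMO_SUPER_INFO_OFFSET;
	for (size_t i = 1; i <= DEMO_ROOTS_NR; i++)
		cfg->blocks[i] = DEMO_SUPER_INFO_OFFSET + 1024 * 1024 +
				 (uint64_t)cfg->nodesize * i;
}
```

If a seventh root were added later, only DEMO_ROOTS_NR would change; a
leftover literal such as `i < 7` in a second loop would silently skip the
new slot, which is the kind of miss the v1 review caught.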
Re: [PATCH] btrfs-progs: mkfs: Replace number with a macro
On 2017/06/26 17:23, Gu Jinxiang wrote:
> For code maintainability and scalability, replace the hard-coded number
> with a macro for the size of the blocks member in btrfs_mkfs_config.
>
> Signed-off-by: Gu Jinxiang
> ---
>  mkfs/common.c | 2 +-
>  mkfs/common.h | 5 ++++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mkfs/common.c b/mkfs/common.c
> index e4785c5..420671b 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>  	uuid_generate(chunk_tree_uuid);
>
>  	cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
> -	for (i = 1; i < 7; i++) {
> +	for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {

If you change 7 to BTRFS_MKFS_ROOTS_NR, you also need to change the
following code.

  213         for (i = 1; i < 7; i++) {

Thanks,
Tsutomu

>  		cfg->blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
>  			cfg->nodesize * i;
>  	}
> diff --git a/mkfs/common.h b/mkfs/common.h
> index 666a75b..e23e79b 100644
> --- a/mkfs/common.h
> +++ b/mkfs/common.h
> @@ -28,6 +28,9 @@
>  #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
>  #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
>
> +/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
> +#define BTRFS_MKFS_ROOTS_NR 6
> +
>  struct btrfs_mkfs_config {
>  	/* Label of the new filesystem */
>  	const char *label;
> @@ -43,7 +46,7 @@ struct btrfs_mkfs_config {
>  	/* Output fields, set during creation */
>
>  	/* Logical addresses of superblock [0] and other tree roots */
> -	u64 blocks[8];
> +	u64 blocks[BTRFS_MKFS_ROOTS_NR + 1];
>  	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
>  	char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
>
[PATCH] btrfs-progs: mkfs: Replace number with a macro
For code maintainability and scalability, replace the hard-coded number
with a macro for the size of the blocks member in btrfs_mkfs_config.

Signed-off-by: Gu Jinxiang
---
 mkfs/common.c | 2 +-
 mkfs/common.h | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index e4785c5..420671b 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	uuid_generate(chunk_tree_uuid);
 
 	cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
-	for (i = 1; i < 7; i++) {
+	for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {
 		cfg->blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
 			cfg->nodesize * i;
 	}
diff --git a/mkfs/common.h b/mkfs/common.h
index 666a75b..e23e79b 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -28,6 +28,9 @@
 #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
 #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
 
+/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
+#define BTRFS_MKFS_ROOTS_NR 6
+
 struct btrfs_mkfs_config {
 	/* Label of the new filesystem */
 	const char *label;
@@ -43,7 +46,7 @@ struct btrfs_mkfs_config {
 	/* Output fields, set during creation */
 
 	/* Logical addresses of superblock [0] and other tree roots */
-	u64 blocks[8];
+	u64 blocks[BTRFS_MKFS_ROOTS_NR + 1];
 	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
 	char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
 
-- 
2.9.4
Re: [PATCH v7 05/22] jbd2: don't clear and reset errors after waiting on writeback
On Fri, Jun 16, 2017 at 03:34:10PM -0400, Jeff Layton wrote:
> Resetting this flag is almost certainly racy, and will be problematic
> with some coming changes.
>
> Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
> Have jbd2 call it instead of filemap_fdatawait and don't attempt to
> re-set the error flag if it fails.
>
> Signed-off-by: Jeff Layton
> ---
>  fs/jbd2/commit.c   | 15 +++
>  include/linux/fs.h |  2 +-
>  mm/filemap.c       | 16 ++--
>  3 files changed, 18 insertions(+), 15 deletions(-)
>

I'm not too experienced with jbd2 internals, but this patch is clear enough:

Reviewed-by: Carlos Maiolino

-- 
Carlos
Re: [PATCH v7 04/22] buffer: set errors in mapping at the time that the error occurs
On Fri, Jun 16, 2017 at 03:34:09PM -0400, Jeff Layton wrote:
> I noticed on xfs that I could still sometimes get back an error on fsync
> on a fd that was opened after the error condition had been cleared.
>
> The problem is that the buffer code sets the write_io_error flag and
> then later checks that flag to set the error in the mapping. That flag
> persists for quite a while however. If the file is later opened with
> O_TRUNC, the buffers will then be invalidated and the mapping's error
> set such that a subsequent fsync will return error. I think this is
> incorrect, as there was no writeback between the open and fsync.
>
> Add a new mark_buffer_write_io_error operation that sets the flag and
> the error in the mapping at the same time. Replace all calls to
> set_buffer_write_io_error with mark_buffer_write_io_error, and remove
> the places that check this flag in order to set the error in the
> mapping.
>
> This sets the error in the mapping earlier, at the time that it's first
> detected.
>
> Signed-off-by: Jeff Layton
> Reviewed-by: Jan Kara
> ---
>  fs/buffer.c                 | 20 +++++++++++++-------
>  fs/gfs2/lops.c              |  2 +-
>  include/linux/buffer_head.h |  1 +
>  3 files changed, 15 insertions(+), 8 deletions(-)
>

Reviewed-by: Carlos Maiolino

> diff --git a/fs/buffer.c b/fs/buffer.c
> index 7b4f4bfde91e..4d5d03b42e11 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -178,7 +178,7 @@ void end_buffer_write_sync(struct buffer_head *bh, int uptodate)
>  		set_buffer_uptodate(bh);
>  	} else {
>  		buffer_io_error(bh, ", lost sync page write");
> -		set_buffer_write_io_error(bh);
> +		mark_buffer_write_io_error(bh);
>  		clear_buffer_uptodate(bh);
>  	}
>  	unlock_buffer(bh);
> @@ -352,8 +352,7 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
>  		set_buffer_uptodate(bh);
>  	} else {
>  		buffer_io_error(bh, ", lost async page write");
> -		mapping_set_error(page->mapping, -EIO);
> -		set_buffer_write_io_error(bh);
> +		mark_buffer_write_io_error(bh);
>  		clear_buffer_uptodate(bh);
>  		SetPageError(page);
>  	}
> @@ -481,8 +480,6 @@ static void __remove_assoc_queue(struct buffer_head *bh)
>  {
>  	list_del_init(&bh->b_assoc_buffers);
>  	WARN_ON(!bh->b_assoc_map);
> -	if (buffer_write_io_error(bh))
> -		mapping_set_error(bh->b_assoc_map, -EIO);
>  	bh->b_assoc_map = NULL;
>  }
>
> @@ -1181,6 +1178,17 @@ void mark_buffer_dirty(struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(mark_buffer_dirty);
>
> +void mark_buffer_write_io_error(struct buffer_head *bh)
> +{
> +	set_buffer_write_io_error(bh);
> +	/* FIXME: do we need to set this in both places? */
> +	if (bh->b_page && bh->b_page->mapping)
> +		mapping_set_error(bh->b_page->mapping, -EIO);
> +	if (bh->b_assoc_map)
> +		mapping_set_error(bh->b_assoc_map, -EIO);
> +}
> +EXPORT_SYMBOL(mark_buffer_write_io_error);
> +
>  /*
>   * Decrement a buffer_head's reference count. If all buffers against a page
>   * have zero reference count, are clean and unlocked, and if the page is clean
> @@ -3266,8 +3274,6 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
>
>  	bh = head;
>  	do {
> -		if (buffer_write_io_error(bh) && page->mapping)
> -			mapping_set_error(page->mapping, -EIO);
>  		if (buffer_busy(bh))
>  			goto failed;
>  		bh = bh->b_this_page;
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 885d36e7a29f..1a9c2c08c1a1 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -182,7 +182,7 @@ static void gfs2_end_log_write_bh(struct gfs2_sbd *sdp, struct bio_vec *bvec,
>  	bh = bh->b_this_page;
>  	do {
>  		if (error)
> -			set_buffer_write_io_error(bh);
> +			mark_buffer_write_io_error(bh);
>  		unlock_buffer(bh);
>  		next = bh->b_this_page;
>  		size -= bh->b_size;
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index bd029e52ef5e..e0abeba3ced7 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -149,6 +149,7 @@ void buffer_check_dirty_writeback(struct page *page,
>   */
>
>  void mark_buffer_dirty(struct buffer_head *bh);
> +void mark_buffer_write_io_error(struct buffer_head *bh);
>  void init_buffer(struct buffer_head *, bh_end_io_t *, void *);
>  void touch_buffer(struct buffer_head *bh);
>  void set_bh_page(struct buffer_head *bh,
> --
> 2.13.0
>

-- 
Carlos
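The behaviour described in the quoted commit message can be modelled in
userspace with a small sketch. Everything here is a hypothetical stand-in
(struct demo_mapping, struct demo_buffer and the demo_* functions are not
kernel APIs); it only assumes that a mapping holds one pending error that
an fsync-style caller reports once and clears.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space: one pending error slot. */
struct demo_mapping {
	int err;
};

/* Hypothetical stand-in for a buffer_head. */
struct demo_buffer {
	bool write_io_error;               /* sticky per-buffer flag */
	struct demo_mapping *page_mapping; /* models bh->b_page->mapping */
	struct demo_mapping *assoc_map;    /* models bh->b_assoc_map */
};

/* Set the buffer flag and record -EIO in the mappings in one step. */
void demo_mark_buffer_write_io_error(struct demo_buffer *bh)
{
	assert(bh != NULL);
	bh->write_io_error = true;
	if (bh->page_mapping)
		bh->page_mapping->err = -EIO;
	if (bh->assoc_map)
		bh->assoc_map->err = -EIO;
}

/* Models an fsync-style caller: report the pending error once, then clear it. */
int demo_report_and_clear(struct demo_mapping *m)
{
	int err = m->err;

	m->err = 0;
	return err;
}

/*
 * Models buffer invalidation after the patch: the sticky flag is not
 * consulted here, so it cannot re-raise an error that was already reported.
 */
void demo_drop_buffer(struct demo_buffer *bh)
{
	bh->page_mapping = NULL;
	bh->assoc_map = NULL;
}
```

In this model a failed write is reported by the first demo_report_and_clear()
call; dropping the buffer afterwards leaves the mapping clean even though the
buffer's flag is still set, which matches the intent of recording the error
when it is first detected.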
Re: [PATCH v7 01/22] fs: remove call_fsync helper function
On Fri, Jun 16, 2017 at 03:34:06PM -0400, Jeff Layton wrote:
> Requested-by: Christoph Hellwig
> Signed-off-by: Jeff Layton
> ---
>  fs/sync.c          | 2 +-
>  include/linux/fs.h | 6 ------
>  ipc/shm.c          | 2 +-
>  3 files changed, 2 insertions(+), 8 deletions(-)
>
> 2.13.0

If it's worth having one more reviewer, you can add:

Reviewed-by: Carlos Maiolino

-- 
Carlos