date:20170626

Re: [PATCH v2] btrfs-progs: lowmem check: Fix false alert about file extent interrupt

2017-06-26 Thread Lu Fengqi

On Mon, Jun 26, 2017 at 04:55:04PM +0200, David Sterba wrote:
>On Thu, Jun 22, 2017 at 04:12:56PM +0800, Lu Fengqi wrote:
>> As Qu mentioned in this thread
>> (https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
>> can cause a regular extent to co-exist with an inlined extent. This
>> coexistence is confusing, but since it is currently permitted, fix
>> btrfsck so that it does not emit a bunch of error logs that would make
>> users panic.
>> 
>> When checking file extents, record the extent_end of each regular extent
>> to check whether there is a gap between regular extents. Normally there
>> is only one inlined extent, so its extent_end is useless. However, if a
>> regular extent can co-exist with an inlined extent, the extent_end of the
>> inlined extent also needs to be recorded.
>> 
>> Reported-by: Marc MERLIN 
>> Signed-off-by: Lu Fengqi 
>
>Applied, thanks.
>
>Do you have a test for that?

Yes, I already posted this testcase
(https://www.spinics.net/lists/linux-btrfs/msg66802.html) yesterday. In
addition, this patch has an updated version
(https://www.spinics.net/lists/linux-btrfs/msg66803.html) which makes
lowmem mode output more detailed information when a file extent is
interrupted. Since v2 of the patch has already been applied, I will send
a separate patch for this modification alone.
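
For readers following along, the gap check described in the quoted commit message can be sketched roughly as below. This is only an illustration and not the btrfs-progs code: the struct, its fields, and the function name are hypothetical, and the inline extent length is assumed to be already rounded to the range it covers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one file extent item. */
struct file_extent_sketch {
	uint64_t offset;   /* file offset where the extent starts */
	uint64_t len;      /* bytes of the file covered by this extent */
	bool is_inline;    /* inline extents are tracked like regular ones */
};

/*
 * Return true if the extents (sorted by offset) cover the file without
 * a gap. The end of every extent is recorded, inline ones included, so
 * a regular extent following an inline extent is not reported as an
 * interruption.
 */
static bool extents_contiguous(const struct file_extent_sketch *ext, size_t n)
{
	uint64_t expected_start = 0;

	for (size_t i = 0; i < n; i++) {
		if (ext[i].offset != expected_start)
			return false;
		/* Record extent_end for inline and regular extents alike. */
		expected_start = ext[i].offset + ext[i].len;
	}
	return true;
}
```

With an inline extent covering [0, 4096) followed by a regular extent starting at 4096, the check passes; if the inline extent's end were not recorded, the same layout would look like a gap.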

-- 
Thanks,
Lu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] lib/zstd: use div_u64() to let it build on 32-bit

2017-06-26 Thread Nick Terrell

Adam, I’ve applied the same patch in my tree. I’ll send out the update [1]
once it's reviewed, since I also reduced the stack usage of functions
using over 1 KB of stack space.

You’re right that div_u64() will work, since the FSE functions are only
called on blocks of at most 128 KB at a time. Perhaps a u32 would be
clearer, but I would prefer to leave the signatures as is, to stay closer
to upstream. Upstream FSE should work with sizes larger than 4 GB, but
since it can't happen in zstd, it isn't a priority.

I have userland tests set up mocking the linux kernel headers, and tested
32-bit mode there, but neglected to test the kernel on a 32-bit VM, which
I’ve now corrected. Thanks for testing the patch on your ARM machine!

[1] https://github.com/facebook/zstd/pull/738/files
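
As an illustration of the point above, here is a userspace sketch of the 64-by-32-bit division that div_u64() performs. The names div_u64_sketch and fse_step_sketch are made up for this example; the real helper is div_u64() from <linux/math64.h>, and the assumption that "total" fits in 32 bits comes from the discussion in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's div_u64(): unsigned 64-bit
 * dividend, 32-bit divisor. In the kernel, a plain "/" on a 64-bit
 * value typically fails to link on 32-bit builds, which is why the
 * helper is used there.
 */
static inline uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Mirrors the shape of the "step" computation: (1 << 62) / total. */
static uint64_t fse_step_sketch(size_t total)
{
	/* Assumed invariant from the thread: total fits in 32 bits. */
	return div_u64_sketch((uint64_t)1 << 62, (uint32_t)total);
}
```

For a 128 KB block, fse_step_sketch(131072) is 2^62 / 2^17 = 2^45, the same value the original 64/64 division would produce.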

On 6/26/17, 9:18 PM, "Adam Borowski"  wrote:

David Sterba wrote:
> > Thus, you want do_div() instead of /; do check widths and signedness of
> > arguments.
>
> No do_div please, div_u64 or div64_u64.

Good to know, the interface of do_div() is indeed weird.

I guess Nick has found and fixed the offending divisions in his tree
already, but this patch I'm sending is what I'm testing.

One thing to note is that it divides u64 by size_t, so the actual operation
differs on 32 vs 64-bit.  Yet the code fails to handle compressing pieces
bigger than 4GB in other places -- so use of size_t is misleading.  Perhaps
u32 would better convey this limitation?

Anyway, that this code didn't even compile on 32-bit also means it hasn't
been tested.  I just happen to have such an ARM machine doing Debian archive
rebuilds; I've rewritten the chroots with compress=zstd; this should be a
nice non-artificial test.  The load consists of snapshot+dpkg+gcc/etc+
assorted testsuites, two sbuild instances.  Seems to work fine for a whole
hour (yay!) already, let's see if there'll be any explosions.

-- >8  >8  >8  >8  >8  >8  >8  >8  >8 --
Note that "total" is limited to 2³²-1 elsewhere despite being declared
as size_t, so it's ok to use 64/32 -- it's much faster on eg. x86-32
than 64/64.

Signed-off-by: Adam Borowski 
---
 lib/zstd/fse_compress.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/zstd/fse_compress.c b/lib/zstd/fse_compress.c
index e016bb177833..f59f9ebfe9c0 100644
--- a/lib/zstd/fse_compress.c
+++ b/lib/zstd/fse_compress.c
@@ -49,6 +49,7 @@
 #include "fse.h"
 #include 
 #include <linux/string.h> /* memcpy, memset */
+#include <linux/math64.h>

 /* **
 *  Error Management
@@ -575,7 +576,7 @@ static size_t FSE_normalizeM2(short *norm, U32 tableLog, const unsigned *count,
 	{
 		U64 const vStepLog = 62 - tableLog;
 		U64 const mid = (1ULL << (vStepLog - 1)) - 1;
-		U64 const rStep = ((((U64)1 << vStepLog) * ToDistribute) + mid) / total; /* scale on remaining */
+		U64 const rStep = div_u64((((U64)1 << vStepLog) * ToDistribute) + mid, total); /* scale on remaining */
 		U64 tmpTotal = mid;
 		for (s = 0; s <= maxSymbolValue; s++) {
 			if (norm[s] == NOT_YET_ASSIGNED) {
@@ -609,7 +610,7 @@ size_t FSE_normalizeCount(short *normalizedCounter, unsigned tableLog, const uns
 	{
 		U32 const rtbTable[] = {0, 473195, 504333, 520860, 550000, 700000, 750000, 830000};
 		U64 const scale = 62 - tableLog;
-		U64 const step = ((U64)1 << 62) / total; /* <== here, one division ! */
+		U64 const step = div_u64((U64)1 << 62, total); /* <== here, one division ! */
 		U64 const vStep = 1ULL << (scale - 20);
 		int stillToDistribute = 1 << tableLog;
 		unsigned s;
-- 
2.13.1

[PATCH] lib/zstd: use div_u64() to let it build on 32-bit

2017-06-26 Thread Adam Borowski

David Sterba wrote:
> > Thus, you want do_div() instead of /; do check widths and signedness of
> > arguments.
>
> No do_div please, div_u64 or div64_u64.

Good to know, the interface of do_div() is indeed weird.

I guess Nick has found and fixed the offending divisions in his tree
already, but this patch I'm sending is what I'm testing.

One thing to note is that it divides u64 by size_t, so the actual operation
differs on 32 vs 64-bit.  Yet the code fails to handle compressing pieces
bigger than 4GB in other places -- so use of size_t is misleading.  Perhaps
u32 would better convey this limitation?

Anyway, that this code didn't even compile on 32-bit also means it hasn't
been tested.  I just happen to have such an ARM machine doing Debian archive
rebuilds; I've rewritten the chroots with compress=zstd; this should be a
nice non-artificial test.  The load consists of snapshot+dpkg+gcc/etc+
assorted testsuites, two sbuild instances.  Seems to work fine for a whole
hour (yay!) already, let's see if there'll be any explosions.

-- >8  >8  >8  >8  >8  >8  >8  >8  >8 --
Note that "total" is limited to 2³²-1 elsewhere despite being declared
as size_t, so it's ok to use 64/32 -- it's much faster on eg. x86-32
than 64/64.

Signed-off-by: Adam Borowski 
---
 lib/zstd/fse_compress.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/zstd/fse_compress.c b/lib/zstd/fse_compress.c
index e016bb177833..f59f9ebfe9c0 100644
--- a/lib/zstd/fse_compress.c
+++ b/lib/zstd/fse_compress.c
@@ -49,6 +49,7 @@
 #include "fse.h"
 #include 
 #include <linux/string.h> /* memcpy, memset */
+#include <linux/math64.h>

 /* **
 *  Error Management
@@ -575,7 +576,7 @@ static size_t FSE_normalizeM2(short *norm, U32 tableLog, const unsigned *count,
 	{
 		U64 const vStepLog = 62 - tableLog;
 		U64 const mid = (1ULL << (vStepLog - 1)) - 1;
-		U64 const rStep = ((((U64)1 << vStepLog) * ToDistribute) + mid) / total; /* scale on remaining */
+		U64 const rStep = div_u64((((U64)1 << vStepLog) * ToDistribute) + mid, total); /* scale on remaining */
 		U64 tmpTotal = mid;
 		for (s = 0; s <= maxSymbolValue; s++) {
 			if (norm[s] == NOT_YET_ASSIGNED) {
@@ -609,7 +610,7 @@ size_t FSE_normalizeCount(short *normalizedCounter, unsigned tableLog, const uns
 	{
 		U32 const rtbTable[] = {0, 473195, 504333, 520860, 550000, 700000, 750000, 830000};
 		U64 const scale = 62 - tableLog;
-		U64 const step = ((U64)1 << 62) / total; /* <== here, one division ! */
+		U64 const step = div_u64((U64)1 << 62, total); /* <== here, one division ! */
 		U64 const vStep = 1ULL << (scale - 20);
 		int stillToDistribute = 1 << tableLog;
 		unsigned s;
-- 
2.13.1


Re: [PATCH v4] btrfs-progs: btrfs-convert: Add larger device support

2017-06-26 Thread Lakshmipathi.G

> > -   u32 free_inodes_count;
> > +   u64 first_data_block;
> > +   u64 block_count;
> > +   u64 inodes_count;
> > +   u64 free_inodes_count;
> 
> I've split this change from the patch as it does not logically belong to
> the same patch, although the change is simple.

Okay sure, thanks. 

Cheers.
Lakshmipathi.G

Re: [PATCH v3.1 0/7] Chunk level degradable check

2017-06-26 Thread Qu Wenruo




At 06/27/2017 09:59 AM, Anand Jain wrote:



On 06/27/2017 09:05 AM, Qu Wenruo wrote:



At 06/27/2017 02:59 AM, David Sterba wrote:

On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:

Btrfs currently uses num_tolerated_disk_barrier_failures to do global
check for tolerated missing device.

Although the one-size-fit-all solution is quite safe, it's too strict
if data and metadata has different duplication level.

For example, if one use Single data and RAID1 metadata for 2 disks, it
means any missing device will make the fs unable to be degraded
mounted.

But in fact, some times all single chunks may be in the existing
device and in that case, we should allow it to be rw degraded mounted.

Such case can be easily reproduced using the following script:
  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
  # wipefs -f /dev/sdc
  # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact it should allow degraded mount.

This patchset will introduce a new per-chunk degradable check for
btrfs, allow above case to succeed, and it's quite small anyway.

And enhance kernel error message for missing device, at least kernel
can know what's making mount failed, other than meaningless
"failed to read system chunk/chunk tree -5".


I'd like to get this merged to 4.14. The flush bio changes are now done,
so the base code should be stable. I've read the previous iterations of
this patchset, the comments and user feedback. The usecase coverage
seems to be good and what users expect.


Thank you for the kind reminder.



There are some bits in the implementation that I do not like, eg.
reintroducing memory allocation failure to the barrier check, but IIRC
no fundamental problems. Please refresh the patchset on top of current
code that's going to 4.13 (equivalent to the current for-next), I'll
review that and comment. One or more iterations might be needed, but
4.14 target is within reach.


I'll check the new flush infrastructure and figure out if we can avoid 
re-introducing such memory allocation failure with the new 
infrastructure.


  As this is going to address the raid1 availability issue, it's better to
  mark this for stable, IMO. But I wonder if there are any objections?


Not sure if stable maintainers (even normal subsystem maintainers) will 
like it, as it's quite a large modification, including dev flush 
infrastructure.


But since v4.14 will be an LTS kernel, we don't need to rush too much to 
push this feature to stable, as long as the feature is planned to reach 
v4.14.


Thanks,
Qu
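
A rough illustration of the per-chunk idea from the cover letter quoted above: instead of one global tolerance for the whole filesystem, each chunk is checked against the tolerance of its own profile. This is a sketch under stated assumptions, not the patchset's code; the struct, its fields, and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified chunk description. */
struct chunk_sketch {
	int missing_stripes;    /* stripes of this chunk on missing devices */
	int tolerated_missing;  /* assumed: 0 for single, 1 for RAID1 */
};

/*
 * The filesystem is considered degradable only if every chunk can
 * tolerate the devices that are missing for that particular chunk.
 */
static bool fs_degradable_sketch(const struct chunk_sketch *chunks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (chunks[i].missing_stripes > chunks[i].tolerated_missing)
			return false;
	}
	return true;
}
```

In the two-disk example from the cover letter, a RAID1 metadata chunk with one missing stripe and a single data chunk that lives entirely on the surviving device both pass, so a degraded rw mount would be allowed. A single data chunk on the missing device would still fail the check.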



Thanks, -Anand






Re: [PATCH] Btrfs-progs: fix infinite loop in find_free_extent

2017-06-26 Thread Qu Wenruo




At 06/27/2017 02:02 AM, Liu Bo wrote:

On Mon, Jun 26, 2017 at 04:09:53PM +0200, David Sterba wrote:

On Fri, Jun 23, 2017 at 10:28:31PM -0600, Liu Bo wrote:

From: Liu Bo 


Ah, my From was broken again.



%search_start is calculated in a wrong way, and if %ins is a cross-stripe
  one, it'll search the same block group forever.


That's a bit terse description, so please check if my understanding is right:
search_start advances by at least one stripe len, but the math would be wrong
as using bg_offset would not move us to the next stripe. bg_cache->key.objectid
is the full length so this will reach the next stripe and will not loop forever.


Yes, that's correct. The logic is: since the returned %ins crosses a stripe
boundary, the code calculates a BTRFS_STRIPE_LEN-aligned value as the new
%search_start and checks whether there is a free block matching it. The
current code uses the wrong offset; the offset should be the start position
of the block group.
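
The arithmetic discussed above can be sketched in userspace as follows. The function names and the STRIPE_LEN_SKETCH constant are stand-ins for this example (BTRFS_STRIPE_LEN is 64K); the two formulas mirror the removed and added lines of the patch.

```c
#include <stdint.h>

#define STRIPE_LEN_SKETCH (64 * 1024ULL) /* stand-in for BTRFS_STRIPE_LEN */

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Old formula: adds the block-group-relative offset a second time. */
static uint64_t next_search_start_old(uint64_t bg_start, uint64_t ins_start,
				      uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return round_up_u64(bg_offset + num_bytes, STRIPE_LEN_SKETCH) + bg_offset;
}

/* Fixed formula: adds the start of the block group instead. */
static uint64_t next_search_start_fixed(uint64_t bg_start, uint64_t ins_start,
					uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return round_up_u64(bg_offset + num_bytes, STRIPE_LEN_SKETCH) + bg_start;
}
```

Take a block group starting at 1GiB and a 12K extent at offset 60K inside it, which crosses the 64K stripe boundary. The old formula yields 192512, an address below the start of the block group, so the next search can land in the same block group again. The fixed formula yields 1GiB + 128K, the next stripe-aligned position inside the block group.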



Do you happen to have a test for that?


Unfortunately it's not a test with vanilla progs.

I found this when mkfs.btrfs with a 12K nodesize, but now kernel has a
power_of_2 limitation for nodesize and progs code is using a weird IS_ALIGNED()


Yes, btrfs_check_nodesize() is using (nodesize & (sectorsize - 1)) to 
check if it's aligned, but it's only correct if sectorsize is power of 2.


It should also be fixed for btrfs-progs.
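
To illustrate why the mask-based test is only valid for power-of-2 alignments, here is a small sketch comparing it with a modulo test. The function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask test: only meaningful when align is a power of 2. */
static bool aligned_mask(uint64_t x, uint64_t align)
{
	return (x & (align - 1)) == 0;
}

/* Modulo test: valid for any non-zero align. */
static bool aligned_mod(uint64_t x, uint64_t align)
{
	return x % align == 0;
}
```

With a 12K alignment, 24576 is an exact multiple, so aligned_mod(24576, 12288) is true, but 24576 & 12287 is 8192, so aligned_mask(24576, 12288) is false. For a power-of-2 alignment such as 4096 the two tests agree.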

Thanks,
Qu


which has the same effect with power_of_2(), mkfs.btrfs -n 12K is not allowed.
I changed IS_ALIGNED() to (blocksize % nodesize != 0) and got the above loop.




Signed-off-by: Liu Bo 
---
  extent-tree.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index b12ee29..5e09274 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2614,8 +2614,9 @@ check_failed:
goto no_bg_cache;
bg_offset = ins->objectid - bg_cache->key.objectid;
  
-			search_start = round_up(bg_offset + num_bytes,
-						BTRFS_STRIPE_LEN) + bg_offset;
+			search_start = round_up(
+				bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
+				bg_cache->key.object;


extent-tree.c: In function ‘find_free_extent’:
extent-tree.c:2617:18: error: ‘struct btrfs_key’ has no member named ‘object’; 
did you mean ‘objectid’?
  bg_cache->key.object;
   ^


Ouch, that's right, it's %objectid.

I'll send an updated one, thanks for the comments.

-liubo

Re: [PATCH v3.1 0/7] Chunk level degradable check

2017-06-26 Thread Anand Jain




On 06/27/2017 09:05 AM, Qu Wenruo wrote:



At 06/27/2017 02:59 AM, David Sterba wrote:

On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:

Btrfs currently uses num_tolerated_disk_barrier_failures to do global
check for tolerated missing device.

Although the one-size-fit-all solution is quite safe, it's too strict
if data and metadata has different duplication level.

For example, if one use Single data and RAID1 metadata for 2 disks, it
means any missing device will make the fs unable to be degraded
mounted.

But in fact, some times all single chunks may be in the existing
device and in that case, we should allow it to be rw degraded mounted.

Such case can be easily reproduced using the following script:
  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
  # wipefs -f /dev/sdc
  # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact it should allow degraded mount.

This patchset will introduce a new per-chunk degradable check for
btrfs, allow above case to succeed, and it's quite small anyway.

And enhance kernel error message for missing device, at least kernel
can know what's making mount failed, other than meaningless
"failed to read system chunk/chunk tree -5".


I'd like to get this merged to 4.14. The flush bio changes are now done,
so the base code should be stable. I've read the previous iterations of
this patchset, the comments and user feedback. The usecase coverage
seems to be good and what users expect.


Thank you for the kind reminder.



There are some bits in the implementation that I do not like, eg.
reintroducing memory allocation failure to the barrier check, but IIRC
no fundamental problems. Please refresh the patchset on top of current
code that's going to 4.13 (equivalent to the current for-next), I'll
review that and comment. One or more iterations might be needed, but
4.14 target is within reach.


I'll check the new flush infrastructure and figure out if we can avoid 
re-introducing such memory allocation failure with the new infrastructure.


 As this is going to address the raid1 availability issue, it's better to
 mark this for stable, IMO. But I wonder if there are any objections?

Thanks, -Anand

Re: [PATCH] Btrfs-progs: convert: do not clear header rev

2017-06-26 Thread Qu Wenruo




At 06/27/2017 07:55 AM, Liu Bo wrote:

So btrfs_set_header_flags() vs btrfs_set_header_flag, the difference is sort of
similar to "=" vs "|=", when creating and initialising a new extent buffer,
convert uses the former one which clears header_rev by accident.


Thanks for catching this one.

Reviewed-by: Qu Wenruo 

Thanks,
Qu


Signed-off-by: Liu Bo 
---
  convert/common.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert/common.c b/convert/common.c
index 40bf32c..f0dd2cf 100644
--- a/convert/common.c
+++ b/convert/common.c
@@ -167,7 +167,7 @@ static int setup_temp_extent_buffer(struct extent_buffer 
*buf,
btrfs_set_header_generation(buf, 1);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
-   btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
+   btrfs_set_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN);
write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
BTRFS_UUID_SIZE);
write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);





Re: [PATCH v3.1 0/7] Chunk level degradable check

2017-06-26 Thread Qu Wenruo




At 06/27/2017 02:59 AM, David Sterba wrote:

On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:

Btrfs currently uses num_tolerated_disk_barrier_failures to do global
check for tolerated missing device.

Although the one-size-fit-all solution is quite safe, it's too strict
if data and metadata has different duplication level.

For example, if one use Single data and RAID1 metadata for 2 disks, it
means any missing device will make the fs unable to be degraded
mounted.

But in fact, some times all single chunks may be in the existing
device and in that case, we should allow it to be rw degraded mounted.

Such case can be easily reproduced using the following script:
  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
  # wipefs -f /dev/sdc
  # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact it should allow degraded mount.

This patchset will introduce a new per-chunk degradable check for
btrfs, allow above case to succeed, and it's quite small anyway.

And enhance kernel error message for missing device, at least kernel
can know what's making mount failed, other than meaningless
"failed to read system chunk/chunk tree -5".


I'd like to get this merged to 4.14. The flush bio changes are now done,
so the base code should be stable. I've read the previous iterations of
this patchset, the comments and user feedback. The usecase coverage
seems to be good and what users expect.


Thank you for the kind reminder.



There are some bits in the implementation that I do not like, eg.
reintroducing memory allocation failure to the barrier check, but IIRC
no fundamental problems. Please refresh the patchset on top of current
code that's going to 4.13 (equivalent to the current for-next), I'll
review that and comment. One or more iterations might be needed, but
4.14 target is within reach.


I'll check the new flush infrastructure and figure out if we can avoid 
re-introducing such memory allocation failure with the new infrastructure.


Thanks,
Qu



[PATCH] Btrfs-progs: convert: do not clear header rev

2017-06-26 Thread Liu Bo

The difference between btrfs_set_header_flags() and btrfs_set_header_flag() is
similar to "=" vs "|=": the former overwrites the whole flags field, the latter
ORs in a single flag. When creating and initialising a new extent buffer,
convert uses the former, which clears header_rev by accident.

Signed-off-by: Liu Bo 
---
 convert/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert/common.c b/convert/common.c
index 40bf32c..f0dd2cf 100644
--- a/convert/common.c
+++ b/convert/common.c
@@ -167,7 +167,7 @@ static int setup_temp_extent_buffer(struct extent_buffer 
*buf,
btrfs_set_header_generation(buf, 1);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
-   btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
+   btrfs_set_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN);
write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
BTRFS_UUID_SIZE);
write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
-- 
2.5.0
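
A userspace sketch of the "=" vs "|=" distinction this patch relies on. The struct and helper names are simplified stand-ins rather than the btrfs-progs accessors, and the constants are assumed to mirror BTRFS_HEADER_FLAG_WRITTEN (bit 0), BTRFS_BACKREF_REV_SHIFT (56) and BTRFS_MIXED_BACKREF_REV (1).

```c
#include <stdint.h>

#define HEADER_FLAG_WRITTEN_SKETCH (1ULL << 0)
#define BACKREF_REV_SHIFT_SKETCH   56
#define MIXED_BACKREF_REV_SKETCH   1ULL

/* The backref revision is stored in the top bits of the flags field. */
struct header_sketch {
	uint64_t flags;
};

/* "=": overwrites every bit, including the backref revision. */
static void set_header_flags_sketch(struct header_sketch *h, uint64_t flags)
{
	h->flags = flags;
}

/* "|=": sets one flag and leaves the other bits alone. */
static void set_header_flag_sketch(struct header_sketch *h, uint64_t flag)
{
	h->flags |= flag;
}

/* Revision left in a header after WRITTEN is set with "=". */
static uint64_t rev_after_assign(void)
{
	struct header_sketch h = {
		MIXED_BACKREF_REV_SKETCH << BACKREF_REV_SHIFT_SKETCH
	};

	set_header_flags_sketch(&h, HEADER_FLAG_WRITTEN_SKETCH);
	return h.flags >> BACKREF_REV_SHIFT_SKETCH;
}

/* Revision left in a header after WRITTEN is set with "|=". */
static uint64_t rev_after_or(void)
{
	struct header_sketch h = {
		MIXED_BACKREF_REV_SKETCH << BACKREF_REV_SHIFT_SKETCH
	};

	set_header_flag_sketch(&h, HEADER_FLAG_WRITTEN_SKETCH);
	return h.flags >> BACKREF_REV_SHIFT_SKETCH;
}
```

In this sketch rev_after_assign() returns 0, because the revision is wiped, while rev_after_or() returns 1, which is the behaviour the patch restores.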


[RFC PATCH v3 0/2] Btrfs: add compression heuristic

2017-06-26 Thread Timofey Titovets

Today btrfs uses simple logic to decide whether to compress data:
the selected compression algorithm tries to compress the data, and
if this saves some space, the extent is stored compressed.

This is a reliable way to detect incompressible data, but it burns
CPU time on incompressible data and adds latency.

It also puts additional pressure on the memory subsystem, because for
every compressed write btrfs needs to allocate some buffer pages and
reuse a compression workspace.

This is quite efficient, but not free.

So, let's try to create a basic heuristic framework. The heuristic code
analyzes data on the fly before the compression code is called; it can
detect incompressible data and advise skipping it.

I left comments with a description in the code, but I will also try to
describe the logic briefly here.
The heuristic has several internal levels:
1. Get sample data. Analyzing the whole stream is CPU expensive,
   so take a big enough sample from the input data.
   Scaling:
   In data: 128K  64K   32K   4K
   Sample:  4096b 3072b 2048b 1024b

2. For performance reasons, and for reuse at the 7th level,
   copy the selected data to a sample buffer.
3. Count the occurrences of every byte value in the sample buffer.
4. Count how many distinct byte values were found.
   If there are not many, the data is easily compressible.
5. Compute the core set size, i.e. how many byte values
   make up 90% of the input stream.
   If the core set is small (1-50 different values),
   the data is easily compressible.
   If it is big (200-256), the data probably can't be compressed.
6. If the above methods fail to make a decision,
   compute the Shannon entropy.
   If the entropy is small, the data is easily compressible.
   If not, go to the 7th level.
7. Entropy can't detect repeated strings of bytes,
   so look at the data to detect repeated bytes:
   compute the difference between the frequency of bytes from
   the core set and the frequency of pairs of those bytes.
   If the sum of that differs from zero and the entropy is not
   big, give the compression code a try.
   If the entropy is high, 7.2/8 - 8/8 (> 90%), and we find a big enough
   difference between the frequency of pairs and of single characters,
   give the compression code a try.

   The 7th level is needed to reduce false negatives, where data
   can be compressed (like ~131072b -> ~87000b ~ 0.66), but not so easily.
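
Levels 3 to 5 above can be sketched in userspace roughly as follows. This is an illustration only, not the code from the patch: the function names are hypothetical and the 90% threshold is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NSYM_SKETCH 256

/* Sort counters in descending order. */
static int cmp_desc(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x < y) - (x > y);
}

/* Levels 3 and 4: count byte values, return how many distinct ones occur. */
static unsigned symbol_set_size(const uint8_t *sample, size_t len)
{
	uint32_t count[NSYM_SKETCH] = {0};
	unsigned n = 0;

	for (size_t i = 0; i < len; i++)
		count[sample[i]]++;
	for (int s = 0; s < NSYM_SKETCH; s++)
		if (count[s])
			n++;
	return n;
}

/* Level 5: how many of the most frequent byte values cover 90% of the sample. */
static unsigned coreset_size(const uint8_t *sample, size_t len)
{
	uint32_t count[NSYM_SKETCH] = {0};
	uint64_t threshold = (uint64_t)len * 90 / 100;
	uint64_t sum = 0;
	unsigned n = 0;

	for (size_t i = 0; i < len; i++)
		count[sample[i]]++;
	qsort(count, NSYM_SKETCH, sizeof(count[0]), cmp_desc);
	while (n < NSYM_SKETCH && count[n] && sum < threshold)
		sum += count[n++];
	return n;
}
```

For the 10-byte sample "aaaaaaaaab" the symbol set size is 2 and the core set size is 1, since the single value 'a' already covers 90% of the sample. Under the thresholds described above, such data would be classified as easily compressible.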

As far as I can see, this code rejects compression like:
- 131072b -> ~11b
If the compression ratio is better, it allows it.

Shannon entropy uses the log2(a/b) function.
I tried replacing that with int_log2(a)-int_log2(b), but the
integer implementation of log2 lacks accuracy (+-7-10%) in our case.
So I precalculated some input/output values (1/131072 - 1/1) and created
log2_lshift16().
I have already reduced that function from 1200 to 200 lines
to save memory (losing some accuracy), so with the precomputed function
I get +- 0.5-2% accuracy (compared to a normal "true" float log2 Shannon).
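
For illustration, the coarse int_log2(a)-int_log2(b) variant mentioned above could look like the sketch below. It is not the log2_lshift16() table from the patch; the function names are hypothetical, and the estimate is only exact when all counts and the sample length are powers of two, which is presumably the kind of inaccuracy described above.

```c
#include <stddef.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0. */
static unsigned ilog2_u32(uint32_t v)
{
	unsigned r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Coarse Shannon entropy estimate in bits per byte (0..8), using
 * -log2(count/len) ~= ilog2(len) - ilog2(count) with integers only.
 */
static unsigned entropy_bits_coarse(const uint8_t *sample, size_t len)
{
	uint32_t count[256] = {0};
	uint64_t sum = 0;

	if (len == 0)
		return 0;
	for (size_t i = 0; i < len; i++)
		count[sample[i]]++;

	unsigned log_len = ilog2_u32((uint32_t)len);

	for (int s = 0; s < 256; s++) {
		if (count[s])
			sum += (uint64_t)count[s] * (log_len - ilog2_u32(count[s]));
	}
	return (unsigned)(sum / len);
}
```

For "aaaaaaaa" the estimate is 0 bits per byte, and for "abcdabcd" it is 2 bits per byte, matching the true entropy because every count is a power of two. With other counts the floor in ilog2_u32() introduces error, which is the motivation for a precomputed table.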

Thanks.

Patches based on latest mainline: v4.12-rc7

P.S.
So far I have only run stability tests; everything works stably.
About performance:
A userspace implementation of this algorithm, which iterates over
data in 128KB blocks and mmap()s the file, shows ~4GiB/s over
in-memory (cached) data in one stream, on an i5-4200M with DDR3.

So I expect it not to hurt compression performance.

I've also pushed a copy of the patch set to:
https://github.com/Nefelim4ag/linux

log2_lshift() - tested by log2_generator
https://github.com/Nefelim4ag/Entropy_Calculation

P.S.S.
Sorry for my bad English and maybe for ugly code.
I did my best, thanks.


Changes since v1:
  - Fixes of checkpatch.pl warnings/errors
  - Use div64_u64() instead of "/"
  - Make log2_lshift16() more like binary tree as suggested by:
Adam Borowski 

Changes since v2:
  - Fix page read address overflow in heuristic.c
  - Make "bucket" dynamically allocated, for fix warnings about big stack.
  - Small cleanups

Timofey Titovets (2):
  Btrfs: add precomputed log2()
  Btrfs: add heuristic method for make decision compress or not compress

 fs/btrfs/Makefile|   2 +-
 fs/btrfs/heuristic.c | 275 ++
 fs/btrfs/heuristic.h |  13 +++
 fs/btrfs/inode.c |  37 ---
 fs/btrfs/log2_lshift16.c | 278 +++
 fs/btrfs/log2_lshift16.h |  11 ++
 6 files changed, 601 insertions(+), 15 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c
 create mode 100644 fs/btrfs/heuristic.h
 create mode 100644 fs/btrfs/log2_lshift16.c
 create mode 100644 fs/btrfs/log2_lshift16.h

--
2.13.1

[RFC PATCH v3 1/2] Btrfs: add precomputed log2()

2017-06-26 Thread Timofey Titovets

The heuristic code computes Shannon entropy in cases where
other methods can't make a clear decision.
That calculation needs floating point, but since floating point
can't be used here, let's just precalculate all our
input/output values.

Signed-off-by: Timofey Titovets 
---
 fs/btrfs/log2_lshift16.c | 278 +++
 fs/btrfs/log2_lshift16.h |  11 ++
 2 files changed, 289 insertions(+)
 create mode 100644 fs/btrfs/log2_lshift16.c
 create mode 100644 fs/btrfs/log2_lshift16.h

diff --git a/fs/btrfs/log2_lshift16.c b/fs/btrfs/log2_lshift16.c
new file mode 100644
index 000000000000..0d5d414b2adf
--- /dev/null
+++ b/fs/btrfs/log2_lshift16.c
@@ -0,0 +1,278 @@
+#include 
+#include "log2_lshift16.h"
+
+/*
+ * Precalculated log2 values
+ * Shifting used for avoiding floating point
+ * Fraction must be left shifted by 16
+ * Return of log are left shifted by 3
+ */
+int log2_lshift16(u64 lshift16)
+{
+   if (lshift16 < 558) {
+   if (lshift16 < 54) {
+   if (lshift16 < 13) {
+   if (lshift16 < 7) {
+   if (lshift16 < 1)
+   return -136;
+   if (lshift16 < 2)
+   return -123;
+   if (lshift16 < 3)
+   return -117;
+   if (lshift16 < 4)
+   return -113;
+   if (lshift16 < 5)
+   return -110;
+   if (lshift16 < 6)
+   return -108;
+   if (lshift16 < 7)
+   return -106;
+   } else {
+   if (lshift16 < 8)
+   return -104;
+   if (lshift16 < 9)
+   return -103;
+   if (lshift16 < 10)
+   return -102;
+   if (lshift16 < 11)
+   return -100;
+   if (lshift16 < 12)
+   return -99;
+   if (lshift16 < 13)
+   return -98;
+   }
+   } else {
+   if (lshift16 < 29) {
+   if (lshift16 < 15)
+   return -97;
+   if (lshift16 < 16)
+   return -96;
+   if (lshift16 < 17)
+   return -95;
+   if (lshift16 < 19)
+   return -94;
+   if (lshift16 < 21)
+   return -93;
+   if (lshift16 < 23)
+   return -92;
+   if (lshift16 < 25)
+   return -91;
+   if (lshift16 < 27)
+   return -90;
+   if (lshift16 < 29)
+   return -89;
+   } else {
+   if (lshift16 < 32)
+   return -88;
+   if (lshift16 < 35)
+   return -87;
+   if (lshift16 < 38)
+   return -86;
+   if (lshift16 < 41)
+   return -85;
+   if (lshift16 < 45)
+   return -84;
+   if (lshift16 < 49)
+   return -83;
+   if (lshift16 < 54)
+   return -82;
+   }
+   }
+   } else {
+   if (lshift16 < 181) {
+   if (lshift16 < 99) {
+

[RFC PATCH v3 2/2] Btrfs: add heuristic method for make decision compress or not compress

2017-06-26 Thread Timofey Titovets

Add a heuristic computation before compression, to avoid loading
the resource-heavy compression workspace if the data probably
can't be compressed.

Signed-off-by: Timofey Titovets 
---
 fs/btrfs/Makefile|   2 +-
 fs/btrfs/heuristic.c | 275 +++
 fs/btrfs/heuristic.h |  13 +++
 fs/btrfs/inode.c |  37 ---
 4 files changed, 312 insertions(+), 15 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c
 create mode 100644 fs/btrfs/heuristic.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17a80b0..8386095c9032 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o 
root-tree.o dir-item.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-  uuid-tree.o props.o hash.o free-space-tree.o
+  uuid-tree.o props.o hash.o free-space-tree.o heuristic.o 
log2_lshift16.o

 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
new file mode 100644
index 000000000000..cac6f0917b59
--- /dev/null
+++ b/fs/btrfs/heuristic.c
@@ -0,0 +1,275 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "heuristic.h"
+/* Precalculated log2 realization */
+#include "log2_lshift16.h"
+
+/* For shannon full integer entropy calculation */
+#define BUCKET_SIZE (1 << 8)
+
+struct _backet_item {
+   u8  padding;
+   u8  symbol;
+   u16 count;
+};
+
+
+/* For sorting */
+static int compare(const void *lhs, const void *rhs)
+{
+   struct _backet_item *l = (struct _backet_item *)(lhs);
+   struct _backet_item *r = (struct _backet_item *)(rhs);
+
+   return r->count - l->count;
+}
+
+/*
+ * For good compressible data
+ * symbol set size over sample
+ * will be small <= 64
+ */
+static u32 _symbset_calc(const struct _backet_item *bucket)
+{
+   u32 a = 0;
+   u32 symbset_size = 0;
+
+   for (; a < BUCKET_SIZE && symbset_size <= 64; a++) {
+   if (bucket[a].count)
+   symbset_size++;
+   }
+   return symbset_size;
+}
+
+
+/*
+ * Try calculate coreset size
+ * i.e. how many symbols use 90% of input data
+ * < 50 - good compressible data
+ * > 200 - bad compressible data
+ * For right & fast calculation bucket must be reverse sorted
+ */
+static u32 _coreset_calc(const struct _backet_item *bucket,
+   const u32 sum_threshold)
+{
+   u32 a = 0;
+   u32 coreset_sum = 0;
+
+   for (a = 0; a < 201 && bucket[a].count; a++) {
+   coreset_sum += bucket[a].count;
+   if (coreset_sum > sum_threshold)
+   break;
+   }
+   return a;
+}
+
+static u64 _entropy_perc(const struct _backet_item *bucket,
+   const u32 sample_size)
+{
+   u64 a, p;
+   u64 entropy_sum = 0;
+   u64 entropy_max = LOG2_RET_SHIFT*8;
+
+   for (a = 0; a < BUCKET_SIZE && bucket[a].count > 0; a++) {
+   p = bucket[a].count;
+   p = div64_u64(p*LOG2_ARG_SHIFT, sample_size);
+   entropy_sum += -p*log2_lshift16(p);
+   }
+
+   entropy_sum = div64_u64(entropy_sum, LOG2_ARG_SHIFT);
+   return div64_u64(entropy_sum*100, entropy_max);
+}
+
+/* Pair distance from random distribution */
+static u64 _random_pairs_distribution(const struct _backet_item *bucket,
+   const u32 coreset_size, const u8 *sample, u32 sample_size)
+{
+   u32 a, b;
+   u8 pair_a[2], pair_b[2];
+   u32 pairs_count;
+   u64 sum = 0;
+   u64 buf1, buf2;
+
+   for (a = 0; a < coreset_size-1; a++) {
+   pairs_count = 0;
+   pair_a[0] = bucket[a].symbol;
+   pair_a[1] = bucket[a+1].symbol;
+   pair_b[1] = bucket[a].symbol;
+   pair_b[0] = bucket[a+1].symbol;
+   for (b = 0; b < sample_size-1; b++) {
+   u16 *pair_c = (u16 *) &sample[b];
+
+   if (pair_c == (u16 *) pair_a)
+   pairs_count++;
+   else if (pair_c == (u16 *) pair_b)
+   pairs_count++;
+   }
+   buf1 = bucket[a].count*bucket[a+1].count;
+   buf1 = div64_u64(buf1*10, (sample_size*sample_size));
+   buf2 = pairs_count*2*10;
+   buf2 = div64_u64(pairs_count, sample_size);
+   sum += (buf1 - buf2)*(buf1 - buf2);
+   }
+
+   return div64_u64(sum, 2048);
+}
+
+/*
+ * Algorithm description
+ * 1. Get subset of data for fast computation
+ * 2. Scan bucket for symbol set
+ *- symbol set < 64 - data will be easy compressible, return
+ * 3. Try compute coreset size
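
For readers following the heuristic above, here is a minimal userspace sketch of the symbol-set and coreset steps. It is not the patch's code: the toy_* names, the 90% cutoff written as an integer comparison, and returning a count rather than an index are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_BUCKET_SIZE 256	/* one slot per possible byte value */

struct toy_bucket_item {
	uint8_t  symbol;
	uint32_t count;
};

/* Sort helper: highest count first. Compares instead of subtracting. */
static int toy_cmp_desc(const void *lhs, const void *rhs)
{
	const struct toy_bucket_item *l = lhs;
	const struct toy_bucket_item *r = rhs;

	return (r->count > l->count) - (r->count < l->count);
}

/* Count how often each byte value occurs in the sample. */
static void toy_fill_bucket(struct toy_bucket_item *bucket,
			    const uint8_t *sample, size_t len)
{
	size_t i;

	for (i = 0; i < TOY_BUCKET_SIZE; i++) {
		bucket[i].symbol = (uint8_t)i;
		bucket[i].count = 0;
	}
	for (i = 0; i < len; i++)
		bucket[sample[i]].count++;
}

/* Number of distinct byte values seen in the sample. */
static uint32_t toy_symbset_size(const struct toy_bucket_item *bucket)
{
	uint32_t i, n = 0;

	for (i = 0; i < TOY_BUCKET_SIZE; i++)
		if (bucket[i].count)
			n++;
	return n;
}

/*
 * Smallest number of most-frequent symbols that together cover at
 * least 90% of the sample. Sorts the bucket in place.
 */
static uint32_t toy_coreset_size(struct toy_bucket_item *bucket,
				 uint32_t sample_size)
{
	uint64_t covered = 0;
	uint32_t i;

	qsort(bucket, TOY_BUCKET_SIZE, sizeof(*bucket), toy_cmp_desc);
	for (i = 0; i < TOY_BUCKET_SIZE && bucket[i].count; i++) {
		covered += bucket[i].count;
		if (covered * 100 >= (uint64_t)sample_size * 90)
			return i + 1;
	}
	return i;
}
```

A sample of nine 'a' bytes and one 'b' gives a symbol set of 2 and a coreset of 1, which the comments in the patch would class as well compressible.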

Re: [PATCH v3.1 0/7] Chunk level degradable check

2017-06-26 Thread David Sterba

On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> check for tolerated missing device.
> 
> Although the one-size-fits-all solution is quite safe, it's too strict
> if data and metadata have different duplication levels.
> 
> For example, if one uses Single data and RAID1 metadata on 2 disks, any
> missing device makes the fs impossible to mount degraded.
> 
> But in fact, sometimes all single chunks are on the remaining
> device, and in that case we should allow a rw degraded mount.
> 
> Such case can be easily reproduced using the following script:
>  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>  # wipefs -f /dev/sdc
>  # mount /dev/sdb -o degraded,rw
> 
> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> data chunk is only in sdb, so in fact it should allow degraded mount.
> 
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.
> 
> It also enhances the kernel error message for a missing device, so that at
> least the kernel reports what made the mount fail, rather than the
> meaningless "failed to read system chunk/chunk tree -5".

I'd like to get this merged to 4.14. The flush bio changes are now done,
so the base code should be stable. I've read the previous iterations of
this patchset, the comments and user feedback. The usecase coverage
seems to be good and what users expect.

There are some bits in the implementation that I do not like, eg.
reintroducing memory allocation failure to the barrier check, but IIRC
no fundamental problems. Please refresh the patchset on top of current
code that's going to 4.13 (equivalent to the current for-next), I'll
review that and comment. One or more iterations might be needed, but
4.14 target is within reach.
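
To make the difference between the global check and the per-chunk check from the quoted cover letter concrete, here is a toy model in plain C. It is only a sketch: the toy_* types and functions are hypothetical and do not correspond to the patchset's actual code or to kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a chunk: its profile tolerance and stripes. */
struct toy_chunk {
	int max_tolerated_missing;	/* 0 for single, 1 for RAID1 */
	size_t num_stripes;
	const int *stripe_devs;		/* device index of each stripe */
};

/* A chunk is usable if it lost no more devices than its profile allows. */
static bool toy_chunk_ok(const struct toy_chunk *c, const bool *dev_missing)
{
	size_t i;
	int missing = 0;

	for (i = 0; i < c->num_stripes; i++)
		if (dev_missing[c->stripe_devs[i]])
			missing++;
	return missing <= c->max_tolerated_missing;
}

/* Per-chunk check: every chunk must be usable on its own terms. */
static bool toy_per_chunk_check(const struct toy_chunk *chunks, size_t n,
				const bool *dev_missing)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!toy_chunk_ok(&chunks[i], dev_missing))
			return false;
	return true;
}

/*
 * Global check: compare the total number of missing devices with the
 * lowest tolerance of any chunk, regardless of where the chunks live.
 */
static bool toy_global_check(const struct toy_chunk *chunks, size_t n,
			     const bool *dev_missing, size_t ndevs)
{
	size_t i;
	int missing = 0;
	int min_tolerated = chunks[0].max_tolerated_missing;

	for (i = 1; i < n; i++)
		if (chunks[i].max_tolerated_missing < min_tolerated)
			min_tolerated = chunks[i].max_tolerated_missing;
	for (i = 0; i < ndevs; i++)
		if (dev_missing[i])
			missing++;
	return missing <= min_tolerated;
}
```

With RAID1 metadata on devices 0 and 1, single data only on device 0, and device 1 missing, the global check refuses the mount while the per-chunk check allows it, which is the case the reproducer script exercises.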
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Patch v2] Btrfs-progs: fix infinite loop in find_free_extent

2017-06-26 Thread Liu Bo

If the found %ins crosses a stripe boundary, i.e. BTRFS_STRIPE_LEN, we
search again with a stripe-aligned %search_start.  The current code
calculates %search_start by adding the wrong offset.  To fix it, add the
start position of the block group instead; otherwise we end up looking at
the same block group forever.

Cc: David Sterba 
Signed-off-by: Liu Bo 
---
v2: - enhance commit log with more details.
    - fix typo on bg_cache->key.objectid.

 extent-tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 3e32e43..2c73d46 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2614,8 +2614,9 @@ check_failed:
goto no_bg_cache;
bg_offset = ins->objectid - bg_cache->key.objectid;
 
-   search_start = round_up(bg_offset + num_bytes,
-   BTRFS_STRIPE_LEN) + bg_offset;
+   search_start = round_up(
+   bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
+   bg_cache->key.objectid;
goto new_group;
}
 no_bg_cache:
-- 
2.5.0
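
The arithmetic behind the fix can be checked in isolation. The sketch below is a userspace model, not the progs code: the toy_* names are made up, and the stripe length is assumed to be 64 KiB.

```c
#include <stdint.h>

#define TOY_STRIPE_LEN (64 * 1024ULL)	/* assumed value of BTRFS_STRIPE_LEN */

static uint64_t toy_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Old formula: adds the offset inside the block group a second time. */
static uint64_t toy_search_start_old(uint64_t bg_start, uint64_t ins_start,
				     uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return toy_round_up(bg_offset + num_bytes, TOY_STRIPE_LEN) + bg_offset;
}

/* Fixed formula: stripe-aligned offset plus the block group start. */
static uint64_t toy_search_start_new(uint64_t bg_start, uint64_t ins_start,
				     uint64_t num_bytes)
{
	uint64_t bg_offset = ins_start - bg_start;

	return toy_round_up(bg_offset + num_bytes, TOY_STRIPE_LEN) + bg_start;
}
```

Take a block group starting at 1 GiB and a 12 KiB extent found at offset 60 KiB inside it. The old formula yields 192512, an address below the block group start, so the next search would land in the same place again. The fixed formula yields 1 GiB + 128 KiB, which is stripe-aligned and past the rejected extent.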


[PATCH] Btrfs: incremental send, fix invalid path for link commands

2017-06-26 Thread fdmanana

From: Filipe Manana 

In some scenarios an incremental send stream can contain link commands
with an invalid target path. Such scenarios happen after moving some
directory inode A, renaming a regular file inode B into the old name of
inode A and finally creating a new hard link for inode B at directory
inode A.

Consider the following example scenario where this issue happens.

Parent snapshot:

  .  (ino 256)
  |
  |--- dir1/ (ino 257)
  |  |--- dir2/  (ino 258)
  | |--- dir3/   (ino 259)
  |   |--- file1 (ino 261)
  |   |--- dir4/ (ino 262)
  |
  |--- dir5/ (ino 260)

Send snapshot:

  .  (ino 256)
  |
  |--- dir1/ (ino 257)
 |--- dir2/  (ino 258)
 |  |--- dir3/   (ino 259)
 ||--- dir4  (ino 261)
 |
 |--- dir6/  (ino 263)
|--- dir44/  (ino 262)
   |--- file11   (ino 261)
   |--- dir55/   (ino 260)

When attempting to apply the corresponding incremental send stream, a
link command contains an invalid target path which makes the receiver
fail. The following is the verbose output of the btrfs receive command:

  receiving snapshot mysnap2 uuid=90076fe6-5ba6-e64a-9321-9279670ed16b (...)
  utimes
  utimes dir1
  utimes dir1/dir2/dir3
  utimes
  rename dir1/dir2/dir3/dir4 -> o262-7-0
  link dir1/dir2/dir3/dir4 -> dir1/dir2/dir3/file1
  link dir1/dir2/dir3/dir4/file11 -> dir1/dir2/dir3/file1
  ERROR: link dir1/dir2/dir3/dir4/file11 -> dir1/dir2/dir3/file1 failed: Not a directory

The following steps happen during the computation of the incremental send
stream that lead to this issue:

1) When processing inode 261, we orphanize inode 262 due to a name/location
   collision with one of the new hard links for inode 261 (created in the
   second step below).

2) We create one of the 2 new hard links for inode 261, the one whose
   location is at "dir1/dir2/dir3/dir4".

3) We then attempt to create the other new hard link for inode 261, which
   has inode 262 as its parent directory. Because the path for this new
   hard link was computed before we started processing the new references
   (hard links), it reflects the old name/location of inode 262, that is,
   it does not account for the orphanization step that happened when
   we started processing the new references for inode 261, whence it is
   no longer valid, causing the receiver to fail.

So fix this issue by recomputing the full path of new references if we
ended up orphanizing other inodes which are directories.
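
The stale-path problem in step 3 can be reproduced with a toy model. This is only a sketch under simplifying assumptions: each inode has a single name and parent, the tree is shortened to dir3/dir4, and the toy_* helpers are hypothetical and unrelated to the functions in send.c.

```c
#include <stdio.h>
#include <string.h>

struct toy_inode {
	unsigned long ino;
	unsigned long parent;	/* 0: directly below the subvolume root */
	char name[32];
};

static struct toy_inode *toy_lookup(struct toy_inode *tbl, size_t n,
				    unsigned long ino)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].ino == ino)
			return &tbl[i];
	return NULL;
}

/* Build the path of @ino by walking its parents. Returns 0 or -1. */
static int toy_path(struct toy_inode *tbl, size_t n, unsigned long ino,
		    char *out, size_t outsz)
{
	struct toy_inode *in = toy_lookup(tbl, n, ino);
	size_t len;

	if (!in || !outsz)
		return -1;
	out[0] = '\0';
	if (in->parent && toy_path(tbl, n, in->parent, out, outsz) < 0)
		return -1;
	len = strlen(out);
	if (len + strlen(in->name) + 2 > outsz)
		return -1;
	if (len)
		out[len++] = '/';
	strcpy(out + len, in->name);
	return 0;
}

/* Path of a new link called @name inside directory @dir. */
static int toy_link_path(struct toy_inode *tbl, size_t n, unsigned long dir,
			 const char *name, char *out, size_t outsz)
{
	size_t len;

	if (toy_path(tbl, n, dir, out, outsz) < 0)
		return -1;
	len = strlen(out);
	if (len + strlen(name) + 2 > outsz)
		return -1;
	out[len++] = '/';
	strcpy(out + len, name);
	return 0;
}

/* Orphanize: move the inode to the root under a temporary name. */
static void toy_orphanize(struct toy_inode *in, const char *tmp_name)
{
	in->parent = 0;
	snprintf(in->name, sizeof(in->name), "%s", tmp_name);
}
```

Computing the link path for "file11" in inode 262 before the orphanization gives "dir3/dir4/file11"; computing it again afterwards gives "o262-7-0/file11". A sender that keeps using the first string emits a path through a name that no longer refers to a directory, which matches the "Not a directory" failure above.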

A test case for fstests follows soon.

Signed-off-by: Filipe Manana 
---

Applies on top of previous patches:

  Btrfs: send, fix invalid path after renaming and linking file
  Btrfs: incremental send, fix invalid path for unlink commands

 fs/btrfs/send.c | 81 -
 1 file changed, 51 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e937c10b8287..7eaccfb72b47 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1856,7 +1856,7 @@ static int is_first_ref(struct btrfs_root *root,
  */
 static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
  const char *name, int name_len,
- u64 *who_ino, u64 *who_gen)
+ u64 *who_ino, u64 *who_gen, u64 *who_mode)
 {
int ret = 0;
u64 gen;
@@ -1905,7 +1905,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
if (other_inode > sctx->send_progress ||
is_waiting_for_move(sctx, other_inode)) {
ret = get_inode_info(sctx->parent_root, other_inode, NULL,
-   who_gen, NULL, NULL, NULL, NULL);
+   who_gen, who_mode, NULL, NULL, NULL);
if (ret < 0)
goto out;
 
@@ -3683,6 +3683,36 @@ static int wait_for_parent_move(struct send_ctx *sctx,
return ret;
 }
 
+static int update_ref_path(struct send_ctx *sctx, struct recorded_ref *ref)
+{
+   int ret;
+   struct fs_path *new_path;
+
+   /*
+* Our reference's name member points to its full_path member string, so
+* we use here a new path.
+*/
+   new_path = fs_path_alloc();
+   if (!new_path)
+   return -ENOMEM;
+
+   ret =

[PATCH] btrfs: test incremental send after replacing directory with a file

2017-06-26 Thread fdmanana

From: Filipe Manana 

Test that an incremental send/receive operation works correctly after
moving some directory inode A, renaming a regular file inode B into the
old name of inode A and finally creating a new hard link for inode B at
directory inode A.

This issue is fixed by the following patch for the linux kernel:

  "Btrfs: incremental send, fix invalid path for link commands"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/147 | 130 
 tests/btrfs/147.out |   6 +++
 tests/btrfs/group   |   1 +
 3 files changed, 137 insertions(+)
 create mode 100755 tests/btrfs/147
 create mode 100644 tests/btrfs/147.out

diff --git a/tests/btrfs/147 b/tests/btrfs/147
new file mode 100755
index ..15517b0c
--- /dev/null
+++ b/tests/btrfs/147
@@ -0,0 +1,130 @@
+#! /bin/bash
+# FS QA Test No. btrfs/147
+#
+# Test that an incremental send/receive operation works correctly after moving
+# some directory inode A, renaming a regular file inode B into the old name of
+# inode A and finally creating a new hard link for inode B at directory inode A.
+#
+#---
+#
+# Copyright (C) 2017 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -fr $send_files_dir
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_test
+_require_scratch
+_require_fssum
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+mkdir $SCRATCH_MNT/dir1
+mkdir $SCRATCH_MNT/dir1/dir2
+mkdir $SCRATCH_MNT/dir1/dir2/dir3
+mkdir $SCRATCH_MNT/dir5
+touch $SCRATCH_MNT/dir1/dir2/dir3/file1
+mkdir $SCRATCH_MNT/dir1/dir2/dir3/dir4
+
+# Filesystem looks like:
+#
+# .  (ino 256)
+# |
+# |--- dir1/ (ino 257)
+# |  |--- dir2/  (ino 258)
+# | |--- dir3/   (ino 259)
+# |   |--- file1 (ino 261)
+# |   |--- dir4/ (ino 262)
+# |
+# |--- dir5/ (ino 260)
+#
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/mysnap1 > /dev/null
+
+$BTRFS_UTIL_PROG send -f $send_files_dir/1.snap \
+   $SCRATCH_MNT/mysnap1 2>&1 1>/dev/null | _filter_scratch
+
+mkdir $SCRATCH_MNT/dir1/dir6
+mv $SCRATCH_MNT/dir5 $SCRATCH_MNT/dir1/dir2/dir3/dir4/dir55
+ln $SCRATCH_MNT/dir1/dir2/dir3/file1 $SCRATCH_MNT/dir1/dir2/dir3/dir4/file11
+mv $SCRATCH_MNT/dir1/dir2/dir3/dir4 $SCRATCH_MNT/dir1/dir6/dir44
+mv $SCRATCH_MNT/dir1/dir2/dir3/file1 $SCRATCH_MNT/dir1/dir2/dir3/dir4
+
+# Filesystem now looks like:
+#
+# .  (ino 256)
+# |
+# |--- dir1/ (ino 257)
+#|--- dir2/  (ino 258)
+#|  |--- dir3/   (ino 259)
+#||--- dir4  (ino 261)
+#|
+#|--- dir6/  (ino 263)
+#   |--- dir44/  (ino 262)
+#  |--- file11   (ino 261)
+#  |--- dir55/   (ino 260)
+#
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+$SCRATCH_MNT/mysnap2 > /dev/null
+
+$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 -f $send_files_dir/2.snap \
+$SCRATCH_MNT/mysnap2 2>&1 1>/dev/null | _filter_scratch
+
+$FSSUM_PROG -A -f -w $send_files_dir/1.fssum $SCRATCH_MNT/mysnap1
+$FSSUM_PROG -A -f -w $send_files_dir/2.fssum \
+   -x

Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting

2017-06-26 Thread Darrick J. Wong

On Mon, Jun 26, 2017 at 01:58:32PM -0400, jlay...@redhat.com wrote:
> On Mon, 2017-06-26 at 08:22 -0700, Darrick J. Wong wrote:
> > On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote:
> > > Just check and advance the data errseq_t in struct file before
> > > returning from fsync on normal files. Internal filemap_*
> > > callers are left as-is.
> > > 
> > > Signed-off-by: Jeff Layton 
> > > ---
> > >  fs/xfs/xfs_file.c | 15 +++
> > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index 5fb5a0958a14..bc3b1575e8db 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -134,7 +134,7 @@ xfs_file_fsync(
> > >   struct inode *inode = file->f_mapping->host;
> > >   struct xfs_inode *ip = XFS_I(inode);
> > >   struct xfs_mount *mp = ip->i_mount;
> > > - int error = 0;
> > > + int error = 0, err2;
> > >   int log_flushed = 0;
> > >   xfs_lsn_t   lsn = 0;
> > >  
> > > @@ -142,10 +142,12 @@ xfs_file_fsync(
> > >  
> > >   error = filemap_write_and_wait_range(inode->i_mapping, start, end);
> > >   if (error)
> > > - return error;
> > > + goto out;
> > >  
> > > - if (XFS_FORCED_SHUTDOWN(mp))
> > > - return -EIO;
> > > + if (XFS_FORCED_SHUTDOWN(mp)) {
> > > + error = -EIO;
> > > + goto out;
> > > + }
> > >  
> > >   xfs_iflags_clear(ip, XFS_ITRUNCATED);
> > >  
> > > @@ -197,6 +199,11 @@ xfs_file_fsync(
> > >   mp->m_logdev_targp == mp->m_ddev_targp)
> > >   xfs_blkdev_issue_flush(mp->m_ddev_targp);
> > >  
> > > +out:
> > > + err2 = filemap_report_wb_err(file);
> > 
> > > Could we have a comment here to remind anyone reading the code a year
> > > from now that filemap_report_wb_err has side effects?  Pre-coffee me was
> > > wondering why we'd bother calling filemap_report_wb_err in the
> > > XFS_FORCED_SHUTDOWN case, then remembered that it touches data
> > > structures.
> > 
> > The first sentence of the commit message (really, the word 'advance')
> > added as a comment was adequate to remind me of the side effects.
> > 
> > Once that's added,
> > Reviewed-by: Darrick J. Wong 
> > 
> > --D
> > 
> 
> Yeah, definitely. I'm working on a respin of the series now to
> incorporate HCH's suggestion too. I'll add that in as well.
> 
> Maybe I should rename that function to file_check_and_advance_wb_err()?
> It would be good to make it clear that it does advance the errseq_t
> cursor.

Seems like a good idea.

--D

> 
> > > + if (!error)
> > > + error = err2;
> > > +
> > >   return error;
> > >  }
> > >  
> > > -- 
> > > 2.13.0
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> > > xfs" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-
> > btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 14/51] btrfs: avoid to access bvec table directly for a cloned bio

2017-06-26 Thread Liu Bo

On Mon, Jun 26, 2017 at 08:09:57PM +0800, Ming Lei wrote:
> Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio)
> mentioned that for dio the submitted bio may be fast cloned, we
> can't access the bvec table directly for a cloned bio, so use
> bio_get_first_bvec() to retrieve the 1st bvec.
>

Looks good to me.

Reviewed-by: Liu Bo 

-liubo
> Cc: Chris Mason 
> Cc: Josef Bacik 
> Cc: David Sterba 
> Cc: linux-btrfs@vger.kernel.org
> Cc: Liu Bo 
> Signed-off-by: Ming Lei 
> ---
>  fs/btrfs/inode.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 06dea7c89bbd..4ab02b34f029 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
>   int read_mode = 0;
>   int segs;
>   int ret;
> + struct bio_vec bvec;
>  
>   BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
>  
> @@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
>   }
>  
>   segs = bio_segments(failed_bio);
> + bio_get_first_bvec(failed_bio, &bvec);
>   if (segs > 1 ||
> - (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
> + (bvec.bv_len > btrfs_inode_sectorsize(inode)))
>   read_mode |= REQ_FAILFAST_DEV;
>  
>   isector = start - btrfs_io_bio(failed_bio)->logical;
> -- 
> 2.9.4
> 
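
Why the bvec table must not be read directly for a cloned bio can be shown with a toy model. This is a sketch only: the toy_* structures are hypothetical and merely loosely modelled on a bio that shares its vector table with its parent and tracks its own position in an iterator. It is not the helper from the patch series.

```c
/* One segment: length and offset within its page. */
struct toy_bvec {
	unsigned int len;
	unsigned int offset;
};

/* Position of a bio inside the shared table. */
struct toy_iter {
	unsigned int idx;	/* index of the current segment */
	unsigned int bvec_done;	/* bytes already consumed in that segment */
};

struct toy_bio {
	const struct toy_bvec *io_vec;	/* may be shared with the parent bio */
	struct toy_iter iter;
};

/* First segment as seen through the iterator, not through io_vec[0]. */
static struct toy_bvec toy_first_bvec(const struct toy_bio *bio)
{
	struct toy_bvec bv = bio->io_vec[bio->iter.idx];

	bv.offset += bio->iter.bvec_done;
	bv.len -= bio->iter.bvec_done;
	return bv;
}
```

If a clone starts at the second entry of a table holding segments of 8192 and 4096 bytes, its first segment is 4096 bytes long, while io_vec[0] still reports 8192. A length comparison against the sector size based on io_vec[0] would therefore look at the wrong segment.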

Re: [PATCH] Btrfs-progs: fix infinite loop in find_free_extent

2017-06-26 Thread Liu Bo

On Mon, Jun 26, 2017 at 04:09:53PM +0200, David Sterba wrote:
> On Fri, Jun 23, 2017 at 10:28:31PM -0600, Liu Bo wrote:
> > From: Liu Bo 

Ah, my From was broken again.

> > 
> > %search_start is calculated in a wrong way, and if %ins is a cross-stripe
> >  one, it'll search the same block group forever.
> 
> That's a bit terse description, so please check if my understanding is right:
> search_start advances by at least one stripe len, but the math would be wrong
> as using bg_offset would not move us to the next stripe.
> bg_cache->key.objectid is the full length so this will reach the next stripe
> and will not loop forever.

Yes, that's correct.  The logic is: since the returned %ins crosses a stripe
boundary, the code calculates a BTRFS_STRIPE_LEN-aligned value as the new
%search_start and checks whether there is any free block matching it.  The
current code uses the wrong offset; the offset should really be the start
position of the block group.

> 
> Do you happen to have a test for that?

Unfortunately it's not a test with vanilla progs.

I found this when running mkfs.btrfs with a 12K nodesize.  The kernel now has
a power_of_2 limitation for nodesize, and the progs code uses a weird
IS_ALIGNED() that has the same effect as power_of_2(), so mkfs.btrfs -n 12K is
not allowed.  I changed IS_ALIGNED() to (blocksize % nodesize != 0) and got the
above loop.

> 
> > Signed-off-by: Liu Bo 
> > ---
> >  extent-tree.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/extent-tree.c b/extent-tree.c
> > index b12ee29..5e09274 100644
> > --- a/extent-tree.c
> > +++ b/extent-tree.c
> > @@ -2614,8 +2614,9 @@ check_failed:
> > goto no_bg_cache;
> > bg_offset = ins->objectid - bg_cache->key.objectid;
> >  
> > -   search_start = round_up(bg_offset + num_bytes,
> > -   BTRFS_STRIPE_LEN) + bg_offset;
> > +   search_start = round_up(
> > +   bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
> > +   bg_cache->key.object;
> 
> extent-tree.c: In function ‘find_free_extent’:
> extent-tree.c:2617:18: error: ‘struct btrfs_key’ has no member named ‘object’; did you mean ‘objectid’?
>  bg_cache->key.object;
>   ^

Ouch, that's right, it's %objectid.

I'll send an updated one, thanks for the comments.

-liubo

Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting

2017-06-26 Thread jlayton

On Mon, 2017-06-26 at 08:22 -0700, Darrick J. Wong wrote:
> On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote:
> > Just check and advance the data errseq_t in struct file before
> > returning from fsync on normal files. Internal filemap_*
> > callers are left as-is.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/xfs/xfs_file.c | 15 +++
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 5fb5a0958a14..bc3b1575e8db 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -134,7 +134,7 @@ xfs_file_fsync(
> > > struct inode *inode = file->f_mapping->host;
> > > struct xfs_inode *ip = XFS_I(inode);
> > > struct xfs_mount *mp = ip->i_mount;
> > -   int error = 0;
> > +   int error = 0, err2;
> > int log_flushed = 0;
> > xfs_lsn_t   lsn = 0;
> >  
> > @@ -142,10 +142,12 @@ xfs_file_fsync(
> >  
> > > error = filemap_write_and_wait_range(inode->i_mapping, start, end);
> > if (error)
> > -   return error;
> > +   goto out;
> >  
> > -   if (XFS_FORCED_SHUTDOWN(mp))
> > -   return -EIO;
> > +   if (XFS_FORCED_SHUTDOWN(mp)) {
> > +   error = -EIO;
> > +   goto out;
> > +   }
> >  
> > xfs_iflags_clear(ip, XFS_ITRUNCATED);
> >  
> > @@ -197,6 +199,11 @@ xfs_file_fsync(
> > mp->m_logdev_targp == mp->m_ddev_targp)
> > xfs_blkdev_issue_flush(mp->m_ddev_targp);
> >  
> > +out:
> > +   err2 = filemap_report_wb_err(file);
> 
> Could we have a comment here to remind anyone reading the code a year
> from now that filemap_report_wb_err has side effects?  Pre-coffee me was
> wondering why we'd bother calling filemap_report_wb_err in the
> XFS_FORCED_SHUTDOWN case, then remembered that it touches data
> structures.
> 
> The first sentence of the commit message (really, the word 'advance')
> added as a comment was adequate to remind me of the side effects.
> 
> Once that's added,
> Reviewed-by: Darrick J. Wong 
> 
> --D
> 

Yeah, definitely. I'm working on a respin of the series now to
incorporate HCH's suggestion too. I'll add that in as well.

Maybe I should rename that function to file_check_and_advance_wb_err()?
It would be good to make it clear that it does advance the errseq_t
cursor.
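
The side effect under discussion can be illustrated with a toy model of "check and advance". It is a deliberately simplified sketch: the toy_* names are hypothetical, and the real errseq_t packs the error and a counter into one value with additional flag handling that is not modelled here.

```c
#include <errno.h>

/* Shared per-mapping state: bumped every time a writeback error is recorded. */
struct toy_mapping {
	unsigned int seq;
	int err;
};

/* Per-open-file cursor: the last sequence value this file has reported. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int seen;
};

static void toy_record_wb_err(struct toy_mapping *m, int err)
{
	m->err = err;
	m->seq++;
}

/* Report an error not yet seen through this file, and mark it as seen. */
static int toy_check_and_advance(struct toy_file *f)
{
	if (f->seen == f->mapping->seq)
		return 0;
	f->seen = f->mapping->seq;
	return f->mapping->err;
}

/*
 * Shape of the fsync path in the quoted patch: the cursor is advanced even
 * when an earlier step already failed, and the earlier error wins.
 */
static int toy_fsync(struct toy_file *f, int earlier_error)
{
	int error = earlier_error;
	int err2;

	err2 = toy_check_and_advance(f);
	if (!error)
		error = err2;
	return error;
}
```

In this model a recorded error is returned by exactly one fsync per open file. If an fsync already fails for another reason, for example the shutdown case, the pending writeback error is still consumed, so the following fsync returns 0. That is the behaviour a comment or a more explicit function name would make visible.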

> > +   if (!error)
> > +   error = err2;
> > +
> > return error;
> >  }
> >  
> > -- 
> > 2.13.0
> > 

Re: [PATCH 08/13] btrfs: convert prelimary reference tracking to use rbtrees

2017-06-26 Thread Jeff Mahoney

On 6/20/17 12:06 PM, Edmund Nadolski wrote:
> It's been known for a while that the use of multiple lists
> that are periodically merged was an algorithmic problem within
> btrfs.  There are several workloads that don't complete in any
> reasonable amount of time (e.g. btrfs/130) and others that cause
> soft lockups.
> 
> The solution is to use a pair of rbtrees that do insertion merging
> for both indirect and direct refs, with the former converting
> refs into the latter.  The result is a btrfs/130 workload that
> used to take several hours now takes about half of that. This
> runtime still isn't acceptable and a future patch will address that
> by moving the rbtrees higher in the stack so the lookups can be
> shared across multiple calls to find_parent_nodes.
> 
> Signed-off-by: Edmund Nadolski 
> Signed-off-by: Jeff Mahoney 
[...]

> @@ -504,37 +665,22 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
>   return ret;
>  }
>  
> -static inline int ref_for_same_block(struct prelim_ref *ref1,
> -  struct prelim_ref *ref2)
> -{
> - if (ref1->level != ref2->level)
> - return 0;
> - if (ref1->root_id != ref2->root_id)
> - return 0;
> - if (ref1->key_for_search.type != ref2->key_for_search.type)
> - return 0;
> - if (ref1->key_for_search.objectid != ref2->key_for_search.objectid)
> - return 0;
> - if (ref1->key_for_search.offset != ref2->key_for_search.offset)
> - return 0;
> - if (ref1->parent != ref2->parent)
> - return 0;
> -
> - return 1;
> -}
> -
>  /*
>   * read tree blocks and add keys where required.
>   */
>  static int add_missing_keys(struct btrfs_fs_info *fs_info,
> - struct list_head *head)
> + struct preftrees *preftrees)
>  {
>   struct prelim_ref *ref;
>   struct extent_buffer *eb;
> + struct rb_node *node = rb_first(&preftrees->indirect.root);
> +
> + while (node) {
> + ref = rb_entry(node, struct prelim_ref, rbnode);
> + node = rb_next(&ref->rbnode);
> + if (WARN(ref->parent, "BUG: direct ref found in indirect tree"))
> + return -EINVAL;
>  
> - list_for_each_entry(ref, head, list) {
> - if (ref->parent)
> - continue;
>   if (ref->key_for_search.type)
>   continue;
>   BUG_ON(!ref->wanted_disk_byte);

Hi Ed -

I missed this in earlier review, but this can't work.  We're modifying
the ref in a way that the comparator will care about -- so the node
would move in the tree.

It's not a fatal flaw and, in fact, leaves us an opening to fix a
separate locking issue.
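
The general hazard described here, changing a key in place inside an ordered structure, is easy to demonstrate outside the kernel. The sketch below uses a sorted array and binary search as a stand-in for the rbtree; the toy_* functions are hypothetical and only illustrate the principle, not the eventual fix.

```c
#include <stddef.h>

/* Binary search over a sorted array; returns the index or -1. */
static int toy_find_key(const int *keys, size_t n, int key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo < n && keys[lo] == key) ? (int)lo : -1;
}

/*
 * Change keys[idx] to new_key while keeping the array sorted, i.e. the
 * equivalent of removing the element and inserting it again.
 */
static void toy_rekey(int *keys, size_t n, size_t idx, int new_key)
{
	size_t i = idx;

	/* Slide smaller right-hand neighbours down over the hole. */
	while (i + 1 < n && keys[i + 1] < new_key) {
		keys[i] = keys[i + 1];
		i++;
	}
	/* Slide larger left-hand neighbours up over the hole. */
	while (i > 0 && keys[i - 1] > new_key) {
		keys[i] = keys[i - 1];
		i--;
	}
	keys[i] = new_key;
}
```

Overwriting 20 with 45 in {10, 20, 30, 40, 50} leaves the value in the array, but the search no longer finds it because the ordering assumption is broken. Going through toy_rekey() instead produces {10, 30, 40, 45, 50} and the lookup succeeds.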

-Jeff

-- 
Jeff Mahoney
SUSE Labs




Btrfs progs pre-release 4.11.1-rc1

2017-06-26 Thread David Sterba

Hi,

a pre-release has been tagged.  A bugfix release.

Changes:
  * image: restoring from multiple devices
  * dev stats: make --check option work
  * check: fix false alert with extent hole on a NO_HOLE filesystem
  * check: lowmem mode, fix false alert in case of mixed inline and compressed
    extent
  * convert: work with large filesystems (many TB)
  * docs updates
  * build: sync Android.mk with Makefile
  * tests:
    * new tests
    * fix 008 and 009, shell quotation mistake

ETA for 4.11.1 is in +4 days (2017-06-30).

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

David Sterba (7):
  btrfs-progs: docs: update formatting of btrfs-rescue
  btrfs-progs: docs: update formatting of btrfs-property
  btrfs-progs: docs: fix sentence for no-dump file attribute
  btrfs-progs: docs: update note about device deletion
  btrfs-progs: build: sync recent makefile changes to android.mk
  btrfs-progs: update CHANGES for v4.11.1
  Btrfs progs v4.11.1-rc1

Filipe Manana (2):
  btrfs-progs: Fix restoring image from multi devices fs into single device
  btrfs-progs: test for restoring multiple devices fs into a single device

Hans van Kranenburg (1):
  btrfs-progs: send operates on ro snapshots only

Kasijjuf (3):
  btrfs-progs: docs: Expand confusing abbreviation in documentation
  btrfs-progs: docs: Wrong section in ref to manpage
  btrfs-progs: docs: replace  with 

Lakshmipathi.G (3):
  btrfs-progs: Fix 'btrfs device stats --check' cli option
  btrfs-progs: convert: widen int types in convert context
  btrfs-progs: convert: Add larger device support

Lu Fengqi (1):
  btrfs-progs: lowmem check: Fix false alert about file extent interrupt

Qu Wenruo (2):
  btrfs-progs: check: Fix false alert about EXTENT_DATA that shouldn't be a hole
  btrfs-progs: tests: Add test case to check file hole extents with NO_HOLES flag

Tsutomu Itoh (1):
  btrfs-progs: tests: remove variable quotation from convert-tests

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PULL] Btrfs for 4.13, part 1

2017-06-26 Thread Chris Mason


On 06/23/2017 11:16 AM, David Sterba wrote:

> Hi,
>
> this is the main batch for 4.13. There are some user visible changes, see
> below. The core updates improve error handling (mostly related to bios), with
> the usual incremental work on the GFP_NOFS (mis)use removal. All patches have
> been in for-next for an extensive amount of time.
>
> There will be followups but I want to push the series (111 patches) forward.
> There are also some updates to adjacent subsystems (writeback and blocklayer),
> so I want to give some stable point for merging in the upcoming weeks.

Thanks Dave, I ran this (along with the updates we added) through a long
stress and the usual xfstests.

For everyone else on the list, since I'm heading off to vacation until
~July 9th, Dave is sending this off to Linus once the merge window starts.


-chris

Re: [PATCH 3/4] btrfs: Add zstd support

2017-06-26 Thread Nick Terrell

Thanks for the clarification! I will fix the divisions.

On 6/26/17, 5:12 AM, "David Sterba"  wrote:

> On Sun, Jun 25, 2017 at 11:30:22PM +0200, Adam Borowski wrote:
> > On Mon, Jun 26, 2017 at 03:03:17AM +0800, kbuild test robot wrote:
> > > Hi Nick,
> > > 
> > > url: https://github.com/0day-ci/linux/commits/Nick-Terrell/lib-Add-xxhash-module/20170625-214344
> > > config: i386-allmodconfig (attached as .config)
> > > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> > > reproduce:
> > > # save the attached .config to linux build tree
> > > make ARCH=i386
> > > 
> > > All errors (new ones prefixed by >>):
> > > 
> > > >> ERROR: "__udivdi3" [lib/zstd/zstd_compress.ko] undefined!
> > >    ERROR: "__udivdi3" [fs/ufs/ufs.ko] undefined!
> > 
> > Just to save you time to figure it out:
> > for division when one or both arguments are longer than the architecture's
> > word, gcc uses helper functions that are included when compiling in a hosted
> > environment -- but not in freestanding.
> > 
> > Thus, you want do_div() instead of /; do check widths and signedness of
> > arguments.
>
> No do_div please, div_u64 or div64_u64.
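
As background for the advice above: the kernel helpers take an explicit divisor width, div_u64() dividing a u64 by a u32 and div64_u64() dividing a u64 by a u64, so the code never relies on the compiler's out-of-line 64-bit division routine. The userspace sketch below shows one way a 64-by-32 division can be done with shifts and subtraction. toy_div_u64() is a hypothetical stand-in written for illustration; the kernel's own implementations differ per architecture.

```c
#include <stdint.h>

/*
 * Divide a 64-bit value by a non-zero 32-bit value using only 32-bit
 * division, shifts, additions and subtractions.
 */
static uint64_t toy_div_u64(uint64_t dividend, uint32_t divisor)
{
	uint64_t rem = dividend;
	uint64_t b = divisor;
	uint64_t res = 0;
	uint64_t d = 1;
	uint32_t high = (uint32_t)(rem >> 32);

	/* Handle the upper 32 bits with a plain 32-bit division first. */
	if (high >= divisor) {
		high /= divisor;
		res = (uint64_t)high << 32;
		rem -= (uint64_t)(high * divisor) << 32;
	}

	/* Scale the divisor up until it reaches the remainder. */
	while ((int64_t)b > 0 && b < rem) {
		b = b + b;
		d = d + d;
	}

	/* Classic shift-and-subtract long division. */
	do {
		if (rem >= b) {
			rem -= b;
			res += d;
		}
		b >>= 1;
		d >>= 1;
	} while (d);

	return res;
}
```

Picking the narrowest helper that fits matters: if the divisor is known to fit in 32 bits, as with block sizes of at most 128 KB, the u64-by-u32 form is sufficient.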




Re: [PATCH 03/11] btrfs: Don't clear SGID when inheriting ACLs

2017-06-26 Thread David Sterba

On Thu, Jun 22, 2017 at 03:31:07PM +0200, Jan Kara wrote:
> When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
> set, DIR1 is expected to have SGID bit set (and owning group equal to
> the owning group of 'DIR0'). However when 'DIR0' also has some default
> ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
> 'DIR1' to get cleared if user is not member of the owning group.
> 
> Fix the problem by moving posix_acl_update_mode() out of
> __btrfs_set_acl() into btrfs_set_acl(). That way the function will not be
> called when inheriting ACLs which is what we want as it prevents SGID
> bit clearing and the mode has been properly set by posix_acl_create()
> anyway.
> 
> Fixes: 073931017b49d9458aa351605b43a7e34598caef
> CC: sta...@vger.kernel.org
> CC: linux-btrfs@vger.kernel.org
> CC: David Sterba 
> Signed-off-by: Jan Kara 

Added to btrfs patch queue, thanks.

Next btrfs development cycle open - 4.14

2017-06-26 Thread David Sterba

Hi,

a friendly reminder of the timetable and what's expected at this phase.

4.11 - current
4.12 - upcoming, urgent regression fixes only
4.13 - development closed, pull request pending, fixes or regressions only
4.14 - development open, until 4.13-rc5

(https://btrfs.wiki.kernel.org/index.php/Developer%27s_FAQ#Development_schedule)

Besides the usual cleanups and fixes, you can now start sending any patches
that could be more intrusive and would benefit from a longer period of
testing, or development revisions.

The base of the patches should be the last pull request, which is
'for-4.13-part1' in my k.org tree. Reviewed patches will be collected in a
branch that's usually named 'misc-next' in my devel git repos and is part of
the for-next at k.org git repo.

k.org: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
devel1: http://repo.or.cz/linux-2.6/btrfs-unstable.git
devel2: https://github.com/kdave/btrfs-devel

d.

Re: [PATCH 2/2] btrfs: Optimise layout of btrfs_block_group_cache

2017-06-26 Thread Nikolay Borisov



On 26.06.2017 17:42, Nikolay Borisov wrote:
> With this patch applied pahole stats look like:
> 
> /* size: 840, cachelines: 14, members: 40 */
> /* sum members: 833, holes: 1, sum holes: 7 */
> /* bit holes: 1, sum bit holes: 28 bits */
> /* last cacheline: 8 bytes */
> 
> No functional changes.
> 
> Signed-off-by: Nikolay Borisov 
> ---
>  fs/btrfs/ctree.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index cdd3775e930b..bdd06bbeb9aa 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -586,6 +586,11 @@ struct btrfs_block_group_cache {
>   unsigned int iref:1;
>   unsigned int has_caching_ctl:1;
>   unsigned int removed:1;
> + /*
> +  * Does the block group need to be added to the free space tree?
> +  * Protected by free_space_lock.
> +  */
> + unsigned int needs_free_space:1;
Upon closer inspection of memory-barriers.txt I'm not confident in this
change. This puts fields protected by different locks in the same
bitfield which can lead to corrupted values.
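
To illustrate the concern, here is a minimal userspace sketch (not the btrfs structure; the struct and field names are made up). Adjacent bitfields are normally packed into one storage unit, so updating one flag is a read-modify-write of the word that also holds the other. If the two flags are guarded by different locks, two such updates can race and one of them can be lost.

```c
#include <string.h>

/* Hypothetical struct whose two flags would be guarded by different locks. */
struct sketch_flags {
    unsigned int flag_under_lock_a:1;
    unsigned int flag_under_lock_b:1;
};

/* Copy out the raw storage that backs the bitfields. */
static unsigned int sketch_raw_word(const struct sketch_flags *f)
{
    unsigned int word = 0;
    size_t n = sizeof(word) < sizeof(*f) ? sizeof(word) : sizeof(*f);

    memcpy(&word, f, n);
    return word;
}

/* Returns 1 when setting each flag in turn changed the same underlying word. */
static int sketch_flags_share_word(void)
{
    struct sketch_flags f;
    unsigned int w0, w1, w2;

    memset(&f, 0, sizeof(f));
    w0 = sketch_raw_word(&f);
    f.flag_under_lock_a = 1;
    w1 = sketch_raw_word(&f);
    f.flag_under_lock_b = 1;
    w2 = sketch_raw_word(&f);
    return w0 != w1 && w1 != w2;
}
```

The sketch only shows that both flags live in one word; it does not reproduce the race itself.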

>  
>   int disk_cache_state;
>  
> @@ -608,6 +613,8 @@ struct btrfs_block_group_cache {
>   /* usage count */
>   atomic_t count;
>  
> + atomic_t trimming;

This one will likely eliminate one hole in the struct, so I might end up
sending a v2 of this patch.

> +
>   /* List of struct btrfs_free_clusters for this block group.
>* Today it will only have one thing on it, but that may change
>*/
> @@ -619,8 +626,6 @@ struct btrfs_block_group_cache {
>   /* For read-only block groups */
>   struct list_head ro_list;
>  
> - atomic_t trimming;
> -
>   /* For dirty block groups */
>   struct list_head dirty_list;
>   struct list_head io_list;
> @@ -651,11 +656,6 @@ struct btrfs_block_group_cache {
>   /* Lock for free space tree operations. */
>   struct mutex free_space_lock;
>  
> - /*
> -  * Does the block group need to be added to the free space tree?
> -  * Protected by free_space_lock.
> -  */
> - int needs_free_space;
>  
>   /* Record locked full stripes for RAID5/6 block group */
>   struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
> 

Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting

2017-06-26 Thread Darrick J. Wong

On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote:
> Just check and advance the data errseq_t in struct file before
> returning from fsync on normal files. Internal filemap_*
> callers are left as-is.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/xfs/xfs_file.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5fb5a0958a14..bc3b1575e8db 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -134,7 +134,7 @@ xfs_file_fsync(
>   struct inode*inode = file->f_mapping->host;
>   struct xfs_inode*ip = XFS_I(inode);
>   struct xfs_mount*mp = ip->i_mount;
> - int error = 0;
> + int error = 0, err2;
>   int log_flushed = 0;
>   xfs_lsn_t   lsn = 0;
>  
> @@ -142,10 +142,12 @@ xfs_file_fsync(
>  
>   error = filemap_write_and_wait_range(inode->i_mapping, start, end);
>   if (error)
> - return error;
> + goto out;
>  
> - if (XFS_FORCED_SHUTDOWN(mp))
> - return -EIO;
> + if (XFS_FORCED_SHUTDOWN(mp)) {
> + error = -EIO;
> + goto out;
> + }
>  
>   xfs_iflags_clear(ip, XFS_ITRUNCATED);
>  
> @@ -197,6 +199,11 @@ xfs_file_fsync(
>   mp->m_logdev_targp == mp->m_ddev_targp)
>   xfs_blkdev_issue_flush(mp->m_ddev_targp);
>  
> +out:
> + err2 = filemap_report_wb_err(file);

Could we have a comment here to remind anyone reading the code a year
from now that filemap_report_wb_err has side effects?  Pre-coffee me was
wondering why we'd bother calling filemap_report_wb_err in the
XFS_FORCED_SHUTDOWN case, then remembered that it touches data
structures.

Adding the first sentence of the commit message (really, the word 'advance')
as a comment would be adequate to remind me of the side effects.

Once that's added,
Reviewed-by: Darrick J. Wong 

--D
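
For readers unfamiliar with the "check and advance" side effect mentioned above, here is a toy userspace model. It assumes nothing about the real errseq_t encoding or the filemap_report_wb_err implementation; every sketch_ name is hypothetical. The shared side keeps a counter that is bumped whenever an error is recorded, and each open file remembers the value it last reported.

```c
#include <errno.h>

/* Shared side: the latest writeback error and a change counter. */
struct sketch_mapping {
    int wb_err;          /* most recent writeback error, 0 if none */
    unsigned int wb_seq; /* bumped each time an error is recorded */
};

/* Per-open-file cursor: the counter value this file has already reported. */
struct sketch_file {
    unsigned int seen_seq;
};

static void sketch_record_error(struct sketch_mapping *m, int err)
{
    m->wb_err = err;
    m->wb_seq++;
}

/*
 * Report an error recorded since this file last checked, then advance the
 * file's cursor. The advance is the side effect: calling again returns 0
 * until a new error is recorded.
 */
static int sketch_check_and_advance(const struct sketch_mapping *m,
                                    struct sketch_file *f)
{
    if (f->seen_seq == m->wb_seq)
        return 0;
    f->seen_seq = m->wb_seq;
    return m->wb_err;
}
```

In this model, skipping the call on an early-exit path would leave the cursor behind, so a later fsync on the same file would report the stale error.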

> + if (!error)
> + error = err2;
> +
>   return error;
>  }
>  
> -- 
> 2.13.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs-progs: lowmem check: Fix false alert about file extent interrupt

2017-06-26 Thread David Sterba

On Thu, Jun 22, 2017 at 04:12:56PM +0800, Lu Fengqi wrote:
> As Qu mentioned in this thread
> (https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
> can cause regular extent to co-exist with inlined extent. This coexistence
> makes things confusing. Since it was permitted currently, so fix
> btrfsck to prevent a bunch of error logs that will make user feel
> panic.
> 
> When check file extent, record the extent_end of regular extent to check
> if there is a gap between the regular extents. Normally there is only one
> inlined extent, so the extent_end of inlined extent is useless. However,
> if regular extent can co-exist with inlined extent, the extent_end of
> inlined extent also need to record.
> 
> Reported-by: Marc MERLIN 
> Signed-off-by: Lu Fengqi 

Applied, thanks.

Do you have a test for that?

[PATCH 2/2] btrfs: Optimise layout of btrfs_block_group_cache

2017-06-26 Thread Nikolay Borisov

With this patch applied pahole stats look like:

/* size: 840, cachelines: 14, members: 40 */
/* sum members: 833, holes: 1, sum holes: 7 */
/* bit holes: 1, sum bit holes: 28 bits */
/* last cacheline: 8 bytes */

No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index cdd3775e930b..bdd06bbeb9aa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -586,6 +586,11 @@ struct btrfs_block_group_cache {
unsigned int iref:1;
unsigned int has_caching_ctl:1;
unsigned int removed:1;
+   /*
+* Does the block group need to be added to the free space tree?
+* Protected by free_space_lock.
+*/
+   unsigned int needs_free_space:1;
 
int disk_cache_state;
 
@@ -608,6 +613,8 @@ struct btrfs_block_group_cache {
/* usage count */
atomic_t count;
 
+   atomic_t trimming;
+
/* List of struct btrfs_free_clusters for this block group.
 * Today it will only have one thing on it, but that may change
 */
@@ -619,8 +626,6 @@ struct btrfs_block_group_cache {
/* For read-only block groups */
struct list_head ro_list;
 
-   atomic_t trimming;
-
/* For dirty block groups */
struct list_head dirty_list;
struct list_head io_list;
@@ -651,11 +656,6 @@ struct btrfs_block_group_cache {
/* Lock for free space tree operations. */
struct mutex free_space_lock;
 
-   /*
-* Does the block group need to be added to the free space tree?
-* Protected by free_space_lock.
-*/
-   int needs_free_space;
 
/* Record locked full stripes for RAID5/6 block group */
struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
-- 
2.7.4


[PATCH 1/2] btrfs: remove unused sectorsize member

2017-06-26 Thread Nikolay Borisov

The sectorsize member of btrfs_block_group_cache is unused. So remove it, this
reduces the number of holes in the struct.

With patch:
/* size: 856, cachelines: 14, members: 40 */
/* sum members: 837, holes: 4, sum holes: 19 */
/* bit holes: 1, sum bit holes: 29 bits */
/* last cacheline: 24 bytes */

Without patch:
/* size: 864, cachelines: 14, members: 41 */
/* sum members: 841, holes: 5, sum holes: 23 */
/* bit holes: 1, sum bit holes: 29 bits */
/* last cacheline: 32 bytes */

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 1 -
 fs/btrfs/extent-tree.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a75a23f9d68e..cdd3775e930b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -559,7 +559,6 @@ struct btrfs_block_group_cache {
u64 bytes_super;
u64 flags;
u64 cache_generation;
-   u32 sectorsize;
 
/*
 * If the free space extent count exceeds this number, convert the block
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a08a743a8e09..2a0d300c7d1a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9904,7 +9904,6 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
cache->key.offset = size;
cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 
-   cache->sectorsize = fs_info->sectorsize;
cache->fs_info = fs_info;
cache->full_stripe_len = btrfs_full_stripe_len(fs_info,
   &fs_info->mapping_tree,
-- 
2.7.4
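
The pahole figures quoted in this series count padding ("holes") that the compiler inserts to keep members aligned. A small userspace sketch, with made-up struct names, shows how member order changes the amount of padding; the exact sizes depend on the ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* A 4-byte member between 8-byte members usually forces padding. */
struct sketch_holey {
    uint64_t a;
    uint32_t b;
    uint64_t c;
    uint32_t d;
};

/* Same members, grouped so the two 4-byte fields sit next to each other. */
struct sketch_tidy {
    uint64_t a;
    uint64_t c;
    uint32_t b;
    uint32_t d;
};

/* Bytes of padding in a struct, given the sum of its member sizes. */
static size_t sketch_padding(size_t struct_size, size_t members_size)
{
    return struct_size - members_size;
}
```

On a typical x86-64 ABI the first layout is likely to be 32 bytes and the second 24, for the same 24 bytes of members.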


Re: [PATCH 1/2] btrfs-progs: Fix false alert about EXTENT_DATA shouldn't be hole

2017-06-26 Thread David Sterba

On Mon, Jun 19, 2017 at 01:26:20PM +0200, Henk Slager wrote:
> On 16-06-17 03:43, Qu Wenruo wrote:
> > Since incompat feature NO_HOLES still allow us to have explicit hole
> > file extent, current check is too restrict and will cause false alert
> > like:
> >
> > root 5 EXTENT_DATA[257, 0] shouldn't be hole
> >
> > Fix it by removing the restrict hole file extent check.
> >
> > Reported-by: Henk Slager 
> > Signed-off-by: Qu Wenruo 
> > ---
> >  cmds-check.c | 6 +-
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/cmds-check.c b/cmds-check.c
> > index c052f66e..7bd57677 100644
> > --- a/cmds-check.c
> > +++ b/cmds-check.c
> > @@ -4841,11 +4841,7 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
> > }
> >  
> > /* Check EXTENT_DATA hole */
> > -   if (no_holes && is_hole) {
> > -   err |= FILE_EXTENT_ERROR;
> > -   error("root %llu EXTENT_DATA[%llu %llu] shouldn't be hole",
> > - root->objectid, fkey->objectid, fkey->offset);
> > -   } else if (!no_holes && *end != fkey->offset) {
> > +   if (!no_holes && *end != fkey->offset) {
> > err |= FILE_EXTENT_ERROR;
> > error("root %llu EXTENT_DATA[%llu %llu] interrupt",
> >   root->objectid, fkey->objectid, fkey->offset);
> 
> 
> Thanks for the patch, I applied it on v4.11 btrfs-progs and re-ran the check:
> # btrfs check -p --readonly /dev/mapper/smr
> 
> on filesystem mentioned in:
> https://www.spinics.net/lists/linux-btrfs/msg66374.html
> 
> and now the "shouldn't be hole" errors don't show up anymore.
> 
> Tested-by: Henk Slager 

Thank you both, patch applied. I might also release a 4.11.x release
with this fix included.

Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking

2017-06-26 Thread Jeff Layton

On Sat, 2017-06-24 at 09:16 -0400, Jeff Layton wrote:
> On Sat, 2017-06-24 at 04:59 -0700, Christoph Hellwig wrote:
> > On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> > > In order to query for errors with errseq_t, you need a previously-
> > > sampled point from which to check. When you call
> > > filemap_write_and_wait_range though you don't have a struct file and so
> > > no previously-sampled value.
> > 
> > So can we simply introduce variants of them that take a struct file?
> > That would be:
> > 
> >  a) less churn
> >  b) less code
> >  c) less chance to get data integrity wrong
> 
> Yeah, I had that thought after I sent the reply to you earlier.
> 
> The main reason I didn't do that before was that I had myself convinced
> that we needed to do the check_and_advance as late as possible in the
> fsync process, after the metadata had been written.
> 
> Now that I think about it more, I think you're probably correct. As long
> as we do the check and advance at some point after doing the
> write_and_wait, we're fine here and shouldn't violate exactly once
> semantics on the fsync return.

So I have a file_write_and_wait_range now that should DTRT for this
patch.

The bigger question is -- what about more complex filesystems like
ext4?  There are a couple of cases where we can return -EIO or -EROFS on
fsync before filemap_write_and_wait_range is ever called. Like this one
for instance:

if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;

...and the EXT4_MF_FS_ABORTED case.

Are those conditions ever recoverable, such that a later fsync could
succeed? IOW, could I do a remount or something such that the existing
fds are left open and become usable again? 

If so, then we really ought to advance the errseq_t in the file when we
catch those cases as well. If we have to do that, then it probably makes
sense to leave the ext4 patch as-is.
-- 
Jeff Layton 

Re: [PATCH v2] btrfs-progs: mkfs: Replace number with a macro

2017-06-26 Thread David Sterba

On Mon, Jun 26, 2017 at 06:18:29PM +0800, Gu Jinxiang wrote:
> For code maintainability and scalability,
> replace number with a macro of member blocks in btrfs_mkfs_config.
> 
> Signed-off-by: Gu Jinxiang 
> ---
> Changes since v1:
> Missing a using place. And modify it.
> 
>  mkfs/common.c | 4 ++--
>  mkfs/common.h | 5 -
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mkfs/common.c b/mkfs/common.c
> index e4785c5..0d79650 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   uuid_generate(chunk_tree_uuid);
>  
>   cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
> - for (i = 1; i < 7; i++) {
> + for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {

I'm not sure this is the best way to make the code more readable. "NR"
is the count of the roots and if it were used as " < NR" then it's clear
that we're iterating over a given number of items, but here the count is
also going to be used as an index to an array.

While this is correct, it's still necessary to keep in mind that some +1
or <= is needed while dealing with the blocks.
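
One possible shape for the symbolic-name approach, as a rough sketch only: the enum names, the number of entries, and the layout arithmetic below are illustrative and not taken from make_btrfs. When the last enumerator is the element count, the loop keeps the usual "< COUNT" form and no "+ 1" or "<=" is needed.

```c
#include <stdint.h>

/* Hypothetical symbolic indices; names and count are illustrative only. */
enum sketch_block {
    SKETCH_BLOCK_SUPER = 0,
    SKETCH_BLOCK_ROOT_TREE,
    SKETCH_BLOCK_EXTENT_TREE,
    SKETCH_BLOCK_CHUNK_TREE,
    SKETCH_BLOCK_COUNT /* number of entries, not a valid index */
};

/* Place the tree blocks back to back after a separately placed superblock. */
static void sketch_fill_blocks(uint64_t blocks[SKETCH_BLOCK_COUNT],
                               uint64_t super_offset,
                               uint64_t first_tree_offset,
                               uint64_t nodesize)
{
    int i;

    blocks[SKETCH_BLOCK_SUPER] = super_offset;
    for (i = SKETCH_BLOCK_SUPER + 1; i < SKETCH_BLOCK_COUNT; i++)
        blocks[i] = first_tree_offset + nodesize * (uint64_t)(i - 1);
}
```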

make_btrfs could use some heavy cleanup so we don't rely on the
hardcoded constants, in a similar way to reference_root_table so we can
use symbolic tree names.

Re: [PATCH] Btrfs-progs: fix infinite loop in find_free_extent

2017-06-26 Thread David Sterba

On Fri, Jun 23, 2017 at 10:28:31PM -0600, Liu Bo wrote:
> From: Liu Bo 
> 
> %search_start is calculated in a wrong way, and if %ins is a cross-stripe
>  one, it'll search the same block group forever.

That's a bit terse description, so please check if my understanding is right:
search_start advances by at least one stripe len, but the math would be wrong
as using bg_offset would not move us to the next stripe. bg_cache->key.objectid
is the full length so this will reach the next stripe and will not loop forever.
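
The arithmetic can be checked with a small userspace sketch. It assumes that bg_cache->key.objectid is the block group's starting logical address and that the stripe length is 64 KiB; both are assumptions of this example, as are the sketch_ names.

```c
#include <stdint.h>

#define SKETCH_STRIPE_LEN (64 * 1024ULL) /* assumed stripe length */

/* Round x up to the next multiple of align (align must be non-zero). */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

/* Old formula: adds the block-group-relative offset, not the group start. */
static uint64_t sketch_search_start_old(uint64_t bg_start, uint64_t ins_start,
                                        uint64_t num_bytes)
{
    uint64_t bg_offset = ins_start - bg_start;

    return sketch_round_up(bg_offset + num_bytes, SKETCH_STRIPE_LEN) + bg_offset;
}

/* Patched formula: turns the rounded relative offset back into an absolute one. */
static uint64_t sketch_search_start_new(uint64_t bg_start, uint64_t ins_start,
                                        uint64_t num_bytes)
{
    uint64_t bg_offset = ins_start - bg_start;

    return sketch_round_up(bg_offset + num_bytes, SKETCH_STRIPE_LEN) + bg_start;
}
```

With a block group starting at 1 GiB, a candidate extent 60 KiB into it, and a 16 KiB allocation, the old formula yields 192512, which is not an address inside or after the block group, while the patched one yields 1 GiB + 128 KiB.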

Do you happen to have a test for that?

> Signed-off-by: Liu Bo 
> ---
>  extent-tree.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/extent-tree.c b/extent-tree.c
> index b12ee29..5e09274 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -2614,8 +2614,9 @@ check_failed:
>   goto no_bg_cache;
>   bg_offset = ins->objectid - bg_cache->key.objectid;
>  
> - search_start = round_up(bg_offset + num_bytes,
> - BTRFS_STRIPE_LEN) + bg_offset;
> + search_start = round_up(
> + bg_offset + num_bytes, BTRFS_STRIPE_LEN) +
> + bg_cache->key.object;

extent-tree.c: In function ‘find_free_extent’:
extent-tree.c:2617:18: error: ‘struct btrfs_key’ has no member named ‘object’; did you mean ‘objectid’?
 bg_cache->key.object;
  ^

Re: [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting

2017-06-26 Thread Carlos Maiolino

On Fri, Jun 16, 2017 at 03:34:26PM -0400, Jeff Layton wrote:
> Just check and advance the data errseq_t in struct file before
> returning from fsync on normal files. Internal filemap_*
> callers are left as-is.
> 

Looks good.

Reviewed-by: Carlos Maiolino 

> Signed-off-by: Jeff Layton 
> ---
>  fs/xfs/xfs_file.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5fb5a0958a14..bc3b1575e8db 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -134,7 +134,7 @@ xfs_file_fsync(
>   struct inode*inode = file->f_mapping->host;
>   struct xfs_inode*ip = XFS_I(inode);
>   struct xfs_mount*mp = ip->i_mount;
> - int error = 0;
> + int error = 0, err2;
>   int log_flushed = 0;
>   xfs_lsn_t   lsn = 0;
>  
> @@ -142,10 +142,12 @@ xfs_file_fsync(
>  
>   error = filemap_write_and_wait_range(inode->i_mapping, start, end);
>   if (error)
> - return error;
> + goto out;
>  
> - if (XFS_FORCED_SHUTDOWN(mp))
> - return -EIO;
> + if (XFS_FORCED_SHUTDOWN(mp)) {
> + error = -EIO;
> + goto out;
> + }
>  
>   xfs_iflags_clear(ip, XFS_ITRUNCATED);
>  
> @@ -197,6 +199,11 @@ xfs_file_fsync(
>   mp->m_logdev_targp == mp->m_ddev_targp)
>   xfs_blkdev_issue_flush(mp->m_ddev_targp);
>  
> +out:
> + err2 = filemap_report_wb_err(file);
> + if (!error)
> + error = err2;
> +
>   return error;
>  }
>  
> -- 
> 2.13.0
> 

-- 
Carlos

Re: [PULL] Btrfs for 4.13, part 1 (update 1)

2017-06-26 Thread David Sterba

On Fri, Jun 23, 2017 at 05:16:46PM +0200, David Sterba wrote:

Two more patches added to the branch

Chris Mason (1):
  btrfs: fix integer overflow in calc_reclaim_items_nr

David Sterba (1):
  btrfs: scrub: fix target device intialization while setting up scrub context

Updated branch and tag:


The following changes since commit 41f1830f5a7af77cf5c86359aba3cbd706687e52:

  Linux 4.12-rc6 (2017-06-19 22:19:37 +0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.13-part1

for you to fetch changes up to 8399f53f0c7450ab050b1b0ffee4e2c1ddd2a3e0:

  btrfs: fix integer overflow in calc_reclaim_items_nr (2017-06-26 15:33:42 +0200)


Previous:

> 
> The following changes since commit 41f1830f5a7af77cf5c86359aba3cbd706687e52:
> 
>   Linux 4.12-rc6 (2017-06-19 22:19:37 +0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.13-part1
> 
> for you to fetch changes up to f3f000297be88b1b75fde5027d660a8d8a44de14:
> 
>   btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges (2017-06-21 20:56:14 +0200)

[PATCH] btrfs: scrub: fix target device intialization while setting up scrub context

2017-06-26 Thread David Sterba

The commit "btrfs: scrub: inline helper scrub_setup_wr_ctx" inlined a
helper but wrongly sets up the target device. Incidentally there's a
local variable with the same name as a parameter in the previous
function, so this got caught during runtime as crash in test btrfs/027.

Reported-by: Chris Mason 
Signed-off-by: David Sterba 
---
 fs/btrfs/scrub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 58a249cd5adc..738e784ba20d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -714,9 +714,9 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
mutex_init(&sctx->wr_lock);
sctx->wr_curr_bio = NULL;
if (is_dev_replace) {
-   WARN_ON(!dev->bdev);
+   WARN_ON(!fs_info->dev_replace.tgtdev);
sctx->pages_per_wr_bio = SCRUB_PAGES_PER_WR_BIO;
-   sctx->wr_tgtdev = dev;
+   sctx->wr_tgtdev = fs_info->dev_replace.tgtdev;
atomic_set(&sctx->flush_all_writes, 0);
}
 
-- 
2.13.0


Re: [PATCH v4] btrfs-progs: btrfs-convert: Add larger device support

2017-06-26 Thread David Sterba

On Sat, Jun 03, 2017 at 03:27:45PM +0530, Lakshmipathi.G wrote:
> With larger file system (in this case its 22TB), ext2fs_open() returns
> EXT2_ET_CANT_USE_LEGACY_BITMAPS error message with ext2fs_read_block_bitmap().
> 
> To overcome this issue, (a) we need pass EXT2_FLAG_64BITS flag with ext2fs_open.
> (b) use 64-bit functions like ext2fs_get_block_bitmap_range2,
> ext2fs_inode_data_blocks2,ext2fs_read_ext_attr2. (c) use 64bit types with
> btrfs_convert_context fields.
> 
> bug: https://bugzilla.kernel.org/show_bug.cgi?id=194795
> Signed-off-by: Lakshmipathi.G 

Applied, thanks.

> --- a/convert/common.h
> +++ b/convert/common.h
> @@ -30,10 +30,10 @@ struct btrfs_mkfs_config;
>  
>  struct btrfs_convert_context {
>   u32 blocksize;
> - u32 first_data_block;
> - u32 block_count;
> - u32 inodes_count;
> - u32 free_inodes_count;
> + u64 first_data_block;
> + u64 block_count;
> + u64 inodes_count;
> + u64 free_inodes_count;

I've split this change from the patch as it does not logically belong to
the same patch, although the change is simple.
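
A quick userspace check of why the u32 fields are too narrow. It assumes a 22 TiB device and a 4 KiB block size; the thread only says "22TB", so both figures are assumptions of this sketch, as are the helper names.

```c
#include <stdint.h>

/* Number of blocks on a device of device_bytes, using 64-bit arithmetic. */
static uint64_t sketch_block_count(uint64_t device_bytes, uint32_t blocksize)
{
    return device_bytes / blocksize;
}

/* What a 32-bit field would hold if the same count were stored in it. */
static uint32_t sketch_truncate_u32(uint64_t value)
{
    return (uint32_t)value;
}
```

Under those assumptions the block count is 5905580032, which exceeds the u32 maximum of 4294967295, and a 32-bit field would silently hold 1610612736 instead.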

Re: [PATCH v2] btrfs-progs: Fix 'btrfs device stats --check' cli option

2017-06-26 Thread David Sterba

On Thu, Jun 22, 2017 at 01:27:53PM +0530, Lakshmipathi.G wrote:
> Bug 194961 - btrfs device stats --check  does not work
> https://bugzilla.kernel.org/show_bug.cgi?id=194961
> 
> Reported-by: Tomas Thiemel
> Signed-off-by: Lakshmipathi.G 

Applied, thanks.

Re: [PATCH 3/4] btrfs: Add zstd support

2017-06-26 Thread David Sterba

On Sun, Jun 25, 2017 at 11:30:22PM +0200, Adam Borowski wrote:
> On Mon, Jun 26, 2017 at 03:03:17AM +0800, kbuild test robot wrote:
> > Hi Nick,
> > 
> > url:
> > https://github.com/0day-ci/linux/commits/Nick-Terrell/lib-Add-xxhash-module/20170625-214344
> > config: i386-allmodconfig (attached as .config)
> > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> > reproduce:
> > # save the attached .config to linux build tree
> > make ARCH=i386 
> > 
> > All errors (new ones prefixed by >>):
> > 
> > >> ERROR: "__udivdi3" [lib/zstd/zstd_compress.ko] undefined!
> >ERROR: "__udivdi3" [fs/ufs/ufs.ko] undefined!
> 
> Just to save you time to figure it out:
> for division when one or both arguments are longer than the architecture's
> word, gcc uses helper functions that are included when compiling in a hosted
> environment -- but not in freestanding.
> 
> Thus, you want do_div() instead of /; do check widths and signedness of
> arguments.

No do_div please, div_u64 or div64_u64.
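
For context, the two styles differ in calling convention: div_u64() and div64_u64() return the quotient as a value, while do_div() overwrites its first argument with the quotient and returns the remainder. The userspace stand-ins below only mimic those shapes. They are not the kernel implementations, and in userspace a plain "/" works because the hosted runtime supplies the helper that the kernel build lacks.

```c
#include <stdint.h>

/* Stand-in shaped like div_u64(): 64-bit dividend, 32-bit divisor. */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

/*
 * Stand-in shaped like do_div(): overwrites *n with the quotient and
 * returns the remainder, which is easy to misuse in an expression.
 */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}
```

A caller that writes "x = sketch_do_div(&n, 7)" gets the remainder in x, not the quotient, which is one reason the value-returning form reads more safely.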

[PATCH v2 15/51] btrfs: comment on direct access bvec table

2017-06-26 Thread Ming Lei

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c |  4 
 fs/btrfs/inode.c   | 12 
 2 files changed, 16 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 2c0b7b57fcd5..5972f74354ca 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -541,6 +541,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 
/* we need the actual starting offset of this extent in the file */
read_lock(&em_tree->lock);
+   /*
+* It is still safe to retrieve the 1st page of the bio
+* in this way after supporting multipage bvec.
+*/
em = lookup_extent_mapping(em_tree,
   page_offset(bio->bi_io_vec->bv_page),
   PAGE_SIZE);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4ab02b34f029..7e725d84917b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8055,6 +8055,12 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
if (bio->bi_status)
goto end;
 
+   /*
+* WARNING:
+*
+* With multipage bvec, the following way of direct access to
+* bvec table is only safe if the bio includes single page.
+*/
ASSERT(bio->bi_vcnt == 1);
io_tree = &BTRFS_I(inode)->io_tree;
failure_tree = &BTRFS_I(inode)->io_failure_tree;
@@ -8146,6 +8152,12 @@ static void btrfs_retry_endio(struct bio *bio)
 
uptodate = 1;
 
+   /*
+* WARNING:
+*
+* With multipage bvec, the following way of direct access to
+* bvec table is only safe if the bio includes single page.
+*/
ASSERT(bio->bi_vcnt == 1);
ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(done->inode));
 
-- 
2.9.4


[PATCH v2 14/51] btrfs: avoid to access bvec table directly for a cloned bio

2017-06-26 Thread Ming Lei

Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio)
mentioned that for dio the submitted bio may be fast cloned, we
can't access the bvec table directly for a cloned bio, so use
bio_get_first_bvec() to retrieve the 1st bvec.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Cc: Liu Bo 
Signed-off-by: Ming Lei 
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 06dea7c89bbd..4ab02b34f029 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
int read_mode = 0;
int segs;
int ret;
+   struct bio_vec bvec;
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
}
 
segs = bio_segments(failed_bio);
+   bio_get_first_bvec(failed_bio, &bvec);
if (segs > 1 ||
-   (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
+   (bvec.bv_len > btrfs_inode_sectorsize(inode)))
read_mode |= REQ_FAILFAST_DEV;
 
isector = start - btrfs_io_bio(failed_bio)->logical;
-- 
2.9.4


[PATCH v2 13/51] btrfs: avoid access to .bi_vcnt directly

2017-06-26 Thread Ming Lei

BTRFS uses bio->bi_vcnt to figure out page numbers, this
way becomes not correct once we start to enable multipage
bvec.

So use bio_for_each_segment_all() to do that instead.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei 
---
 fs/btrfs/extent_io.c | 21 +
 fs/btrfs/extent_io.h |  2 +-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0863164d97d2..5b453cada1ea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2258,7 +2258,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
return 0;
 }
 
-int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+int btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
   struct io_failure_record *failrec, int failed_mirror)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2282,7 +2282,7 @@ int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
 *  a) deliver good data to the caller
 *  b) correct the bad sectors on disk
 */
-   if (failed_bio->bi_vcnt > 1) {
+   if (failed_bio_pages > 1) {
/*
 * to fulfill b), we need to know the exact failing sectors, as
 * we don't want to rewrite any more than the failed ones. thus,
@@ -2355,6 +2355,17 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
return bio;
 }
 
+static unsigned int get_bio_pages(struct bio *bio)
+{
+   unsigned i;
+   struct bio_vec *bv;
+
+   bio_for_each_segment_all(bv, bio, i)
+   ;
+
+   return i;
+}
+
 /*
  * this is a generic handler for readpage errors (default
  * readpage_io_failed_hook). if other copies exist, read those and write back
@@ -2375,6 +2386,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
int read_mode = 0;
blk_status_t status;
int ret;
+   unsigned failed_bio_pages = get_bio_pages(failed_bio);
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -2382,13 +2394,14 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
if (ret)
return ret;
 
-   ret = btrfs_check_repairable(inode, failed_bio, failrec, failed_mirror);
+   ret = btrfs_check_repairable(inode, failed_bio_pages, failrec,
+failed_mirror);
if (!ret) {
free_io_failure(failure_tree, tree, failrec);
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if (failed_bio_pages > 1)
read_mode |= REQ_FAILFAST_DEV;
 
phy_offset >>= inode->i_sb->s_blocksize_bits;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d4942d94a16b..90681d1f0786 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -539,7 +539,7 @@ void btrfs_free_io_failure_record(struct btrfs_inode 
*inode, u64 start,
u64 end);
 int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
struct io_failure_record **failrec_ret);
-int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+int btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
   struct io_failure_record *failrec, int fail_mirror);
 struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio 
*failed_bio,
struct io_failure_record *failrec,
-- 
2.9.4


[PATCH v2 32/51] btrfs: use bvec_get_last_page to get bio's last page

2017-06-26 Thread Ming Lei

Preparing for supporting multipage bvec.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c | 5 -
 fs/btrfs/extent_io.c   | 8 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5972f74354ca..fdab5b821aa8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -391,8 +391,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode 
*inode, u64 start,
 static u64 bio_end_offset(struct bio *bio)
 {
struct bio_vec *last = &bio->bi_io_vec[bio->bi_vcnt - 1];
+   struct bio_vec bv;
 
-   return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+   bvec_get_last_page(last, &bv);
+
+   return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
 }
 
 static noinline int add_ra_bio_pages(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5b453cada1ea..7cc6c8a52e49 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2741,11 +2741,15 @@ static int __must_check submit_one_bio(struct bio *bio, 
int mirror_num,
 {
blk_status_t ret = 0;
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
-   struct page *page = bvec->bv_page;
struct extent_io_tree *tree = bio->bi_private;
+   struct bio_vec bv;
+   struct page *page;
u64 start;
 
-   start = page_offset(page) + bvec->bv_offset;
+   bvec_get_last_page(bvec, &bv);
+   page = bv.bv_page;
+
+   start = page_offset(page) + bv.bv_offset;
 
bio->bi_private = NULL;
bio_get(bio);
-- 
2.9.4
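
For readers following the series, here is a minimal user-space sketch
of what "get the last page" means once a single bio_vec may span
several pages. It is not the helper from this series, whose
implementation is not shown in this message. The struct, the function
names and the fixed 4096-byte page size are all assumptions made for
illustration, and a page is modelled by its file page index instead of
a struct page pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for struct bio_vec in user space. */
struct model_bvec {
	size_t page_idx;	/* file page index of the first page */
	unsigned int len;	/* bytes covered, may span several pages */
	unsigned int offset;	/* byte offset into the first page */
};

/* Fill *last with the single-page segment that ends the vector. */
void model_bvec_last_page(const struct model_bvec *bv,
			  struct model_bvec *last)
{
	unsigned int total = bv->offset + bv->len;
	unsigned int last_page;

	/* The model only handles non-empty vectors that start in page 0. */
	assert(bv->len > 0);
	assert(bv->offset < MODEL_PAGE_SIZE);

	last_page = (total - 1) / MODEL_PAGE_SIZE;
	last->page_idx = bv->page_idx + last_page;
	if (last_page == 0) {
		/* Everything fits in the first page. */
		last->offset = bv->offset;
		last->len = bv->len;
	} else {
		/* The tail starts at the beginning of the last page. */
		last->offset = 0;
		last->len = total - last_page * MODEL_PAGE_SIZE;
	}
}

/* Model of bio_end_offset(): file offset just past the last byte. */
unsigned long long model_end_offset(const struct model_bvec *bv)
{
	struct model_bvec last;

	model_bvec_last_page(bv, &last);
	return (unsigned long long)last.page_idx * MODEL_PAGE_SIZE +
	       last.len + last.offset;
}
```

In this model, reading bv_page, bv_len and bv_offset straight from a
multi-page vector would describe the first page and the whole length,
which is why the two callers above go through a last-page segment
first.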


[PATCH v2 48/51] fs/btrfs: convert to bio_for_each_segment_all_sp()

2017-06-26 Thread Ming Lei

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c |  3 ++-
 fs/btrfs/disk-io.c |  3 ++-
 fs/btrfs/extent_io.c   | 12 
 fs/btrfs/inode.c   |  6 --
 fs/btrfs/raid56.c  |  6 --
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index fdab5b821aa8..9d1693ecf468 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -147,12 +147,13 @@ static void end_compressed_bio_read(struct bio *bio)
} else {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all bia;
 
/*
 * we have verified the checksum already, set page
 * checked so the end_io handlers know about it
 */
-   bio_for_each_segment_all(bvec, cb->orig_bio, i)
+   bio_for_each_segment_all_sp(bvec, cb->orig_bio, i, bia)
SetPageChecked(bvec->bv_page);
 
bio_endio(cb->orig_bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f4f54d13db6d..e7efbaa3566c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -963,8 +963,9 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
struct bio_vec *bvec;
struct btrfs_root *root;
int i, ret = 0;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
root = BTRFS_I(bvec->bv_page->mapping->host)->root;
ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
if (ret)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7cc6c8a52e49..8e51452894ba 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2359,8 +2359,9 @@ static unsigned int get_bio_pages(struct bio *bio)
 {
unsigned i;
struct bio_vec *bv;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bv, bio, i)
+   bio_for_each_segment_all_sp(bv, bio, i, bia)
;
 
return i;
@@ -2468,8 +2469,9 @@ static void end_bio_extent_writepage(struct bio *bio)
u64 start;
u64 end;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2538,8 +2540,9 @@ static void end_bio_extent_readpage(struct bio *bio)
int mirror;
int ret;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -3695,8 +3698,9 @@ static void end_bio_extent_buffer_writepage(struct bio 
*bio)
struct bio_vec *bvec;
struct extent_buffer *eb;
int i, done;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
 
eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7e725d84917b..61cc6d899ae5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8051,6 +8051,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
struct bio_vec *bvec;
struct extent_io_tree *io_tree, *failure_tree;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_status)
goto end;
@@ -8067,7 +8068,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(inode));
 
done->uptodate = 1;
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all_sp(bvec, bio, i, bia)
clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
 io_tree, done->start, bvec->bv_page,
 btrfs_ino(BTRFS_I(inode)), 0);
@@ -8146,6 +8147,7 @@ static void btrfs_retry_endio(struct bio *bio)
int uptodate;
int ret;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_status)
goto end;
@@ -8164,7 +8166,7 @@ static void btrfs_retry_endio(struct bio *bio)
io_tree = &BTRFS_I(inode)->io_tree;
failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
ret = __readpage_endio_check(inode, io_bio, i,

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-26 Thread Lu Fengqi


On 2017-06-24 10:34, Marc MERLIN wrote:

On Fri, Jun 23, 2017 at 09:17:50AM -0700, Marc MERLIN wrote:

Thanks for looking at this.
I have applied your patch and I'm still re-running check in lowmem. It
takes about 24H so I'll post the full results when it's done.


Ok, here is the output of the check with btrfs-progs freshly synced from
git, including Lu's just added patch.

Obviously while I'm happy to give further debug info on why my
filesystem is in that state and while check --repair sees nothing to
repair, suggestions on how to clean those warnings up, unless they are
not going to affect filesystem operation, would be greatly appreciated :)

Thanks,
Marc


Thanks for the updated information. I'm sorry that the false alert made
you nervous.




ERROR: root 3862 EXTENT_DATA[18170706 4096] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 16384] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 20480] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt
ERROR: errors found in fs roots


However, this looks like another problem. Could you dump this file tree
with the following command?

# btrfs-debug-tree -t 3862  | grep -C 10 18170706

--
Thanks,
Lu



[PATCH v3 4/4] btrfs-progs: test: Add test image for lowmem mode referencer count mismatch false alert

2017-06-26 Thread Lu Fengqi

Add an image that reproduces the extent item referencer count
mismatch false alert in lowmem mode.

Reported-by: Marc MERLIN 
Signed-off-by: Lu Fengqi 
---
 .../ref_count_mismatch_false_alert.img   | Bin 0 -> 4096 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 
tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img

diff --git 
a/tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img 
b/tests/fsck-tests/020-extent-ref-cases/ref_count_mismatch_false_alert.img
new file mode 100644
index 
..85110a813b5d00cb35d23babc70d57510cae19b0
GIT binary patch
literal 4096
zcmeH}c|6oxAIE>Q7>1#-&$wtTF}mEwgG?nemXMMqOIb3S>}eQX#*#aBGA>1k8`yK>to-!TQWGJDmK5OLv(4g?)FJ-uijX$PT~x!ht)C
zY5P1L4Blq?7v2ronfPws75J{e|40Gapo2(xdkdYH52nfE^?_C$uX-3`wy=*GP3mJul9nzPWam{Dy8Jm@HHSh;$0D%?#(CLV)}m
z4=I4Q9U&_*-d#g>Vt=XgLnU`1QDX$?18gfj8HGr{0^h5Eif=UgaWL?esD!xAzoX
zSPYMeybi82L>KA))qZ@PiatGF(7mnZRyApaxavmP17%d2>L@Te*1{-nwKu1W&2h)^EItwUC=(6YpcF!`TVtS0q;I(2VAvqY@;m@EcAW6pd@gCaAJWrm=LDD!yE;pNYVL%W@MuW`_Z;`{%KeXc
z6yV)hwUaVf1fT=6WYY$FN?k9wupm%KCipH*$G#z5H%XvAY#nOQCTEn4S1`={@Gxr=
zt~K2#@Pm%O&0L?%-m|gq$XuO~V6@K7~nFvdMFs;1g@2c>FiCFkFwOPj
zJ@{D_aGpeL|`wvq=c0(z<`nlQb)97j(gm&49Ve)kpX|
z<;B$0X?oL^)zQAzKjaU{M{UWA^-ua~yn7ruy@aAhosds;P119BS?So?C%gCglEl8$
z!q`l|sGp3QukgjlxR6qk7a0or0O>(q(`Mo3{wGxwNaFIDpC3^<@r`gzm*=R
z(~pa6-#RgJ49%V|*lxGLiBV!o%B4?x1D`dX3BolS52|cfnP<$dmrPzBi`{wY&3LKa
z^*j)&6)GvZYgkKjutQN{?yTvtlK~nv(x3YF=`p%HO0B^Mu2$SrFW>L|YDR+WSau}8
zGLxS(4p|G=KszkW2)*cgKmTUKZ|M)8_OesX>$?2jYiGPp=H$$4dtF)N)MQ;+U13
z`)AJ5wzSW=pI@2Vf2Pllk4;|F*k0jSy-7Gq`O|pI`P!Oy(LcCB!h09GCJS$6atEv*
zH@lZkR9r7zb2jVUR4(!g6YmWL@*r;bEA4Np)V
zpnjftsG)f^C+HBxJwT~T^|!1&$(M4_6-Z3IBDMBkp472TXb=iM^wPP8G#nv
zC6qWNQ*1w+^Y~ZT^et>M?!T0nVBQK+{Krai+mmboyL+aKu%`-FD=Vc6
zte)<=7aT3k0(l$=dFZ!PfHy4!dSx2@inv~XylqQqdaT@+G9X;+f_43L!#JKCUAHh8

[PATCH v3 3/4] btrfs-progs: test: Add test image for lowmem mode file extent interrupt

2017-06-26 Thread Lu Fengqi

Add an image in which an inlined extent coexists with a regular extent.

Reported-by: Marc MERLIN 
Signed-off-by: Lu Fengqi 
---
 .../020-extent-ref-cases/inline_regular_coexist.img  | Bin 0 -> 4096 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 
tests/fsck-tests/020-extent-ref-cases/inline_regular_coexist.img

diff --git a/tests/fsck-tests/020-extent-ref-cases/inline_regular_coexist.img 
b/tests/fsck-tests/020-extent-ref-cases/inline_regular_coexist.img
new file mode 100644
index 
..cf15cc14539f8759d18457d66b1f604244375b73
GIT binary patch
literal 4096
zcmeH|c{tSF9>6C`42^1Zkx@}1WS!6q##qwGGL|M~r)(kHSYPWHC2J^pTa+a+
zgX}{^WlR{mcQE#SnHTr|)xG!s``pL(InOzt^F7b=`r3)>B6fuyg{=A{1SgGNAfjf+gf>x5^89>!q~RF!x}RwLn~yX+TG(=oZAg)P-G(R|4qS8iYoH+gt!ix{l(>=(QMHW^+WMrqIS$FhqR^
zfbkD7l>uHOiSZrDu;z(jG5G?>y+WZqmCSsGS2W;$)2awzS)>fnx!?YUh!N+0a7XrS
zxg*MZ;siAC>#x>OsgMX|Um$u?-|
za?RDRdoY8~Rkm+476QWsekcwa*#s4s)sp%t!*Z5b(XQF_(!P8NFyY4tb?NgA`C0kM
zi|n1{oEo@qv4G(Hl0O)tGx^u8MG(S<3Q=(3U{#r*1}Gno{#|3=v;U{K2hnUj!k9qklkr}MG2B_!>}
zhLRkiu|j^F#6pdhFOp|y{==K|h=tKPBLIM>p(!>SS1N(xG1DN}grn*^m`)$Y~R*_WFs4{dsFJYQWI-zIc>
zz&30AJ_R15-0IjY;%jaqFK*H*w&(qKdZ$-9ua%PfXWu78(0$hF#i#1cvb>BUwIrl}
z+Z!!3G2xvyHIgzYhgs+v+X?xR{ff#Qy+ZFkKfYHgAAt2}zNUF!an=%#UOn66JckuO

[PATCH v3 1/4] btrfs-progs: lowmem check: Fix false alert about file extent interrupt

2017-06-26 Thread Lu Fengqi

As Qu mentioned in this thread
(https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
can cause a regular extent to co-exist with an inlined extent. This
coexistence is confusing, but since it is currently permitted, fix
btrfsck so that it does not emit a flood of error messages that would
alarm the user.

When checking file extents, the extent_end of each regular extent is
recorded to detect gaps between regular extents. Normally a file has
only one inlined extent, so the extent_end of an inlined extent is not
needed. However, because a regular extent can co-exist with an inlined
extent, the extent_end of the inlined extent also needs to be recorded.

Reported-by: Marc MERLIN 
Signed-off-by: Lu Fengqi 
---

Changelog:
v2: Just fix reported-by
v3: Output verbose information when file extent interrupt

 cmds-check.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index c052f66e..70d2b7f2 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4782,6 +4782,7 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
extent_num_bytes, item_inline_len);
err |= FILE_EXTENT_ERROR;
}
+   *end += extent_num_bytes;
*size += extent_num_bytes;
return err;
}
@@ -4847,8 +4848,8 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
  root->objectid, fkey->objectid, fkey->offset);
} else if (!no_holes && *end != fkey->offset) {
err |= FILE_EXTENT_ERROR;
-   error("root %llu EXTENT_DATA[%llu %llu] interrupt",
- root->objectid, fkey->objectid, fkey->offset);
+   error("root %llu EXTENT_DATA[%llu %llu] interrupt, should start at %llu",
+ root->objectid, fkey->objectid, fkey->offset, *end);
}
 
*end += extent_num_bytes;
-- 
2.13.1
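
The effect of the one-line "*end += extent_num_bytes" addition can be
sketched in user space as follows. This is only a model under stated
assumptions: the struct, the function name and the sample layout are
hypothetical, the no_holes feature is assumed to be off, and the real
check_file_extent() does far more validation than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one EXTENT_DATA item. */
struct model_extent {
	uint64_t offset;	/* file offset, i.e. the key offset */
	uint64_t num_bytes;	/* bytes of file data covered */
	bool is_inline;
};

/*
 * Count the extents that do not start where the previous one ended.
 * advance_on_inline == false models the behaviour before the patch,
 * where an inlined extent returned early without advancing "end".
 */
int model_count_interrupts(const struct model_extent *ext, size_t n,
			   bool advance_on_inline)
{
	uint64_t end = 0;
	int interrupts = 0;
	size_t i;

	assert(ext != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ext[i].is_inline) {
			if (advance_on_inline)
				end += ext[i].num_bytes;
			continue;
		}
		if (end != ext[i].offset)
			interrupts++;	/* would be reported as "interrupt" */
		end += ext[i].num_bytes;
	}
	return interrupts;
}
```

With a made-up layout of one 4096-byte inlined extent followed by
regular extents at 4096 and 12288, the pre-patch model flags both
regular extents, because "end" never catches up once it falls behind.
The patched model flags none.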




[PATCH v3 2/4] btrfs-progs: lowmem check: Fix false alert about referencer count mismatch

2017-06-26 Thread Lu Fengqi

The normal back reference counting does not count extents referenced by
extent data in a shared leaf. The check_extent_data_backref function
therefore needs to skip any leaf whose owner does not match root_id.

Reported-by: Marc MERLIN 
Signed-off-by: Lu Fengqi 
---
 cmds-check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 70d2b7f2..f42968cd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10692,7 +10692,8 @@ static int check_extent_data_backref(struct 
btrfs_fs_info *fs_info,
leaf = path.nodes[0];
slot = path.slots[0];
 
-   if (slot >= btrfs_header_nritems(leaf))
+   if (slot >= btrfs_header_nritems(leaf) ||
+   btrfs_header_owner(leaf) != root_id)
goto next;
btrfs_item_key_to_cpu(leaf, &key, slot);
if (key.objectid != objectid || key.type != 
BTRFS_EXTENT_DATA_KEY)
-- 
2.13.1
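
As a rough illustration of the owner check, the sketch below counts
references for one root while optionally skipping leaves owned by a
different root. Everything in it is a hypothetical model: the struct,
the function name and the sample numbers are invented, and the real
function walks the tree with btrfs_search_slot() instead of a flat
array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one leaf reached while walking a root. */
struct model_leaf {
	uint64_t owner;			/* root that owns the leaf */
	unsigned int nr_matching;	/* matching EXTENT_DATA items in it */
};

/*
 * Count references found for root_id. With skip_foreign set, leaves
 * owned by another root (shared leaves) are ignored, as in the patch.
 */
unsigned int model_count_refs(const struct model_leaf *leaves, size_t n,
			      uint64_t root_id, bool skip_foreign)
{
	unsigned int found = 0;
	size_t i;

	assert(leaves != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (skip_foreign && leaves[i].owner != root_id)
			continue;
		found += leaves[i].nr_matching;
	}
	return found;
}
```

If root 257 reaches one leaf it owns (one match) and one leaf owned by
root 5 (two matches), the unpatched model finds three references where
the extent item records one for that root, which is the kind of
mismatch reported as a false alert.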




[PATCH v2] btrfs-progs: mkfs: Replace number with a macro

2017-06-26 Thread Gu Jinxiang

For maintainability and extensibility, replace the hard-coded numbers
that depend on the size of the blocks member of btrfs_mkfs_config with
a macro.

Signed-off-by: Gu Jinxiang 
---
Changes since v1:
Also convert a second use of the number that v1 missed.

 mkfs/common.c | 4 ++--
 mkfs/common.h | 5 -
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index e4785c5..0d79650 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
uuid_generate(chunk_tree_uuid);
 
cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
-   for (i = 1; i < 7; i++) {
+   for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {
cfg->blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
cfg->nodesize * i;
}
@@ -210,7 +210,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
cfg->nodesize - sizeof(struct btrfs_header));
nritems = 0;
itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize);
-   for (i = 1; i < 7; i++) {
+   for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {
item_size = sizeof(struct btrfs_extent_item);
if (!skinny_metadata)
item_size += sizeof(struct btrfs_tree_block_info);
diff --git a/mkfs/common.h b/mkfs/common.h
index 666a75b..e23e79b 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -28,6 +28,9 @@
 #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
 #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
 
+/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
+#define BTRFS_MKFS_ROOTS_NR 6
+
 struct btrfs_mkfs_config {
/* Label of the new filesystem */
const char *label;
@@ -43,7 +46,7 @@ struct btrfs_mkfs_config {
/* Output fields, set during creation */
 
/* Logical addresses of superblock [0] and other tree roots */
-   u64 blocks[8];
+   u64 blocks[BTRFS_MKFS_ROOTS_NR + 1];
char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
 
-- 
2.9.4
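
The relationship between the macro, the array size and the loop bound
can be checked with a small stand-alone model. The MODEL_* names and
the trimmed-down struct are assumptions made for this sketch and are
not the btrfs-progs definitions. The 64KiB superblock offset mirrors
BTRFS_SUPER_INFO_OFFSET.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SUPER_INFO_OFFSET	(64 * 1024)	/* stands in for BTRFS_SUPER_INFO_OFFSET */
#define MODEL_ROOTS_NR		6		/* stands in for BTRFS_MKFS_ROOTS_NR */

/* Hypothetical, trimmed-down stand-in for struct btrfs_mkfs_config. */
struct model_cfg {
	uint32_t nodesize;
	/* Slot 0 is the superblock, slots 1..MODEL_ROOTS_NR are tree roots. */
	uint64_t blocks[MODEL_ROOTS_NR + 1];
};

/* Lay out the blocks the same way as the first loop in make_btrfs(). */
void model_layout_blocks(struct model_cfg *cfg)
{
	int i;

	assert(cfg->nodesize > 0);
	cfg->blocks[0] = MODEL_SUPER_INFO_OFFSET;
	for (i = 1; i <= MODEL_ROOTS_NR; i++)
		cfg->blocks[i] = MODEL_SUPER_INFO_OFFSET + 1024 * 1024 +
				 (uint64_t)cfg->nodesize * i;
}
```

Because the loop bound and the array size derive from the same
constant, adding a seventh root would only require changing the macro.
The old code needed the 7 in both loops and the 8 in the array
declaration to be updated together.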




Re: [PATCH] btrfs-progs: mkfs: Replace number with a macro

2017-06-26 Thread Tsutomu Itoh

On 2017/06/26 17:23, Gu Jinxiang wrote:
> For code maintainability and scalability,
> replace number with a macro of member blocks in btrfs_mkfs_config.
> 
> Signed-off-by: Gu Jinxiang 
> ---
>  mkfs/common.c | 2 +-
>  mkfs/common.h | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mkfs/common.c b/mkfs/common.c
> index e4785c5..420671b 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   uuid_generate(chunk_tree_uuid);
>  
>   cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
> - for (i = 1; i < 7; i++) {
> + for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {

If you change 7 to BTRFS_MKFS_ROOTS_NR, you also need to change the
following code.

 213 for (i = 1; i < 7; i++) {

Thanks,
Tsutomu

>   cfg->blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
>   cfg->nodesize * i;
>   }
> diff --git a/mkfs/common.h b/mkfs/common.h
> index 666a75b..e23e79b 100644
> --- a/mkfs/common.h
> +++ b/mkfs/common.h
> @@ -28,6 +28,9 @@
>  #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
>  #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
>  
> +/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
> +#define BTRFS_MKFS_ROOTS_NR 6
> +
>  struct btrfs_mkfs_config {
>   /* Label of the new filesystem */
>   const char *label;
> @@ -43,7 +46,7 @@ struct btrfs_mkfs_config {
>   /* Output fields, set during creation */
>  
>   /* Logical addresses of superblock [0] and other tree roots */
> - u64 blocks[8];
> + u64 blocks[BTRFS_MKFS_ROOTS_NR + 1];
>   char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
>   char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
>  
> 


[PATCH] btrfs-progs: mkfs: Replace number with a macro

2017-06-26 Thread Gu Jinxiang

For maintainability and extensibility, replace the hard-coded numbers
that depend on the size of the blocks member of btrfs_mkfs_config with
a macro.

Signed-off-by: Gu Jinxiang 
---
 mkfs/common.c | 2 +-
 mkfs/common.h | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index e4785c5..420671b 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -94,7 +94,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
uuid_generate(chunk_tree_uuid);
 
cfg->blocks[0] = BTRFS_SUPER_INFO_OFFSET;
-   for (i = 1; i < 7; i++) {
+   for (i = 1; i <= BTRFS_MKFS_ROOTS_NR; i++) {
cfg->blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 +
cfg->nodesize * i;
}
diff --git a/mkfs/common.h b/mkfs/common.h
index 666a75b..e23e79b 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -28,6 +28,9 @@
 #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
 #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
 
+/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
+#define BTRFS_MKFS_ROOTS_NR 6
+
 struct btrfs_mkfs_config {
/* Label of the new filesystem */
const char *label;
@@ -43,7 +46,7 @@ struct btrfs_mkfs_config {
/* Output fields, set during creation */
 
/* Logical addresses of superblock [0] and other tree roots */
-   u64 blocks[8];
+   u64 blocks[BTRFS_MKFS_ROOTS_NR + 1];
char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
 
-- 
2.9.4




Re: [PATCH v7 05/22] jbd2: don't clear and reset errors after waiting on writeback

2017-06-26 Thread Carlos Maiolino

On Fri, Jun 16, 2017 at 03:34:10PM -0400, Jeff Layton wrote:
> Resetting this flag is almost certainly racy, and will be problematic
> with some coming changes.
> 
> Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
> Have jbd2 call it instead of filemap_fdatawait and don't attempt to
> re-set the error flag if it fails.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/jbd2/commit.c   | 15 +++
>  include/linux/fs.h |  2 +-
>  mm/filemap.c   | 16 ++--
>  3 files changed, 18 insertions(+), 15 deletions(-)
> 
I'm not too experienced with jbd2 internals, but this patch is clear enough:

Reviewed-by: Carlos Maiolino 

-- 
Carlos

Re: [PATCH v7 04/22] buffer: set errors in mapping at the time that the error occurs

2017-06-26 Thread Carlos Maiolino

On Fri, Jun 16, 2017 at 03:34:09PM -0400, Jeff Layton wrote:
> I noticed on xfs that I could still sometimes get back an error on fsync
> on a fd that was opened after the error condition had been cleared.
> 
> The problem is that the buffer code sets the write_io_error flag and
> then later checks that flag to set the error in the mapping. That flag
> persists for quite a while, however. If the file is later opened with
> O_TRUNC, the buffers will then be invalidated and the mapping's error
> set such that a subsequent fsync will return error. I think this is
> incorrect, as there was no writeback between the open and fsync.
> 
> Add a new mark_buffer_write_io_error operation that sets the flag and
> the error in the mapping at the same time. Replace all calls to
> set_buffer_write_io_error with mark_buffer_write_io_error, and remove
> the places that check this flag in order to set the error in the
> mapping.
> 
> This sets the error in the mapping earlier, at the time that it's first
> detected.
> 
> Signed-off-by: Jeff Layton 
> Reviewed-by: Jan Kara 
> ---
>  fs/buffer.c | 20 +---
>  fs/gfs2/lops.c  |  2 +-
>  include/linux/buffer_head.h |  1 +
>  3 files changed, 15 insertions(+), 8 deletions(-)
> 

Reviewed-by: Carlos Maiolino 

> diff --git a/fs/buffer.c b/fs/buffer.c
> index 7b4f4bfde91e..4d5d03b42e11 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -178,7 +178,7 @@ void end_buffer_write_sync(struct buffer_head *bh, int 
> uptodate)
>   set_buffer_uptodate(bh);
>   } else {
>   buffer_io_error(bh, ", lost sync page write");
> - set_buffer_write_io_error(bh);
> + mark_buffer_write_io_error(bh);
>   clear_buffer_uptodate(bh);
>   }
>   unlock_buffer(bh);
> @@ -352,8 +352,7 @@ void end_buffer_async_write(struct buffer_head *bh, int 
> uptodate)
>   set_buffer_uptodate(bh);
>   } else {
>   buffer_io_error(bh, ", lost async page write");
> - mapping_set_error(page->mapping, -EIO);
> - set_buffer_write_io_error(bh);
> + mark_buffer_write_io_error(bh);
>   clear_buffer_uptodate(bh);
>   SetPageError(page);
>   }
> @@ -481,8 +480,6 @@ static void __remove_assoc_queue(struct buffer_head *bh)
>  {
>   list_del_init(&bh->b_assoc_buffers);
>   WARN_ON(!bh->b_assoc_map);
> - if (buffer_write_io_error(bh))
> - mapping_set_error(bh->b_assoc_map, -EIO);
>   bh->b_assoc_map = NULL;
>  }
>  
> @@ -1181,6 +1178,17 @@ void mark_buffer_dirty(struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(mark_buffer_dirty);
>  
> +void mark_buffer_write_io_error(struct buffer_head *bh)
> +{
> + set_buffer_write_io_error(bh);
> + /* FIXME: do we need to set this in both places? */
> + if (bh->b_page && bh->b_page->mapping)
> + mapping_set_error(bh->b_page->mapping, -EIO);
> + if (bh->b_assoc_map)
> + mapping_set_error(bh->b_assoc_map, -EIO);
> +}
> +EXPORT_SYMBOL(mark_buffer_write_io_error);
> +
>  /*
>   * Decrement a buffer_head's reference count.  If all buffers against a page
>   * have zero reference count, are clean and unlocked, and if the page is 
> clean
> @@ -3266,8 +3274,6 @@ drop_buffers(struct page *page, struct buffer_head 
> **buffers_to_free)
>  
>   bh = head;
>   do {
> - if (buffer_write_io_error(bh) && page->mapping)
> - mapping_set_error(page->mapping, -EIO);
>   if (buffer_busy(bh))
>   goto failed;
>   bh = bh->b_this_page;
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 885d36e7a29f..1a9c2c08c1a1 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -182,7 +182,7 @@ static void gfs2_end_log_write_bh(struct gfs2_sbd *sdp, 
> struct bio_vec *bvec,
>   bh = bh->b_this_page;
>   do {
>   if (error)
> - set_buffer_write_io_error(bh);
> + mark_buffer_write_io_error(bh);
>   unlock_buffer(bh);
>   next = bh->b_this_page;
>   size -= bh->b_size;
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index bd029e52ef5e..e0abeba3ced7 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -149,6 +149,7 @@ void buffer_check_dirty_writeback(struct page *page,
>   */
>  
>  void mark_buffer_dirty(struct buffer_head *bh);
> +void mark_buffer_write_io_error(struct buffer_head *bh);
>  void init_buffer(struct buffer_head *, bh_end_io_t *, void *);
>  void touch_buffer(struct buffer_head *bh);
>  void set_bh_page(struct buffer_head *bh,
> -- 
> 2.13.0
> 

-- 
Carlos
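
A toy user-space model of the behaviour described in the commit message
may help when reading the diff. It is a sketch under assumptions, not
kernel code: the structs and function names are invented,
model_check_and_clear() stands in for whatever reports and clears the
mapping error at fsync time, and model_drop_buffer_old() imitates the
removed "check the flag later" propagation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for address_space and buffer_head. */
struct model_mapping {
	int error;			/* 0 or a negative errno */
};

struct model_buffer {
	bool write_io_error;		/* sticky per-buffer flag */
	struct model_mapping *mapping;	/* may be NULL */
};

/* New scheme: set the flag and the mapping error at failure time. */
void model_mark_write_io_error(struct model_buffer *bh)
{
	assert(bh != NULL);
	bh->write_io_error = true;
	if (bh->mapping)
		bh->mapping->error = -EIO;
}

/* Report the mapping error once and clear it, as an fsync would. */
int model_check_and_clear(struct model_mapping *mapping)
{
	int err = mapping->error;

	mapping->error = 0;
	return err;
}

/* Old scheme: propagate the sticky flag when the buffer is dropped. */
void model_drop_buffer_old(struct model_buffer *bh)
{
	if (bh->write_io_error && bh->mapping)
		bh->mapping->error = -EIO;
}
```

In the model, an error that has already been reported and cleared
reappears in the mapping as soon as model_drop_buffer_old() runs on the
buffer, even though no writeback happened in between. That is the stale
fsync error the patch is meant to avoid.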

Re: [PATCH v7 01/22] fs: remove call_fsync helper function

2017-06-26 Thread Carlos Maiolino

On Fri, Jun 16, 2017 at 03:34:06PM -0400, Jeff Layton wrote:
> Requested-by: Christoph Hellwig 
> Signed-off-by: Jeff Layton 
> ---
>  fs/sync.c  | 2 +-
>  include/linux/fs.h | 6 --
>  ipc/shm.c  | 2 +-
>  3 files changed, 2 insertions(+), 8 deletions(-)
> 
> 2.13.0
If it's worth having one more reviewer, you can add:

Reviewed-by: Carlos Maiolino 

> 

-- 
Carlos
