Re: btrfs issue with mariadb incremental backup

2017-08-08 Thread Chris Murphy
On Tue, Aug 8, 2017 at 10:32 PM,   wrote:
> Hi btrfs support team,
>
> My name is siranee jaraswachirakul. I tested btrfs incremental send and 
> receive and
> I found something incorrect.
>
> I posted detail in the url
> http://www.linuxquestions.org/questions/showthread.php?p=5746238#post5746238
>
> Could you please help me to find the reason?

The command "btrfs sub snap mysql_201707210830 mysql" is unexpected
for two reasons:

1. This is a rw snapshot. It needs to be ro using -r flag to make it
ro, and only ro snapshots can be used with btrfs send receive.
2. The naming convention seems reversed. The subvolume you're
snapshotting should be first, and the resulting snapshot is second,
and it is the second one that you'll send. So I think really what
you'll end up wanting is:

Machine A
btrfs sub snap -r mysql mysql_20170721
btrfs send mysql_20170721

Machine B
btrfs sub snap mysql_20170721 mysql

That will make a rw snapshot from the ro snapshot.

The next problem with the post is that you say "incremental" but I do
not see the exact send receive command you're using. An incremental
send requires -p option.



Machine A
btrfs sub snap -r mysql mysql_20170721
btrfs send mysql_20170721

Machine B
btrfs sub snap mysql_20170721 mysql
Test mysql
btrfs sub del mysql

Machine A (next day)

Machine A
btrfs sub snap -r mysql mysql_20170722
btrfs send -p mysql_20170721 mysql_20170722

Machine B
btrfs sub snap mysql_20170722 mysql
Test mysql
btrfs sub del mysql

And so on. You must have the -p snapshot on both file systems for the
incremental send/receive to work.



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs issue with mariadb incremental backup

2017-08-08 Thread siranee . ja
Hi btrfs support team,

My name is siranee jaraswachirakul. I tested btrfs incremental send and receive 
and
I found something incorrect.

I posted detail in the url
http://www.linuxquestions.org/questions/showthread.php?p=5746238#post5746238

Could you please help me to find the reason?


Best Regards,
Siranee Jaraswachirakul.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix out of bounds array access while reading extent buffer

2017-08-08 Thread Liu Bo
Hi Filipe,

On Tue, Aug 08, 2017 at 09:47:21AM +0100, Filipe Manana wrote:
> On Mon, Aug 7, 2017 at 8:39 PM, Liu Bo  wrote:
> > There is a cornel case that slip through the checkers in functions
> > reading extent buffer, ie.
> >
> > if (start < eb->len) and (start + len > eb->len),
> > then
> >
> > a) map_private_extent_buffer() returns immediately because
> > it's thinking the range spans across two pages,
> >
> > b) and the checkers in read_extent_buffer(), WARN_ON(start > eb->len)
> > and WARN_ON(start + len > eb->start + eb->len), both are OK in this
> > corner case, but it'd actually try to access the eb->pages out of
> > bounds because of (start + len > eb->len).
> >
> > The case is found by switching extent inline ref type from shared data
> > ref to non-shared data ref.
> >
> > This is adding proper checks in order to avoid invalid memory access,
> > ie. 'general protection', before it's too late.
> 
> Hi Bo,
> 
> I don't understand these 2 last paragraphs.
> How do you fix the invalid memory access? All the change does is make
> sure that attempts to read from invalid regions of an extent buffer
> result in a warning and returning an error code. Those paragraphs give
> the idea that the problem is that some caller is passing a wrong
> offset/length pair, however you aren't fixing any caller. can you
> clarify?
>

I see your doubt, that wrong offset/length pair comes from one of
btrfs_setget helpers, eg. btrfs_extent_data_ref_root.

The invalid memory access happens when the pointer it's using to
access fields in "struct btrfs_extent_data_ref" is actually a "struct
btrfs_shared_data_ref", this is caused by switching those types.

So the offset/length pair is correct from btrfs_setget helper's point
of view, but it's just using the wrong helper due to a wrong ref type
(in v2 this'll be added to the commit log to clarity).

> Also when you say " The case is found by switching extent inline ref
> type from shared data ref to non-shared data ref", it gives the idea
> this is a deterministic problem that always happens when doing that
> switch. If so, can we have a test case?
>

It is deterministic, but it depends on patching btrfs-corrupt-block
and last time I posted a corrupt-block related case, I was told that
tool is gonna change a lot and maybe get deleted in the near future,
so it's not yet suitable for fstests cases.

> Also missing the word "fault" after 'general protection'.
>

Yeah, there is a 'fault', the full text is
"general protection fault:  [#1] SMP KASAN",
will update.

Thanks,

-liubo

> Thanks.
> 
> >
> > Signed-off-by: Liu Bo 
> > ---
> >  fs/btrfs/extent_io.c | 22 ++
> >  1 file changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 0aff9b2..d198e87 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -5416,13 +5416,19 @@ void read_extent_buffer(struct extent_buffer *eb, 
> > void *dstv,
> > char *dst = (char *)dstv;
> > size_t start_offset = eb->start & ((u64)PAGE_SIZE - 1);
> > unsigned long i = (start_offset + start) >> PAGE_SHIFT;
> > +   unsigned long num_pages = num_extent_pages(eb->start, eb->len);
> >
> > -   WARN_ON(start > eb->len);
> > -   WARN_ON(start + len > eb->start + eb->len);
> > +   if (start + len > eb->len) {
> > +   WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, 
> > wanted %lu %lu\n",
> > +eb->start, eb->len, start, len);
> > +   memset(dst, 0, len);
> > +   return;
> > +   }
> >
> > offset = (start_offset + start) & (PAGE_SIZE - 1);
> >
> > while (len > 0) {
> > +   ASSERT(i < num_pages);
> > page = eb->pages[i];
> >
> > cur = min(len, (PAGE_SIZE - offset));
> > @@ -5491,6 +5497,12 @@ int map_private_extent_buffer(struct extent_buffer 
> > *eb, unsigned long start,
> > unsigned long end_i = (start_offset + start + min_len - 1) >>
> > PAGE_SHIFT;
> >
> > +   if (start + min_len > eb->len) {
> > +   WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, 
> > wanted %lu %lu\n",
> > +  eb->start, eb->len, start, min_len);
> > +   return -EINVAL;
> > +   }
> > +
> > if (i != end_i)
> > return 1;
> >
> > @@ -5502,12 +5514,6 @@ int map_private_extent_buffer(struct extent_buffer 
> > *eb, unsigned long start,
> > *map_start = ((u64)i << PAGE_SHIFT) - start_offset;
> > }
> >
> > -   if (start + min_len > eb->len) {
> > -   WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, 
> > wanted %lu %lu\n",
> > -  eb->start, eb->len, start, min_len);
> > -   return -EINVAL;
> > -   }
> > -
> > p = eb->pages[i];
> > kaddr = page_address(p);
> > *map = 

RE: BTRfs Mailing List Subscribe

2017-08-08 Thread William Muriithi
Hi Jay,


Please subscribe me to the BTRfs mailing list.

You could actually do this yourself.  See the link below.


http://vger.kernel.org/vger-lists.html#linux-btrfs
https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list

Jay Jason Bartlett
PLM, G-RACK 12





Regards,
William


BTRfs Mailing List Subscribe

2017-08-08 Thread Jay Bartlett
Please subscribe me to the BTRfs mailing list.



Jay Jason Bartlett
PLM, G-RACK 12

G-Technology Product Line Marketing
M:760.429.3444
W: g-technology.com  E: jay.bartl...@wdc.com
G-Technology, a Western Digital Corporation Brand
 

Western Digital Corporation (and its subsidiaries) E-mail Confidentiality 
Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential or 
legally privileged information of WDC and/or its affiliates, and are intended 
solely for the use of the individual or entity to which they are addressed. If 
you are not the intended recipient, any disclosure, copying, distribution or 
any action taken or omitted to be taken in reliance on it, is prohibited. If 
you have received this e-mail in error, please notify the sender immediately 
and delete the e-mail in its entirety from your system.   



Re: how to benchmark schedulers

2017-08-08 Thread Bernhard Landauer

On 08.08.2017 15:19, Janos Toth F. wrote:


I think you should consider using Linux 4.12 which has bfq (bfq-mq)
for blk-mq. So, you don't need out-of-tree BFQ patches if you switch
to blk-mq (but now you are free to do so even if you have HDDs or SSDs
which benefit from software schedulers since you have some multi-queue
schedulers for them). Just make sure to enable blk-mq (has to be a
boot parameter or build-time choice) in order to gain access to
bfq-mq. And remember that bfq-mq has to be activated manually (the
build-time choice for a default scheduler is not valid for multi-queue
schedulers, you will default to "none" which is effectively the new
"no-op").
Note: there is only one BFQ in 4.12 and it's bfq-mq which runs under
the name of simply BFQ (not bfq-mq, I only used that name to make it
clear that BFQ in 4.12 is a multi-queue version of BFQ).

I always wondered if Btrfs makes any use of FUA if it's enabled for
compatible SATA devices (it's disabled by default because there are
some drives with faulty firmware).

I also wonder if RAID10 is any better (or actually worse?) for
metadata (and system) chunks than RAID1.

On Tue, Aug 8, 2017 at 1:59 PM, Bernhard Landauer  wrote:

Hello everyone

I am looking for a way to test different available schedulers with Manjaro's
bfq-patched kernels on sytems with both SSD and spinning drives. Since
phoronix-test-suite apparently is currently useless for this task due to its
bad config for bfq I am looking for alternatives. Do you have any
suggestions for me?
Thank you.

kind regards
Bernhard Landauer
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thank you for your reply, Janos!
In fact what I am trying to do is evaluating which kernel and which 
schedulers are delivering the best results for me and I would like to 
test/benchmark the differences.

So my question is about available tests I could run with different setups.
Please let me know if you know of any.

regards
Bernhard
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to benchmark schedulers

2017-08-08 Thread Janos Toth F.
I think you should consider using Linux 4.12 which has bfq (bfq-mq)
for blk-mq. So, you don't need out-of-tree BFQ patches if you switch
to blk-mq (but now you are free to do so even if you have HDDs or SSDs
which benefit from software schedulers since you have some multi-queue
schedulers for them). Just make sure to enable blk-mq (has to be a
boot parameter or build-time choice) in order to gain access to
bfq-mq. And remember that bfq-mq has to be activated manually (the
build-time choice for a default scheduler is not valid for multi-queue
schedulers, you will default to "none" which is effectively the new
"no-op").
Note: there is only one BFQ in 4.12 and it's bfq-mq which runs under
the name of simply BFQ (not bfq-mq, I only used that name to make it
clear that BFQ in 4.12 is a multi-queue version of BFQ).

I always wondered if Btrfs makes any use of FUA if it's enabled for
compatible SATA devices (it's disabled by default because there are
some drives with faulty firmware).

I also wonder if RAID10 is any better (or actually worse?) for
metadata (and system) chunks than RAID1.

On Tue, Aug 8, 2017 at 1:59 PM, Bernhard Landauer  wrote:
> Hello everyone
>
> I am looking for a way to test different available schedulers with Manjaro's
> bfq-patched kernels on sytems with both SSD and spinning drives. Since
> phoronix-test-suite apparently is currently useless for this task due to its
> bad config for bfq I am looking for alternatives. Do you have any
> suggestions for me?
> Thank you.
>
> kind regards
> Bernhard Landauer
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


how to benchmark schedulers

2017-08-08 Thread Bernhard Landauer

Hello everyone

I am looking for a way to test different available schedulers with 
Manjaro's bfq-patched kernels on sytems with both SSD and spinning 
drives. Since phoronix-test-suite apparently is currently useless for 
this task due to its bad config for bfq I am looking for alternatives. 
Do you have any suggestions for me?

Thank you.

kind regards
Bernhard Landauer
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs umount hang

2017-08-08 Thread Angel Shtilianov
crash> bt -f 31625
PID: 31625  TASK: 88046a833400  CPU: 7   COMMAND: "btrfs-transacti"
wants to acquire struct extent_buffer 88000460aca0 lock, whose
lock_owner is 27574.

here is pid 27574:
PID: 27574  TASK: 88038b469a00  CPU: 4   COMMAND: "kworker/u32:9"
which is also is trying to acquire eb lock 8802598b6200, and here
the owner is 31696.

31696 is
PID: 31696  TASK: 88044b59ce00  CPU: 5   COMMAND: "umount"

So definitely here is a kind of deadlock.
umount holds the lock needed by the workers to complete and waits them
to complete.
Lockdep wouldn't complain about that.
I am still about to investigate what has previously triggered/disabled lockdep.
I have to obtain the log from the machine, but I need some time to get it.

Jeff, you were right.
Could you help demystifying how we ended up here?

Best regards,
Angel

On Mon, Aug 7, 2017 at 9:10 PM, Jeff Mahoney  wrote:
> On 8/7/17 1:19 PM, Jeff Mahoney wrote:
>> On 8/7/17 10:12 AM, Angel Shtilianov wrote:
>>> Hi there,
>>> I'm investigating sporadic hanging during btrfs umount. The FS is
>>> contained in a loop mounted file.
>>> I have no reproduction scenario and the issue may happen once a day or
>>> once a month. It is rare, but frustrating.
>>> I have a crashdump (the server has been manually crashed and collected
>>> a crashdump), so I could take look through the data structures.
>>> What happens is that umount is getting in D state and a the kernel
>>> complains about hung tasks. We are using kernel 4.4.y The actual back
>>> trace is from 4.4.70, but this happens with all the 4.4 kernels I've
>>> used (4.4.30 through 4.4.70).
>>> Tasks like:
>>> INFO: task kworker/u32:9:27574 blocked for more than 120 seconds.
>>> INFO: task kworker/u32:12:27575 blocked for more than 120 seconds.
>>> INFO: task btrfs-transacti:31625 blocked for more than 120 seconds.
>>> are getting blocked waiting for btrfs_tree_read_lock, which is owned
>>> by task umount:31696 (which is also blocked for more than 120 seconds)
>>> regarding the lock debug.
>>>
>>> umount is hung in "cache_block_group", see the '>' mark:
>>>while (cache->cached == BTRFS_CACHE_FAST) {
>>> struct btrfs_caching_control *ctl;
>>>
>>> ctl = cache->caching_ctl;
>>> atomic_inc(>count);
>>> prepare_to_wait(>wait, , TASK_UNINTERRUPTIBLE);
>>> spin_unlock(>lock);
>>>
schedule();
>>>
>>> finish_wait(>wait, );
>>> put_caching_control(ctl);
>>> spin_lock(>lock);
>>> }
>>>
>>> The complete backtraces could be found in the attached log.
>>>
>>> Do you have any ideas ?
>>
>> Hi Angel -
>>
>> In your log, it says lockdep is disabled.  What tripped it earlier?
>> Lockdep really should be catching locking deadlocks in situations like
>> this, if that's really the underlying cause.
>
> Actually, I'm not sure if lockdep would catch this one.  Here's my
> hypothesis:
>
> kworker/u32:9 is waiting on a read lock while reading the free space
> cache, which means it owns the cache->cached value and will issue the
> wakeup when it completes.
>
> umount is waiting on for the wakeup from kworker/u32:9 but is holding
> some tree locks in write mode.
>
> If kworker/u32:9 is waiting on the locks that umount holds, we have a
> deadlock.
>
> Can you dump the extent buffer that kworker/u32:9 is waiting on?  Part
> of that will contain the PID of the holder, and if matches umount, we
> found the cause.
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 11/49] btrfs: avoid access to .bi_vcnt directly

2017-08-08 Thread Ming Lei
BTRFS uses bio->bi_vcnt to figure out page numbers, this
way becomes not correct once we start to enable multipage
bvec.

So use bio_for_each_segment_all() to do that instead.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Acked-by: David Sterba 
Signed-off-by: Ming Lei 
---
 fs/btrfs/extent_io.c | 20 
 fs/btrfs/extent_io.h |  2 +-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0aff9b278c19..0e7367817b92 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2258,7 +2258,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 
start, u64 end,
return 0;
 }
 
-bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
   struct io_failure_record *failrec, int failed_mirror)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2282,7 +2282,7 @@ bool btrfs_check_repairable(struct inode *inode, struct 
bio *failed_bio,
 *  a) deliver good data to the caller
 *  b) correct the bad sectors on disk
 */
-   if (failed_bio->bi_vcnt > 1) {
+   if (failed_bio_pages > 1) {
/*
 * to fulfill b), we need to know the exact failing sectors, as
 * we don't want to rewrite any more than the failed ones. thus,
@@ -2355,6 +2355,17 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, 
struct bio *failed_bio,
return bio;
 }
 
+static unsigned int get_bio_pages(struct bio *bio)
+{
+   unsigned i;
+   struct bio_vec *bv;
+
+   bio_for_each_segment_all(bv, bio, i)
+   ;
+
+   return i;
+}
+
 /*
  * this is a generic handler for readpage errors (default
  * readpage_io_failed_hook). if other copies exist, read those and write back
@@ -2375,6 +2386,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 
phy_offset,
int read_mode = 0;
blk_status_t status;
int ret;
+   unsigned failed_bio_pages = get_bio_pages(failed_bio);
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -2382,13 +2394,13 @@ static int bio_readpage_error(struct bio *failed_bio, 
u64 phy_offset,
if (ret)
return ret;
 
-   if (!btrfs_check_repairable(inode, failed_bio, failrec,
+   if (!btrfs_check_repairable(inode, failed_bio_pages, failrec,
failed_mirror)) {
free_io_failure(failure_tree, tree, failrec);
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if (failed_bio_pages > 1)
read_mode |= REQ_FAILFAST_DEV;
 
phy_offset >>= inode->i_sb->s_blocksize_bits;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4f030912f3ef..300ee10f39f2 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -539,7 +539,7 @@ void btrfs_free_io_failure_record(struct btrfs_inode 
*inode, u64 start,
u64 end);
 int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
struct io_failure_record **failrec_ret);
-bool btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
struct io_failure_record *failrec, int fail_mirror);
 struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio 
*failed_bio,
struct io_failure_record *failrec,
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 12/49] btrfs: avoid to access bvec table directly for a cloned bio

2017-08-08 Thread Ming Lei
Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio)
mentioned that for dio the submitted bio may be fast cloned, we
can't access the bvec table directly for a cloned bio, so use
bio_get_first_bvec() to retrieve the 1st bvec.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Cc: Liu Bo 
Reviewed-by: Liu Bo 
Acked: David Sterba 
Signed-off-by: Ming Lei 
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95c212037095..5cf320ee7ea0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio 
*failed_bio,
int read_mode = 0;
int segs;
int ret;
+   struct bio_vec bvec;
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio 
*failed_bio,
}
 
segs = bio_segments(failed_bio);
+   bio_get_first_bvec(failed_bio, );
if (segs > 1 ||
-   (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
+   (bvec.bv_len > btrfs_inode_sectorsize(inode)))
read_mode |= REQ_FAILFAST_DEV;
 
isector = start - btrfs_io_bio(failed_bio)->logical;
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 46/49] fs/btrfs: convert to bio_for_each_segment_all_sp()

2017-08-08 Thread Filipe Manana
On Tue, Aug 8, 2017 at 9:45 AM, Ming Lei  wrote:
> Cc: Chris Mason 
> Cc: Josef Bacik 
> Cc: David Sterba 
> Cc: linux-btrfs@vger.kernel.org
> Signed-off-by: Ming Lei 

Can you please add some meaningful changelog? E.g., why is this
conversion needed.

> ---
>  fs/btrfs/compression.c |  3 ++-
>  fs/btrfs/disk-io.c |  3 ++-
>  fs/btrfs/extent_io.c   | 12 
>  fs/btrfs/inode.c   |  6 --
>  fs/btrfs/raid56.c  |  1 +
>  5 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 28746588f228..55f251a83d0b 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -147,13 +147,14 @@ static void end_compressed_bio_read(struct bio *bio)
> } else {
> int i;
> struct bio_vec *bvec;
> +   struct bvec_iter_all bia;
>
> /*
>  * we have verified the checksum already, set page
>  * checked so the end_io handlers know about it
>  */
> ASSERT(!bio_flagged(bio, BIO_CLONED));
> -   bio_for_each_segment_all(bvec, cb->orig_bio, i)
> +   bio_for_each_segment_all_sp(bvec, cb->orig_bio, i, bia)
> SetPageChecked(bvec->bv_page);
>
> bio_endio(cb->orig_bio);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 080e2ebb8aa0..a9cd75e6383d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -963,9 +963,10 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
> struct bio_vec *bvec;
> struct btrfs_root *root;
> int i, ret = 0;
> +   struct bvec_iter_all bia;
>
> ASSERT(!bio_flagged(bio, BIO_CLONED));
> -   bio_for_each_segment_all(bvec, bio, i) {
> +   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
> root = BTRFS_I(bvec->bv_page->mapping->host)->root;
> ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
> if (ret)
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c8f6a8657bf2..4de9cfd1c385 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2359,8 +2359,9 @@ static unsigned int get_bio_pages(struct bio *bio)
>  {
> unsigned i;
> struct bio_vec *bv;
> +   struct bvec_iter_all bia;
>
> -   bio_for_each_segment_all(bv, bio, i)
> +   bio_for_each_segment_all_sp(bv, bio, i, bia)
> ;
>
> return i;
> @@ -2463,9 +2464,10 @@ static void end_bio_extent_writepage(struct bio *bio)
> u64 start;
> u64 end;
> int i;
> +   struct bvec_iter_all bia;
>
> ASSERT(!bio_flagged(bio, BIO_CLONED));
> -   bio_for_each_segment_all(bvec, bio, i) {
> +   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
> struct page *page = bvec->bv_page;
> struct inode *inode = page->mapping->host;
> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> @@ -2534,9 +2536,10 @@ static void end_bio_extent_readpage(struct bio *bio)
> int mirror;
> int ret;
> int i;
> +   struct bvec_iter_all bia;
>
> ASSERT(!bio_flagged(bio, BIO_CLONED));
> -   bio_for_each_segment_all(bvec, bio, i) {
> +   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
> struct page *page = bvec->bv_page;
> struct inode *inode = page->mapping->host;
> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> @@ -3693,9 +3696,10 @@ static void end_bio_extent_buffer_writepage(struct bio 
> *bio)
> struct bio_vec *bvec;
> struct extent_buffer *eb;
> int i, done;
> +   struct bvec_iter_all bia;
>
> ASSERT(!bio_flagged(bio, BIO_CLONED));
> -   bio_for_each_segment_all(bvec, bio, i) {
> +   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
> struct page *page = bvec->bv_page;
>
> eb = (struct extent_buffer *)page->private;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 084ed99dd308..eeb2ff662ec4 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8047,6 +8047,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
> struct bio_vec *bvec;
> struct extent_io_tree *io_tree, *failure_tree;
> int i;
> +   struct bvec_iter_all bia;
>
> if (bio->bi_status)
> goto end;
> @@ -8064,7 +8065,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
>
> done->uptodate = 1;
> ASSERT(!bio_flagged(bio, BIO_CLONED));
> -   bio_for_each_segment_all(bvec, bio, i)
> +   bio_for_each_segment_all_sp(bvec, bio, i, bia)
> clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
>  io_tree, done->start, bvec->bv_page,
>   

[PATCH v3 30/49] btrfs: use bvec_get_last_page to get bio's last page

2017-08-08 Thread Ming Lei
Preparing for supporting multipage bvec.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c | 5 -
 fs/btrfs/extent_io.c   | 8 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f795d0a6d176..28746588f228 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -392,8 +392,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode 
*inode, u64 start,
 static u64 bio_end_offset(struct bio *bio)
 {
struct bio_vec *last = >bi_io_vec[bio->bi_vcnt - 1];
+   struct bio_vec bv;
 
-   return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+   bvec_get_last_page(last, );
+
+   return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
 }
 
 static noinline int add_ra_bio_pages(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0e7367817b92..c8f6a8657bf2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2738,11 +2738,15 @@ static int __must_check submit_one_bio(struct bio *bio, 
int mirror_num,
 {
blk_status_t ret = 0;
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
-   struct page *page = bvec->bv_page;
struct extent_io_tree *tree = bio->bi_private;
+   struct bio_vec bv;
+   struct page *page;
u64 start;
 
-   start = page_offset(page) + bvec->bv_offset;
+   bvec_get_last_page(bvec, );
+   page = bv.bv_page;
+
+   start = page_offset(page) + bv.bv_offset;
 
bio->bi_private = NULL;
bio_get(bio);
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 46/49] fs/btrfs: convert to bio_for_each_segment_all_sp()

2017-08-08 Thread Ming Lei
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c |  3 ++-
 fs/btrfs/disk-io.c |  3 ++-
 fs/btrfs/extent_io.c   | 12 
 fs/btrfs/inode.c   |  6 --
 fs/btrfs/raid56.c  |  1 +
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 28746588f228..55f251a83d0b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -147,13 +147,14 @@ static void end_compressed_bio_read(struct bio *bio)
} else {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all bia;
 
/*
 * we have verified the checksum already, set page
 * checked so the end_io handlers know about it
 */
ASSERT(!bio_flagged(bio, BIO_CLONED));
-   bio_for_each_segment_all(bvec, cb->orig_bio, i)
+   bio_for_each_segment_all_sp(bvec, cb->orig_bio, i, bia)
SetPageChecked(bvec->bv_page);
 
bio_endio(cb->orig_bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 080e2ebb8aa0..a9cd75e6383d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -963,9 +963,10 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
struct bio_vec *bvec;
struct btrfs_root *root;
int i, ret = 0;
+   struct bvec_iter_all bia;
 
ASSERT(!bio_flagged(bio, BIO_CLONED));
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
root = BTRFS_I(bvec->bv_page->mapping->host)->root;
ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
if (ret)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c8f6a8657bf2..4de9cfd1c385 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2359,8 +2359,9 @@ static unsigned int get_bio_pages(struct bio *bio)
 {
unsigned i;
struct bio_vec *bv;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bv, bio, i)
+   bio_for_each_segment_all_sp(bv, bio, i, bia)
;
 
return i;
@@ -2463,9 +2464,10 @@ static void end_bio_extent_writepage(struct bio *bio)
u64 start;
u64 end;
int i;
+   struct bvec_iter_all bia;
 
ASSERT(!bio_flagged(bio, BIO_CLONED));
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2534,9 +2536,10 @@ static void end_bio_extent_readpage(struct bio *bio)
int mirror;
int ret;
int i;
+   struct bvec_iter_all bia;
 
ASSERT(!bio_flagged(bio, BIO_CLONED));
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -3693,9 +3696,10 @@ static void end_bio_extent_buffer_writepage(struct bio 
*bio)
struct bio_vec *bvec;
struct extent_buffer *eb;
int i, done;
+   struct bvec_iter_all bia;
 
ASSERT(!bio_flagged(bio, BIO_CLONED));
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
 
eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 084ed99dd308..eeb2ff662ec4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8047,6 +8047,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
struct bio_vec *bvec;
struct extent_io_tree *io_tree, *failure_tree;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_status)
goto end;
@@ -8064,7 +8065,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
 
done->uptodate = 1;
ASSERT(!bio_flagged(bio, BIO_CLONED));
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all_sp(bvec, bio, i, bia)
clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
 io_tree, done->start, bvec->bv_page,
 btrfs_ino(BTRFS_I(inode)), 0);
@@ -8143,6 +8144,7 @@ static void btrfs_retry_endio(struct bio *bio)
int uptodate;
int ret;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_status)
goto end;
@@ -8162,7 +8164,7 @@ static void btrfs_retry_endio(struct bio *bio)
failure_tree = _I(inode)->io_failure_tree;
 

[PATCH v3 13/49] btrfs: comment on direct access bvec table

2017-08-08 Thread Ming Lei
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Acked: David Sterba 
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c |  4 
 fs/btrfs/inode.c   | 12 
 2 files changed, 16 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d2ef9ac2a630..f795d0a6d176 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -542,6 +542,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode 
*inode, struct bio *bio,
 
/* we need the actual starting offset of this extent in the file */
read_lock(_tree->lock);
+   /*
+* It is still safe to retrieve the 1st page of the bio
+* in this way after supporting multipage bvec.
+*/
em = lookup_extent_mapping(em_tree,
   page_offset(bio->bi_io_vec->bv_page),
   PAGE_SIZE);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5cf320ee7ea0..084ed99dd308 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8051,6 +8051,12 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
if (bio->bi_status)
goto end;
 
+   /*
+* WARNING:
+*
+* With multipage bvec, the following way of direct access to
+* bvec table is only safe if the bio includes single page.
+*/
ASSERT(bio->bi_vcnt == 1);
io_tree = _I(inode)->io_tree;
failure_tree = _I(inode)->io_failure_tree;
@@ -8143,6 +8149,12 @@ static void btrfs_retry_endio(struct bio *bio)
 
uptodate = 1;
 
+   /*
+* WARNING:
+*
+* With multipage bvec, the following way of direct access to
+* bvec table is only safe if the bio includes single page.
+*/
ASSERT(bio->bi_vcnt == 1);
ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(done->inode));
 
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix out of bounds array access while reading extent buffer

2017-08-08 Thread Filipe Manana
On Mon, Aug 7, 2017 at 8:39 PM, Liu Bo  wrote:
> There is a cornel case that slip through the checkers in functions
> reading extent buffer, ie.
>
> if (start < eb->len) and (start + len > eb->len),
> then
>
> a) map_private_extent_buffer() returns immediately because
> it's thinking the range spans across two pages,
>
> b) and the checkers in read_extent_buffer(), WARN_ON(start > eb->len)
> and WARN_ON(start + len > eb->start + eb->len), both are OK in this
> corner case, but it'd actually try to access the eb->pages out of
> bounds because of (start + len > eb->len).
>
> The case is found by switching extent inline ref type from shared data
> ref to non-shared data ref.
>
> This is adding proper checks in order to avoid invalid memory access,
> ie. 'general protection', before it's too late.

Hi Bo,

I don't understand these 2 last paragraphs.
How do you fix the invalid memory access? All the change does is make
sure that attempts to read from invalid regions of an extent buffer
result in a warning and returning an error code. Those paragraphs give
the idea that the problem is that some caller is passing a wrong
offset/length pair, however you aren't fixing any caller. can you
clarify?

Also when you say " The case is found by switching extent inline ref
type from shared data ref to non-shared data ref", it gives the idea
this is a deterministic problem that always happens when doing that
switch. If so, can we have a test case?

Also missing the word "fault" after 'general protection'.

Thanks.

>
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/extent_io.c | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 0aff9b2..d198e87 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5416,13 +5416,19 @@ void read_extent_buffer(struct extent_buffer *eb, 
> void *dstv,
> char *dst = (char *)dstv;
> size_t start_offset = eb->start & ((u64)PAGE_SIZE - 1);
> unsigned long i = (start_offset + start) >> PAGE_SHIFT;
> +   unsigned long num_pages = num_extent_pages(eb->start, eb->len);
>
> -   WARN_ON(start > eb->len);
> -   WARN_ON(start + len > eb->start + eb->len);
> +   if (start + len > eb->len) {
> +   WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, 
> wanted %lu %lu\n",
> +eb->start, eb->len, start, len);
> +   memset(dst, 0, len);
> +   return;
> +   }
>
> offset = (start_offset + start) & (PAGE_SIZE - 1);
>
> while (len > 0) {
> +   ASSERT(i < num_pages);
> page = eb->pages[i];
>
> cur = min(len, (PAGE_SIZE - offset));
> @@ -5491,6 +5497,12 @@ int map_private_extent_buffer(struct extent_buffer 
> *eb, unsigned long start,
> unsigned long end_i = (start_offset + start + min_len - 1) >>
> PAGE_SHIFT;
>
> +   if (start + min_len > eb->len) {
> +   WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, 
> wanted %lu %lu\n",
> +  eb->start, eb->len, start, min_len);
> +   return -EINVAL;
> +   }
> +
> if (i != end_i)
> return 1;
>
> @@ -5502,12 +5514,6 @@ int map_private_extent_buffer(struct extent_buffer 
> *eb, unsigned long start,
> *map_start = ((u64)i << PAGE_SHIFT) - start_offset;
> }
>
> -   if (start + min_len > eb->len) {
> -   WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, 
> wanted %lu %lu\n",
> -  eb->start, eb->len, start, min_len);
> -   return -EINVAL;
> -   }
> -
> p = eb->pages[i];
> kaddr = page_address(p);
> *map = kaddr + offset;
> --
> 2.9.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html