Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-12-16 Thread Liu Bo
On Fri, Dec 02, 2016 at 12:25:19PM +, Filipe Manana wrote:
> On Fri, Dec 2, 2016 at 1:41 AM, Liu Bo  wrote:
> > On Thu, Nov 24, 2016 at 11:13:37AM +, Filipe Manana wrote:
> >> On Wed, Nov 23, 2016 at 9:22 PM, Liu Bo  wrote:
...
> >>
> >> The analysis is correct Bo.
> >> Originally to fix races between fsync and direct IO writes there was a
> >> solution [1, 2] that didn't involve adding a semaphore and relied on
> >> creating first the ordered extents and then the extent maps only in
> >> the direct IO write path (we do things in the reverse order everywhere
> >> else). It worked and was documented in comments but wasn't
> >> particularly elegant and Josef was not happy because of that, so then
> >> we added the semaphore and made direct IO write path create the extent
> >> maps and ordered extents in the same order as everywhere else [3].
> >>
> >> So here I can only see 2 simple solutions. Either revert [3] (which
> >> added the semaphore) or acquire the semaphore at a higher level in
> >> direct IO write path like this:
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 1f980ef..b2c277d 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -7237,7 +7237,6 @@ static struct extent_map
> >> *btrfs_create_dio_extent(struct inode *inode,
> >> struct extent_map *em = NULL;
> >> int ret;
> >>
> >> -   down_read(&BTRFS_I(inode)->dio_sem);
> >> if (type != BTRFS_ORDERED_NOCOW) {
> >> em = create_pinned_em(inode, start, len, orig_start,
> >>   block_start, block_len, 
> >> orig_block_len,
> >> @@ -7256,8 +7255,6 @@ static struct extent_map
> >> *btrfs_create_dio_extent(struct inode *inode,
> >> em = ERR_PTR(ret);
> >> }
> >>   out:
> >> -   up_read(&BTRFS_I(inode)->dio_sem);
> >> -
> >> return em;
> >>  }
> >>
> >> @@ -8715,11 +8712,14 @@ static ssize_t btrfs_direct_IO(struct kiocb
> >> *iocb, struct iov_iter *iter)
> >> wakeup = false;
> >> }
> >>
> >> +   if (iov_iter_rw(iter) == WRITE)
> >> +   down_read(&BTRFS_I(inode)->dio_sem);
> >> ret = __blockdev_direct_IO(iocb, inode,
> >>
> >> BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
> >>iter, btrfs_get_blocks_direct, NULL,
> >>btrfs_submit_direct, flags);
> >> if (iov_iter_rw(iter) == WRITE) {
> >> +   up_read(&BTRFS_I(inode)->dio_sem);
> >> current->journal_info = NULL;
> >> if (ret < 0 && ret != -EIOCBQUEUED) {
> >> if (dio_data.reserve)
> >>
> >>
> >> Let me know what you think. Thanks.
> >
> > Hi Filipe,
> >
> > After going over again fs/direct-io.c, the AIO dio_complete is diffrent 
> > from the
> > 'Note that' part in your patch [1], what am I missing?
> >
> > -
> > static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
> > {
> > ...
> > if (!(dio->flags & DIO_SKIP_DIO_COUNT))
> > inode_dio_end(dio->inode);
> >
> > if (is_async) {
> > /*
> >  * generic_write_sync expects ki_pos to have been updated
> >  * already, but the submission path only does this for
> >  * synchronous I/O.
> >  */
> > dio->iocb->ki_pos += transferred;
> >
> > if (dio->op == REQ_OP_WRITE)
> > ret = generic_write_sync(dio->iocb,  transferred);
> > dio->iocb->ki_complete(dio->iocb, ret, 0);
> > }
> > ...
> > }
> 
> It's still the same as when I checked. The problem is that even after
> that call to inode_dio_end(), the inode->i_dio_count() is still
> non-zero (it's 1 iirc).
> I don't have any longer the debugging patch I used to find out that
> out (nor remember all the details), but I just tried again the
> approach of calling inode_dio_wait() in btrfs_sync_file() after
> locking the inode:
> 
> https://friendpaste.com/pRwJkgsFXv6HglftK1BqH
> 
> And the same deadlock ends up happening, which is trivial to reproduce
> with fstest generic/113:
> 
...
> I don't have the time to go now figure out all the details of the aio
> code path again as I'm on vacation. But it's pretty evident that that
> solution doesn't work unfortunately.

All right...I've got the deadlock under space pressure, i.e. run
generic/113 with '-o fragment=data'.

I was not able to reproduce the deadlock with generic/113 after applying
'inode_dio_wait' in fsync, but I agree on not using 'inode_dio_wait'
approach because dio reads also contribute to inode->i_dio_count.

With the above patch that acquires the semaphore at a higher level
in direct IO write path, the deadlock now has gone away.

So would you please make a single patch? (It's also a good candidate
for stabl

Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-12-02 Thread Filipe Manana
On Fri, Dec 2, 2016 at 1:41 AM, Liu Bo  wrote:
> On Thu, Nov 24, 2016 at 11:13:37AM +, Filipe Manana wrote:
>> On Wed, Nov 23, 2016 at 9:22 PM, Liu Bo  wrote:
>> > Hi,
>> >
>> > On Wed, Nov 23, 2016 at 06:21:35PM +0100, Stefan Priebe - Profihost AG 
>> > wrote:
>> >> Hi,
>> >>
>> >> sorry last mail was from the wrong box.
>> >>
>> >> Am 04.11.2016 um 20:20 schrieb Liu Bo:
>> >> > If we have
>> >> >
>> >> > |0--hole--4095||4096--preallocate--12287|
>> >> >
>> >> > instead of using preallocated space, a 8K direct write will just
>> >> > create a new 8K extent and it'll end up with
>> >> >
>> >> > |0--new extent--8191||8192--preallocate--12287|
>> >> >
>> >> > It's because we find a hole em and then go to create a new 8K
>> >> > extent directly without adjusting @len.
>> >>
>> >> after applying that one on top of my 4.4 btrfs branch (includes patches
>> >> up to 4.10 / next). i'm getting deadlocks in btrfs.
>> >
>> > This is really interesting, thanks for the quick testing.
>> >
>> > After going through the stacks listed below, I think the patch has
>> > exposed a bug around BTRFS_I(inode)->dio_sem:
>> >
>> > 1. Since fsync has acquired inode_lock(), the dio write must be
>> > an overwrite within EOF.
>> >
>> > 2. Lets say the inode size is 16k and it already has a preallocated extent 
>> > [4k, 8k],
>> > then we feed it with a dio write against [0k, 8k], with this patch
>> > applied, the write can be splitted into a new extent of [0, 4k] and a
>> > fill-write against the preallocated one [4k, 8k],
>> >
>> > 3.
>> > dio   fsync
>> > ->btrfs_direct_IO  
>> > btrfs_sync_file
>> >  ->do_direct_IO
>> >   ->get_more_blocks()
>> > ->inode_lock()
>> > ->btrfs_get_blocks_direct() # for [0, 8k]
>> > ->btrfs_log_inode()
>> >   ->btrfs_new_direct_extent()  
>> > ->btrfs_log_changed_extents()
>> > ->btrfs_create_dio_extent()
>> >   ->down_read(&BTRFS_I(inode)->dio_sem)
>> > # dio write is splitted and
>> > # em of [0, 4k] is inserted as well as
>> > # the ordered extent.
>> >   ->up_read(&BTRFS_I(inode)->dio_sem)
>> ># do_direct_IO tries to collect more pages
>> ># before sending them down, so [0, 4k] is not
>> ># yet submitted.
>> > 
>> >  
>> > ->down_write(&BTRFS_I(inode)->dio_sem)
>> >  # 
>> > found ordered extent of [0, 4k]
>> >  # 
>> > wait for [0, 4k] to finish
>> >   ->get_more_blocks()
>> > ->btrfs_get_blocks_direct() # for [4k, 8k]
>> >   ->btrfs_create_dio_extent()
>> > -> up_read(&BTRFS_I(inode)->dio_sem)
>> ># deadlock occurs
>> >
>> > 4. _Without_ this patch, we could hit the deadlock as well under space 
>> > pressure,
>> > i.e. if we request [0, 8k], but btrfs_reserve_extent() returns only [0, 
>> > 4k].
>> >
>> > (Filipe may correct me, cc'd Filipe.)
>>
>> The analysis is correct Bo.
>> Originally to fix races between fsync and direct IO writes there was a
>> solution [1, 2] that didn't involve adding a semaphore and relied on
>> creating first the ordered extents and then the extent maps only in
>> the direct IO write path (we do things in the reverse order everywhere
>> else). It worked and was documented in comments but wasn't
>> particularly elegant and Josef was not happy because of that, so then
>> we added the semaphore and made direct IO write path create the extent
>> maps and ordered extents in the same order as everywhere else [3].
>>
>> So here I can only see 2 simple solutions. Either revert [3] (which
>> added the semaphore) or acquire the semaphore at a higher level in
>> direct IO write path like this:
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 1f980ef..b2c277d 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -7237,7 +7237,6 @@ static struct extent_map
>> *btrfs_create_dio_extent(struct inode *inode,
>> struct extent_map *em = NULL;
>> int ret;
>>
>> -   down_read(&BTRFS_I(inode)->dio_sem);
>> if (type != BTRFS_ORDERED_NOCOW) {
>> em = create_pinned_em(inode, start, len, orig_start,
>>   block_start, block_len, orig_block_len,
>> @@ -7256,8 +7255,6 @@ static struct extent_map
>> *btrfs_create_dio_extent(struct inode *inode,
>> em = ERR_PTR(ret);
>> }
>>   out:
>> -   up_read(&BTRFS_I(inode)->dio_sem);
>> -
>> return em;
>>  }
>>
>> @@ -8715,11 +8712,14 @@ static ssize_t btrfs_direct_IO(struct kioc

Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-12-01 Thread Liu Bo
On Thu, Nov 24, 2016 at 11:13:37AM +, Filipe Manana wrote:
> On Wed, Nov 23, 2016 at 9:22 PM, Liu Bo  wrote:
> > Hi,
> >
> > On Wed, Nov 23, 2016 at 06:21:35PM +0100, Stefan Priebe - Profihost AG 
> > wrote:
> >> Hi,
> >>
> >> sorry last mail was from the wrong box.
> >>
> >> Am 04.11.2016 um 20:20 schrieb Liu Bo:
> >> > If we have
> >> >
> >> > |0--hole--4095||4096--preallocate--12287|
> >> >
> >> > instead of using preallocated space, a 8K direct write will just
> >> > create a new 8K extent and it'll end up with
> >> >
> >> > |0--new extent--8191||8192--preallocate--12287|
> >> >
> >> > It's because we find a hole em and then go to create a new 8K
> >> > extent directly without adjusting @len.
> >>
> >> after applying that one on top of my 4.4 btrfs branch (includes patches
> >> up to 4.10 / next). i'm getting deadlocks in btrfs.
> >
> > This is really interesting, thanks for the quick testing.
> >
> > After going through the stacks listed below, I think the patch has
> > exposed a bug around BTRFS_I(inode)->dio_sem:
> >
> > 1. Since fsync has acquired inode_lock(), the dio write must be
> > an overwrite within EOF.
> >
> > 2. Lets say the inode size is 16k and it already has a preallocated extent 
> > [4k, 8k],
> > then we feed it with a dio write against [0k, 8k], with this patch
> > applied, the write can be splitted into a new extent of [0, 4k] and a
> > fill-write against the preallocated one [4k, 8k],
> >
> > 3.
> > dio   fsync
> > ->btrfs_direct_IO  
> > btrfs_sync_file
> >  ->do_direct_IO
> >   ->get_more_blocks()
> > ->inode_lock()
> > ->btrfs_get_blocks_direct() # for [0, 8k]
> > ->btrfs_log_inode()
> >   ->btrfs_new_direct_extent()  
> > ->btrfs_log_changed_extents()
> > ->btrfs_create_dio_extent()
> >   ->down_read(&BTRFS_I(inode)->dio_sem)
> > # dio write is splitted and
> > # em of [0, 4k] is inserted as well as
> > # the ordered extent.
> >   ->up_read(&BTRFS_I(inode)->dio_sem)
> ># do_direct_IO tries to collect more pages
> ># before sending them down, so [0, 4k] is not
> ># yet submitted.
> > 
> >  
> > ->down_write(&BTRFS_I(inode)->dio_sem)
> >  # 
> > found ordered extent of [0, 4k]
> >  # 
> > wait for [0, 4k] to finish
> >   ->get_more_blocks()
> > ->btrfs_get_blocks_direct() # for [4k, 8k]
> >   ->btrfs_create_dio_extent()
> > -> up_read(&BTRFS_I(inode)->dio_sem)
> ># deadlock occurs
> >
> > 4. _Without_ this patch, we could hit the deadlock as well under space 
> > pressure,
> > i.e. if we request [0, 8k], but btrfs_reserve_extent() returns only [0, 4k].
> >
> > (Filipe may correct me, cc'd Filipe.)
> 
> The analysis is correct Bo.
> Originally to fix races between fsync and direct IO writes there was a
> solution [1, 2] that didn't involve adding a semaphore and relied on
> creating first the ordered extents and then the extent maps only in
> the direct IO write path (we do things in the reverse order everywhere
> else). It worked and was documented in comments but wasn't
> particularly elegant and Josef was not happy because of that, so then
> we added the semaphore and made direct IO write path create the extent
> maps and ordered extents in the same order as everywhere else [3].
> 
> So here I can only see 2 simple solutions. Either revert [3] (which
> added the semaphore) or acquire the semaphore at a higher level in
> direct IO write path like this:
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1f980ef..b2c277d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7237,7 +7237,6 @@ static struct extent_map
> *btrfs_create_dio_extent(struct inode *inode,
> struct extent_map *em = NULL;
> int ret;
> 
> -   down_read(&BTRFS_I(inode)->dio_sem);
> if (type != BTRFS_ORDERED_NOCOW) {
> em = create_pinned_em(inode, start, len, orig_start,
>   block_start, block_len, orig_block_len,
> @@ -7256,8 +7255,6 @@ static struct extent_map
> *btrfs_create_dio_extent(struct inode *inode,
> em = ERR_PTR(ret);
> }
>   out:
> -   up_read(&BTRFS_I(inode)->dio_sem);
> -
> return em;
>  }
> 
> @@ -8715,11 +8712,14 @@ static ssize_t btrfs_direct_IO(struct kiocb
> *iocb, struct iov_iter *iter)
> wakeup = false;
> }
> 
> +   if (iov_iter_rw(iter) == WRITE)
> +   down_read(&BTRFS_I(inod

Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-11-24 Thread Filipe Manana
On Wed, Nov 23, 2016 at 9:22 PM, Liu Bo  wrote:
> Hi,
>
> On Wed, Nov 23, 2016 at 06:21:35PM +0100, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> sorry last mail was from the wrong box.
>>
>> Am 04.11.2016 um 20:20 schrieb Liu Bo:
>> > If we have
>> >
>> > |0--hole--4095||4096--preallocate--12287|
>> >
>> > instead of using preallocated space, a 8K direct write will just
>> > create a new 8K extent and it'll end up with
>> >
>> > |0--new extent--8191||8192--preallocate--12287|
>> >
>> > It's because we find a hole em and then go to create a new 8K
>> > extent directly without adjusting @len.
>>
>> after applying that one on top of my 4.4 btrfs branch (includes patches
>> up to 4.10 / next). i'm getting deadlocks in btrfs.
>
> This is really interesting, thanks for the quick testing.
>
> After going through the stacks listed below, I think the patch has
> exposed a bug around BTRFS_I(inode)->dio_sem:
>
> 1. Since fsync has acquired inode_lock(), the dio write must be
> an overwrite within EOF.
>
> 2. Lets say the inode size is 16k and it already has a preallocated extent 
> [4k, 8k],
> then we feed it with a dio write against [0k, 8k], with this patch
> applied, the write can be splitted into a new extent of [0, 4k] and a
> fill-write against the preallocated one [4k, 8k],
>
> 3.
> dio   fsync
> ->btrfs_direct_IO  
> btrfs_sync_file
>  ->do_direct_IO
>   ->get_more_blocks()
> ->inode_lock()
> ->btrfs_get_blocks_direct() # for [0, 8k]
> ->btrfs_log_inode()
>   ->btrfs_new_direct_extent()  
> ->btrfs_log_changed_extents()
> ->btrfs_create_dio_extent()
>   ->down_read(&BTRFS_I(inode)->dio_sem)
> # dio write is splitted and
> # em of [0, 4k] is inserted as well as
> # the ordered extent.
>   ->up_read(&BTRFS_I(inode)->dio_sem)
># do_direct_IO tries to collect more pages
># before sending them down, so [0, 4k] is not
># yet submitted.
> 
>  
> ->down_write(&BTRFS_I(inode)->dio_sem)
>  # 
> found ordered extent of [0, 4k]
>  # 
> wait for [0, 4k] to finish
>   ->get_more_blocks()
> ->btrfs_get_blocks_direct() # for [4k, 8k]
>   ->btrfs_create_dio_extent()
> -> up_read(&BTRFS_I(inode)->dio_sem)
># deadlock occurs
>
> 4. _Without_ this patch, we could hit the deadlock as well under space 
> pressure,
> i.e. if we request [0, 8k], but btrfs_reserve_extent() returns only [0, 4k].
>
> (Filipe may correct me, cc'd Filipe.)

The analysis is correct Bo.
Originally to fix races between fsync and direct IO writes there was a
solution [1, 2] that didn't involve adding a semaphore and relied on
creating first the ordered extents and then the extent maps only in
the direct IO write path (we do things in the reverse order everywhere
else). It worked and was documented in comments but wasn't
particularly elegant and Josef was not happy because of that, so then
we added the semaphore and made direct IO write path create the extent
maps and ordered extents in the same order as everywhere else [3].

So here I can only see 2 simple solutions. Either revert [3] (which
added the semaphore) or acquire the semaphore at a higher level in
direct IO write path like this:

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1f980ef..b2c277d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7237,7 +7237,6 @@ static struct extent_map
*btrfs_create_dio_extent(struct inode *inode,
struct extent_map *em = NULL;
int ret;

-   down_read(&BTRFS_I(inode)->dio_sem);
if (type != BTRFS_ORDERED_NOCOW) {
em = create_pinned_em(inode, start, len, orig_start,
  block_start, block_len, orig_block_len,
@@ -7256,8 +7255,6 @@ static struct extent_map
*btrfs_create_dio_extent(struct inode *inode,
em = ERR_PTR(ret);
}
  out:
-   up_read(&BTRFS_I(inode)->dio_sem);
-
return em;
 }

@@ -8715,11 +8712,14 @@ static ssize_t btrfs_direct_IO(struct kiocb
*iocb, struct iov_iter *iter)
wakeup = false;
}

+   if (iov_iter_rw(iter) == WRITE)
+   down_read(&BTRFS_I(inode)->dio_sem);
ret = __blockdev_direct_IO(iocb, inode,

BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
   iter, btrfs_get_blocks_direct, NULL,
   btrfs_submit_direct, flags);
if (iov_iter_rw(iter) == WRITE) {
+

Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-11-23 Thread Liu Bo
Hi,

On Wed, Nov 23, 2016 at 06:21:35PM +0100, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> sorry last mail was from the wrong box.
> 
> Am 04.11.2016 um 20:20 schrieb Liu Bo:
> > If we have
> > 
> > |0--hole--4095||4096--preallocate--12287|
> > 
> > instead of using preallocated space, a 8K direct write will just
> > create a new 8K extent and it'll end up with
> > 
> > |0--new extent--8191||8192--preallocate--12287|
> > 
> > It's because we find a hole em and then go to create a new 8K
> > extent directly without adjusting @len.
> 
> after applying that one on top of my 4.4 btrfs branch (includes patches
> up to 4.10 / next). i'm getting deadlocks in btrfs.

This is really interesting, thanks for the quick testing.

After going through the stacks listed below, I think the patch has
exposed a bug around BTRFS_I(inode)->dio_sem:

1. Since fsync has acquired inode_lock(), the dio write must be
an overwrite within EOF.

2. Lets say the inode size is 16k and it already has a preallocated extent [4k, 
8k],
then we feed it with a dio write against [0k, 8k], with this patch
applied, the write can be splitted into a new extent of [0, 4k] and a
fill-write against the preallocated one [4k, 8k],

3.
dio   fsync
->btrfs_direct_IO  
btrfs_sync_file
 ->do_direct_IO
  ->get_more_blocks()
->inode_lock()
->btrfs_get_blocks_direct() # for [0, 8k]
->btrfs_log_inode()
  ->btrfs_new_direct_extent()  
->btrfs_log_changed_extents()
->btrfs_create_dio_extent()
  ->down_read(&BTRFS_I(inode)->dio_sem)
# dio write is splitted and
# em of [0, 4k] is inserted as well as
# the ordered extent.   
  ->up_read(&BTRFS_I(inode)->dio_sem)
   # do_direct_IO tries to collect more pages
   # before sending them down, so [0, 4k] is not
   # yet submitted.

 
->down_write(&BTRFS_I(inode)->dio_sem)
 # 
found ordered extent of [0, 4k]
 # wait 
for [0, 4k] to finish
  ->get_more_blocks()
->btrfs_get_blocks_direct() # for [4k, 8k]
  ->btrfs_create_dio_extent()
-> up_read(&BTRFS_I(inode)->dio_sem)
   # deadlock occurs

4. _Without_ this patch, we could hit the deadlock as well under space pressure,
i.e. if we request [0, 8k], but btrfs_reserve_extent() returns only [0, 4k].

(Filipe may correct me, cc'd Filipe.)

Thanks,

-liubo

> 
> Traces here:
> INFO: task btrfs-transacti:604 blocked for more than 120 seconds.
>   Not tainted 4.4.34 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> btrfs-transacti D 8814e78cbe00 0   604  2 0x0008
>  8814e78cbe00 88017367a540 8814e2f88000 8814e78cc000
>  8814e78cbe38 88123616c510 8814e24c81f0 88153fb0a000
>  8814e78cbe18 816a8425 8814e63165a0 8814e78cbe88
> Call Trace:
>  [] schedule+0x35/0x80
>  [] btrfs_commit_transaction+0x275/0xa50 [btrfs]
>  [] transaction_kthread+0x1d6/0x200 [btrfs]
>  [] kthread+0xdb/0x100
>  [] ret_from_fork+0x3f/0x70
> DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
> 
> Leftover inexact backtrace:
> 
>  [] ? kthread_park+0x60/0x60
> INFO: task mysqld:1977 blocked for more than 120 seconds.
>   Not tainted 4.4.34 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> mysqld  D 88142ef1bcf8 0  1977  1 0x0008
>  88142ef1bcf8 81e0f500 8814dc2c4a80 88142ef1c000
>  8814e32ed298 8814e32ed2c0 88110aa9a000 8814e32ed000
>  88142ef1bd10 816a8425 8814e32ed000 88142ef1bd60
> Call Trace:
>  [] schedule+0x35/0x80
>  [] wait_for_writer+0xa2/0xb0 [btrfs]
>  [] btrfs_sync_log+0xe9/0xa00 [btrfs]
>  [] btrfs_sync_file+0x35f/0x3d0 [btrfs]
>  [] vfs_fsync_range+0x3d/0xb0
>  [] do_fsync+0x3d/0x70
>  [] SyS_fsync+0x10/0x20
>  [] entry_SYSCALL_64_fastpath+0x12/0x71
> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x71
> 
> Leftover inexact backtrace:
> 
> INFO: task mysqld:3249 blocked for more than 120 seconds.
>   Not tainted 4.4.34 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> mysqld  D 881475fdfa40 0  3249  1 0x0008
>  881475fdfa40 88017367ca80 8814433d2540 881475fe
>  88040da39ba0 0023 88040da39c20 00238000
>  881475fdfa58 816a8425 8000 881475fdfb18
> Call Trace:
>  [] schedule+0x35/0x80
>  []
> wait_or

Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-11-23 Thread Stefan Priebe - Profihost AG

Am 23.11.2016 um 19:23 schrieb Holger Hoffstätte:
> On 11/23/16 18:21, Stefan Priebe - Profihost AG wrote:
>> Am 04.11.2016 um 20:20 schrieb Liu Bo:
>>> If we have
>>>
>>> |0--hole--4095||4096--preallocate--12287|
>>>
>>> instead of using preallocated space, a 8K direct write will just
>>> create a new 8K extent and it'll end up with
>>>
>>> |0--new extent--8191||8192--preallocate--12287|
>>>
>>> It's because we find a hole em and then go to create a new 8K
>>> extent directly without adjusting @len.
>>
>> after applying that one on top of my 4.4 btrfs branch (includes patches
>> up to 4.10 / next). i'm getting deadlocks in btrfs.
> 
> *ctrl+f sectorsize* .. 
> 
> That's not surprising if you did what I suspect. If your tree is based
> on my - now really very retired - 4.4.x queue, then you are likely missing
> _all the other blocksize/sectorsize patches_ that came in from Chandra
> Seetharaman et al., which I _really_ carefully patched around, for many
> good reasons.

*arg* that makes sense. Still not easy to find out which ones to skip.
Yes that one is based on yours.

thanks,
Stefan

> 
> -h
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent

2016-11-23 Thread Holger Hoffstätte
On 11/23/16 18:21, Stefan Priebe - Profihost AG wrote:
> Am 04.11.2016 um 20:20 schrieb Liu Bo:
>> If we have
>>
>> |0--hole--4095||4096--preallocate--12287|
>>
>> instead of using preallocated space, a 8K direct write will just
>> create a new 8K extent and it'll end up with
>>
>> |0--new extent--8191||8192--preallocate--12287|
>>
>> It's because we find a hole em and then go to create a new 8K
>> extent directly without adjusting @len.
> 
> after applying that one on top of my 4.4 btrfs branch (includes patches
> up to 4.10 / next). i'm getting deadlocks in btrfs.

*ctrl+f sectorsize* .. 

That's not surprising if you did what I suspect. If your tree is based
on my - now really very retired - 4.4.x queue, then you are likely missing
_all the other blocksize/sectorsize patches_ that came in from Chandra
Seetharaman et al., which I _really_ carefully patched around, for many
good reasons.

-h

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html