Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Eric Sandeen
On 9/18/17 4:31 PM, Dave Chinner wrote:
> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
>> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
>>> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
>>>> Hi,
>>>>
>>>> A warning is triggered from:
>>>>
>>>> file fs/iomap.c in function iomap_dio_rw
>>>>
>>>>         if (ret)
>>>>                 goto out_free_dio;
>>>>
>>>>         ret = invalidate_inode_pages2_range(mapping,
>>>>                         start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>>>> >>      WARN_ON_ONCE(ret);
>>>>         ret = 0;
>>>>
>>>>         inode_dio_begin(inode);
>>>
>>> This is expected and an indication of a problematic workload - which
>>> may be triggered by a fuzzer.
>>
>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
>> the time running xfstests as well.
> 
> Because when a user reports a data corruption, the only evidence we
> have that they are running an app that does something stupid is this
> warning in their syslogs.  Tracepoints are not useful for replacing
> warnings about data corruption vectors being triggered.

Is the full WARN_ON spew really helpful to us, though?  Certainly
the user has no idea what it means, and will come away terrified
but none the wiser.

Would a more informative printk_once() still give us the evidence
without the ZOMG I THINK I OOPSED that a WARN_ON produces?  Or do we 
want/need the backtrace?

-Eric

> It needs to be on by default, but I'm sure we can wrap it with
> something like an xfs_alert_tag() type of construct so the tag can
> be set in /proc/sys/fs/xfs/panic_mask to suppress it if testers so
> desire.
> 
> Cheers,
> 
> Dave.
> 


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 05:00:58PM -0500, Eric Sandeen wrote:
> On 9/18/17 4:31 PM, Dave Chinner wrote:
> > On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> >>> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
> >>>> Hi,
> >>>>
> >>>> A warning is triggered from:
> >>>>
> >>>> file fs/iomap.c in function iomap_dio_rw
> >>>>
> >>>>         if (ret)
> >>>>                 goto out_free_dio;
> >>>>
> >>>>         ret = invalidate_inode_pages2_range(mapping,
> >>>>                         start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >>>> >>      WARN_ON_ONCE(ret);
> >>>>         ret = 0;
> >>>>
> >>>>         inode_dio_begin(inode);
> >>>
> >>> This is expected and an indication of a problematic workload - which
> >>> may be triggered by a fuzzer.
> >>
> >> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> >> the time running xfstests as well.
> > 
> > Because when a user reports a data corruption, the only evidence we
> > have that they are running an app that does something stupid is this
> > warning in their syslogs.  Tracepoints are not useful for replacing
> > warnings about data corruption vectors being triggered.
> 
> Is the full WARN_ON spew really helpful to us, though?  Certainly
> the user has no idea what it means, and will come away terrified
> but none the wiser.
> 
> Would a more informative printk_once() still give us the evidence
> without the ZOMG I THINK I OOPSED that a WARN_ON produces?  Or do we 
> want/need the backtrace?

The backtrace is actually useful - that's how I recently learnt that
splice now supports direct IO.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Darrick J. Wong
On Mon, Sep 18, 2017 at 05:00:58PM -0500, Eric Sandeen wrote:
> On 9/18/17 4:31 PM, Dave Chinner wrote:
> > On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> >>> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
> >>>> Hi,
> >>>>
> >>>> A warning is triggered from:
> >>>>
> >>>> file fs/iomap.c in function iomap_dio_rw
> >>>>
> >>>>         if (ret)
> >>>>                 goto out_free_dio;
> >>>>
> >>>>         ret = invalidate_inode_pages2_range(mapping,
> >>>>                         start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >>>> >>      WARN_ON_ONCE(ret);
> >>>>         ret = 0;
> >>>>
> >>>>         inode_dio_begin(inode);
> >>>
> >>> This is expected and an indication of a problematic workload - which
> >>> may be triggered by a fuzzer.
> >>
> >> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> >> the time running xfstests as well.
> > 
> > Because when a user reports a data corruption, the only evidence we
> > have that they are running an app that does something stupid is this
> > warning in their syslogs.  Tracepoints are not useful for replacing
> > warnings about data corruption vectors being triggered.
> 
> Is the full WARN_ON spew really helpful to us, though?  Certainly
> the user has no idea what it means, and will come away terrified
> but none the wiser.
> 
> Would a more informative printk_once() still give us the evidence
> without the ZOMG I THINK I OOPSED that a WARN_ON produces?  Or do we 
> want/need the backtrace?

Maybe we could state a little more directly what's going on:

        if (err)
                printk_once(KERN_INFO
                        "Urk, collision detected between direct IO and page cache, YHL. HAND.\n"); ?

8-)

--D
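
The one-shot behaviour being proposed can be modelled outside the kernel. What follows is a minimal userspace sketch, not kernel code: report_once() and check_invalidation() are hypothetical names, and fprintf() stands in for printk_once(). It shows only the "print the first time, stay quiet afterwards" part; unlike WARN_ON_ONCE() it produces no backtrace, which is the trade-off under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: print msg only if *seen is still false, then latch
 * the flag. Returns true when the message was actually printed.
 */
bool report_once(bool *seen, const char *msg)
{
    assert(seen != NULL && msg != NULL);
    if (*seen)
        return false;
    *seen = true;
    fprintf(stderr, "%s\n", msg);
    return true;
}

/*
 * Models the check that follows the invalidation call. err is whatever the
 * invalidation returned; non-zero means cached pages could not be dropped.
 */
bool check_invalidation(int err)
{
    static bool seen;   /* one flag per call site, as with printk_once() */

    if (!err)
        return false;
    return report_once(&seen, "direct I/O collided with the page cache");
}
```

Calling check_invalidation() repeatedly with a non-zero error prints the message on the first call only; a zero error never prints and does not consume the one-shot flag.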

> 
> -Eric
> 
> > It needs to be on by default, but I'm sure we can wrap it with
> > something like an xfs_alert_tag() type of construct so the tag can
> > be set in /proc/sys/fs/xfs/panic_mask to suppress it if testers so
> > desire.
> > 
> > Cheers,
> > 
> > Dave.
> > 


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 09:51:29AM -0600, Jens Axboe wrote:
> On 09/18/2017 09:43 AM, Al Viro wrote:
> > On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
> >> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> >>> the time running xfstests as well.
> >>
> >> Dave insisted on it to discourage users/applications from mixing
> >> mmap and direct I/O.
> >>
> >> In many ways a tracepoint might be the better way to diagnose these.
> > 
> > sysctl suppressing those two, perhaps?
> 
> I'd rather just make it a trace point, but don't care too much.
> 
> The code doesn't even have a comment as to why that WARN_ON() is
> there or expected.

The big comment about how bad cache invalidation failures are is on
the second, post-IO call to invalidate the page cache. That's the
failure that exposes the data coherency problem to userspace:

        /*
         * Try again to invalidate clean pages which might have been cached by
         * non-direct readahead, or faulted in by get_user_pages() if the source
         * of the write was an mmap'ed region of the file we're writing.  Either
         * one is a pretty crazy thing to do, so we don't support it 100%.  If
         * this invalidation fails, tough, the write still worked...
         */
        if (iov_iter_rw(iter) == WRITE) {
                int err = invalidate_inode_pages2_range(mapping,
                                start >> PAGE_SHIFT, end >> PAGE_SHIFT);
                WARN_ON_ONCE(err);
        }

IOWs, the first warning is a "bad things might be about to
happen" warning, the second is "bad things have happened".
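
The distinction between the two warning sites can be written down as a toy model. This is only a userspace sketch under stated assumptions: the enum, classify_dio_write() and describe_hazards() are hypothetical names that do not exist in the kernel, and pre_err/post_err simply stand in for the return values of the two invalidation calls.

```c
#include <assert.h>

/* Hypothetical flags for illustration; the kernel defines no such enum. */
enum dio_hazard {
    DIO_OK            = 0,
    DIO_PRE_IO_RISK   = 1 << 0,  /* first site: bad things might be about to happen */
    DIO_POST_IO_STALE = 1 << 1,  /* second site: bad things have happened */
};

/*
 * pre_err and post_err model the results of the invalidation attempts made
 * before and after the direct write. Neither failure fails the write itself,
 * so the outcome is recorded as a set of hazard flags, not an error code.
 */
unsigned int classify_dio_write(int pre_err, int post_err)
{
    unsigned int hazards = DIO_OK;

    if (pre_err)
        hazards |= DIO_PRE_IO_RISK;
    if (post_err)
        hazards |= DIO_POST_IO_STALE;
    return hazards;
}

/* Report the most severe hazard present in the flag set. */
const char *describe_hazards(unsigned int hazards)
{
    assert((hazards & ~3u) == 0);
    if (hazards & DIO_POST_IO_STALE)
        return "stale page cache contents may be visible to userspace";
    if (hazards & DIO_PRE_IO_RISK)
        return "coherency at risk, nothing observed yet";
    return "no collision detected";
}
```

In this model a failure at the first site alone is only a risk indicator, while a failure at the second site is the one that matters for triaging a corruption report.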

> Seems pretty sloppy to me, not a great way
> to "discourage" users from mixing mmap/dio.

Again, it has nothing to do with "discouraging users" and everything
about post-bug report problem triage.

Yes, the first invalidation should also have a comment like the post
IO invalidation - the comment probably got dropped and not noticed
when the changeover from internal XFS code to generic iomap code was
made...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> > On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
> >> Hi,
> >>
> >> A warning is triggered from:
> >>
> >> file fs/iomap.c in function iomap_dio_rw
> >>
> >>         if (ret)
> >>                 goto out_free_dio;
> >>
> >>         ret = invalidate_inode_pages2_range(mapping,
> >>                         start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >> >>      WARN_ON_ONCE(ret);
> >>         ret = 0;
> >>
> >>         inode_dio_begin(inode);
> > 
> > This is expected and an indication of a problematic workload - which
> > may be triggered by a fuzzer.
> 
> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> the time running xfstests as well.

Because when a user reports a data corruption, the only evidence we
have that they are running an app that does something stupid is this
warning in their syslogs.  Tracepoints are not useful for replacing
warnings about data corruption vectors being triggered.

It needs to be on by default, but I'm sure we can wrap it with
something like an xfs_alert_tag() type of construct so the tag can
be set in /proc/sys/fs/xfs/panic_mask to suppress it if testers so
desire.
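
A mask-gated warning of the kind suggested here might look roughly like the following. This is a userspace sketch under loud assumptions: demo_alert_tag(), demo_set_suppress_mask() and DEMO_TAG_DIO_COLLISION are hypothetical names, the tag value is made up, and the code models only the "on by default, suppressible by setting a bit" behaviour described above. It is not the real xfs_alert_tag() and does not reproduce its semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tag bit for illustration; not a real XFS tag value. */
#define DEMO_TAG_DIO_COLLISION (1u << 0)

/* Stand-in for a sysctl-controlled mask: a set bit suppresses that tag. */
static unsigned int demo_suppress_mask;

void demo_set_suppress_mask(unsigned int mask)
{
    demo_suppress_mask = mask;
}

/*
 * Emit the alert unless its tag has been masked off. The default mask is 0,
 * so every tag is reported until a tester opts out. Returns true when the
 * alert was actually emitted.
 */
bool demo_alert_tag(unsigned int tag, const char *msg)
{
    assert(msg != NULL);
    if (demo_suppress_mask & tag)
        return false;
    fprintf(stderr, "alert[%#x]: %s\n", tag, msg);
    return true;
}
```

With the default mask the alert fires; after demo_set_suppress_mask(DEMO_TAG_DIO_COLLISION) the same call stays silent, which is the opt-out a test rig would want.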

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Al Viro
On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> > If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> > the time running xfstests as well.
> 
> Dave insisted on it to discourage users/applications from mixing
> mmap and direct I/O.
> 
> In many ways a tracepoint might be the better way to diagnose these.

sysctl suppressing those two, perhaps?


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Jens Axboe
On 09/18/2017 09:43 AM, Al Viro wrote:
> On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
>> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
>>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
>>> the time running xfstests as well.
>>
>> Dave insisted on it to discourage users/applications from mixing
>> mmap and direct I/O.
>>
>> In many ways a tracepoint might be the better way to diagnose these.
> 
> sysctl suppressing those two, perhaps?

I'd rather just make it a trace point, but don't care too much.

The code doesn't even have a comment as to why that WARN_ON() is
there or expected. Seems pretty sloppy to me, not a great way
to "discourage" users from mixing mmap/dio.

-- 
Jens Axboe



Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Christoph Hellwig
On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> the time running xfstests as well.

Dave insisted on it to discourage users/applications from mixing
mmap and direct I/O.

In many ways a tracepoint might be the better way to diagnose these.


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Jens Axboe
On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
>> Hi,
>>
>> A warning is triggered from:
>>
>> file fs/iomap.c in function iomap_dio_rw
>>
>>         if (ret)
>>                 goto out_free_dio;
>>
>>         ret = invalidate_inode_pages2_range(mapping,
>>                         start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>> >>      WARN_ON_ONCE(ret);
>>         ret = 0;
>>
>>         inode_dio_begin(inode);
> 
> This is expected and an indication of a problematic workload - which
> may be triggered by a fuzzer.

If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
the time running xfstests as well.

-- 
Jens Axboe



Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Christoph Hellwig
On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
> Hi,
> 
> A warning is triggered from:
> 
> file fs/iomap.c in function iomap_dio_rw
> 
>         if (ret)
>                 goto out_free_dio;
>
>         ret = invalidate_inode_pages2_range(mapping,
>                         start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >>      WARN_ON_ONCE(ret);
>         ret = 0;
>
>         inode_dio_begin(inode);

This is expected and an indication of a problematic workload - which
may be triggered by a fuzzer.