Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-11 Thread Dave Chinner
On Fri, May 11, 2018 at 10:59:53AM +0200, Dmitry Vyukov wrote:
> On Thu, May 10, 2018 at 1:22 AM, Dave Chinner  wrote:
> > On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
> >> Does "xfstests fuzzing infrastructure" use coverage-guidance?
> >
> > It's guided manually to fuzz a substantial proportion of the fields
> > in the on-disk format that are susceptible to fuzzing bqased
> > attacks. It's not complete coverage yet, but it's getting better and
> > better, and we're finding more problems from it that random bit
> > based fuzzing has ever uncovered.
> >
> > Also, the xfstests fuzzing defeats the CRC protection now built into
> > the metadata, which means it can exercise all the new filesystem
> > features that random bit fuzzers cannot exercise. That's the problem
> > with fuzzers like syzbot - they can only usefully fuzz the legacy
> > filesystem format which doesn't have CRC validation, nor many of the
> > other protections that the current filesystem format has to detect
> > corruption. This will also allow us to test things like online
> > repair of fuzzed structures
> 
> syzkaller has 2 techniques to deal with checksums, if you are
> interested I can go into more detail.

You can if you want, but I'm betting it basically comes down to
teaching syzcaller about parts of the on-disk format, similar to
AFL. And, like AFL, I doubt any XFS developer has the time to
add such support to syzbot.

> > Given the results we're getting from our own fuzzers, I don't see
> > much point in (XFS developers) investing huge amounts of effort to
> > make some other fuzzer equivalent to what we already have. If
> > someone else starts fuzzing the current format (v5) XFS filesystems
> > and finding problems we haven't, then I'm going to be interested in
> > their fuzzing tools.  But (guided) random bit perturbation fuzzing
> > of legacy filesystem formats is really not that useful or
> > interesting to us right now.
> 
> Just asked.
> 
> Note that coverage-guidance does not necessary mean bit flipping.
> syzkaller combines coverage-guidance with grammar-awareness and other
> smartness.

Yup, I assumed that this would be the case - those sorts of
"directed fuzzing" techniques were pioneered by the Samba guys for
reverse engineering the SMB protocol used by MS servers all those
years ago. But at it's most basic level, it's still using bit
flipping techniques to perturb the input and provoke responses.

> Based on our experience with network testing, main advantage of
> syzkaller over just feeding blobs as network packets (even if these
> blobs are built in a very smart way) is the following. syzkaller can
> build complex interactions between syscalls, external inputs and
> blobs.

Yup, nothing new there - that's what every other filesystem fuzzer
infrastructure does, too.  The problem with this is that it doesn't
pin-point the actual operation that tripped over the on-disk
corruption. It's catching downstream symptoms of an unknown,
undetected on-disk format corruption. i.e. it's a poor substitute
for explicit testing of structure bounds and data relationships of a
known format.

That's the fundamental premise of fuzz testing - most software does
not have robust validation of it's inputs and so fuzzing those
inputs finds problems. We've moved on from the old "trust and don't
validate" model of filesystem structure architecture.  The on-disk
format is very well defined, it is constrained in most cases, and we
can validate most individual structures at runtime with relatively
little cost.

Hence the "structure bounds" exploits that fuzzers tend to exercise
are pretty much taken out of the picture, and that leaves us with
"data relationships" between structures as the main vector for
undetected corruptions. These are mostly detectable, and many are
correctable as the current on-disk format has a lot of redundant
information. So the space for fuzzers to detect problems is getting
smaller and smaller all the time.

IOWs, filesystem image fuzzers have their place, but if you want us
to take your fuzzing seriously then your fuzzer needs to understand
all the mechanisms we now use to detect corruptions to show us where
they are deficient. If your fuzzing doesn't expose flaws in our
current validation techniques, then it's really not useful to us.

> For example, handling of external network packets depend on if
> there is an open socket on that port, what setsockopts were called, if
> there is a pending receive, what flags were passed to that receive,
> were some data sent the other way, etc. For filesystems that would be
> various filesystem syscalls executed against the mounted image,
> concurrent umount, rebind, switch to read-only mode, etc.
> But maybe xfstests do this too, I don't know. Do they?

Generally there is no need to do this because we know exactly what
syscalls will trigger access and/or modification to on-disk
structures. Access to the on-disk structures triggers the built 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-11 Thread Dave Chinner
On Fri, May 11, 2018 at 10:59:53AM +0200, Dmitry Vyukov wrote:
> On Thu, May 10, 2018 at 1:22 AM, Dave Chinner  wrote:
> > On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
> >> Does "xfstests fuzzing infrastructure" use coverage-guidance?
> >
> > It's guided manually to fuzz a substantial proportion of the fields
> > in the on-disk format that are susceptible to fuzzing bqased
> > attacks. It's not complete coverage yet, but it's getting better and
> > better, and we're finding more problems from it that random bit
> > based fuzzing has ever uncovered.
> >
> > Also, the xfstests fuzzing defeats the CRC protection now built into
> > the metadata, which means it can exercise all the new filesystem
> > features that random bit fuzzers cannot exercise. That's the problem
> > with fuzzers like syzbot - they can only usefully fuzz the legacy
> > filesystem format which doesn't have CRC validation, nor many of the
> > other protections that the current filesystem format has to detect
> > corruption. This will also allow us to test things like online
> > repair of fuzzed structures
> 
> syzkaller has 2 techniques to deal with checksums, if you are
> interested I can go into more detail.

You can if you want, but I'm betting it basically comes down to
teaching syzcaller about parts of the on-disk format, similar to
AFL. And, like AFL, I doubt any XFS developer has the time to
add such support to syzbot.

> > Given the results we're getting from our own fuzzers, I don't see
> > much point in (XFS developers) investing huge amounts of effort to
> > make some other fuzzer equivalent to what we already have. If
> > someone else starts fuzzing the current format (v5) XFS filesystems
> > and finding problems we haven't, then I'm going to be interested in
> > their fuzzing tools.  But (guided) random bit perturbation fuzzing
> > of legacy filesystem formats is really not that useful or
> > interesting to us right now.
> 
> Just asked.
> 
> Note that coverage-guidance does not necessary mean bit flipping.
> syzkaller combines coverage-guidance with grammar-awareness and other
> smartness.

Yup, I assumed that this would be the case - those sorts of
"directed fuzzing" techniques were pioneered by the Samba guys for
reverse engineering the SMB protocol used by MS servers all those
years ago. But at it's most basic level, it's still using bit
flipping techniques to perturb the input and provoke responses.

> Based on our experience with network testing, main advantage of
> syzkaller over just feeding blobs as network packets (even if these
> blobs are built in a very smart way) is the following. syzkaller can
> build complex interactions between syscalls, external inputs and
> blobs.

Yup, nothing new there - that's what every other filesystem fuzzer
infrastructure does, too.  The problem with this is that it doesn't
pin-point the actual operation that tripped over the on-disk
corruption. It's catching downstream symptoms of an unknown,
undetected on-disk format corruption. i.e. it's a poor substitute
for explicit testing of structure bounds and data relationships of a
known format.

That's the fundamental premise of fuzz testing - most software does
not have robust validation of it's inputs and so fuzzing those
inputs finds problems. We've moved on from the old "trust and don't
validate" model of filesystem structure architecture.  The on-disk
format is very well defined, it is constrained in most cases, and we
can validate most individual structures at runtime with relatively
little cost.

Hence the "structure bounds" exploits that fuzzers tend to exercise
are pretty much taken out of the picture, and that leaves us with
"data relationships" between structures as the main vector for
undetected corruptions. These are mostly detectable, and many are
correctable as the current on-disk format has a lot of redundant
information. So the space for fuzzers to detect problems is getting
smaller and smaller all the time.

IOWs, filesystem image fuzzers have their place, but if you want us
to take your fuzzing seriously then your fuzzer needs to understand
all the mechanisms we now use to detect corruptions to show us where
they are deficient. If your fuzzing doesn't expose flaws in our
current validation techniques, then it's really not useful to us.

> For example, handling of external network packets depend on if
> there is an open socket on that port, what setsockopts were called, if
> there is a pending receive, what flags were passed to that receive,
> were some data sent the other way, etc. For filesystems that would be
> various filesystem syscalls executed against the mounted image,
> concurrent umount, rebind, switch to read-only mode, etc.
> But maybe xfstests do this too, I don't know. Do they?

Generally there is no need to do this because we know exactly what
syscalls will trigger access and/or modification to on-disk
structures. Access to the on-disk structures triggers the built in
verifier 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-11 Thread Dmitry Vyukov
On Thu, May 10, 2018 at 1:22 AM, Dave Chinner  wrote:
> On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
>> Does "xfstests fuzzing infrastructure" use coverage-guidance?
>
> It's guided manually to fuzz a substantial proportion of the fields
> in the on-disk format that are susceptible to fuzzing bqased
> attacks. It's not complete coverage yet, but it's getting better and
> better, and we're finding more problems from it that random bit
> based fuzzing has ever uncovered.
>
> Also, the xfstests fuzzing defeats the CRC protection now built into
> the metadata, which means it can exercise all the new filesystem
> features that random bit fuzzers cannot exercise. That's the problem
> with fuzzers like syzbot - they can only usefully fuzz the legacy
> filesystem format which doesn't have CRC validation, nor many of the
> other protections that the current filesystem format has to detect
> corruption. This will also allow us to test things like online
> repair of fuzzed structures

syzkaller has 2 techniques to deal with checksums, if you are
interested I can go into more detail.

> Random bit perturbation filesystem image fuzzing was state of the
> art ~10 years ago.  They were made redundant by filesystems like
> XFS and ext4 adding metadata CRC checking ~5 years ago. The legacy
> filesystem formats are essentially unfixable, and it's largely a
> waste of time to be trying to make them robust against random bit
> fuzzing because such random bit corruptions (like the syzbot images)
> do not occur in the real world.

Note there are also specifically crafted images that can be
automounted whenever one plugs something into USB.

> IOWs, we've had to advance the "state of the art" ourselves because
> nobody else in the fuzzing world responded to the fact we've
> essentially defeated random bit image fuzzing. Not only that, we
> have our own infrastructure that produces repeatable, understandable
> and debuggable failures, and this is something that many 3rd party
> fuzzing efforts do not provide us with.
>
>> If not, it would be useful to add. Among some solutions there are
>> LibFuzzer (https://llvm.org/docs/LibFuzzer.html), AFL >
>> (http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
>> (https://github.com/oracle/kernel-fuzzing).  I don't know how
>> xfstests fuzzing works, so I can't say what would be more suitable
>> there.
>
> I think only AFL would be a usable infrastructure, but it would
> require being taught about the ondisk format so it could perturb the
> image in ways that are useful to modern filesystem formats. Lots(!)
> of work, and it's not clear it would do any better than what we
> already have.
>
> Given the results we're getting from our own fuzzers, I don't see
> much point in (XFS developers) investing huge amounts of effort to
> make some other fuzzer equivalent to what we already have. If
> someone else starts fuzzing the current format (v5) XFS filesystems
> and finding problems we haven't, then I'm going to be interested in
> their fuzzing tools.  But (guided) random bit perturbation fuzzing
> of legacy filesystem formats is really not that useful or
> interesting to us right now.

Just asked.

Note that coverage-guidance does not necessary mean bit flipping.
syzkaller combines coverage-guidance with grammar-awareness and other
smartness.

Based on our experience with network testing, main advantage of
syzkaller over just feeding blobs as network packets (even if these
blobs are built in a very smart way) is the following. syzkaller can
build complex interactions between syscalls, external inputs and
blobs. For example, handling of external network packets depend on if
there is an open socket on that port, what setsockopts were called, if
there is a pending receive, what flags were passed to that receive,
were some data sent the other way, etc. For filesystems that would be
various filesystem syscalls executed against the mounted image,
concurrent umount, rebind, switch to read-only mode, etc.
But maybe xfstests do this too, I don't know. Do they?


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-11 Thread Dmitry Vyukov
On Thu, May 10, 2018 at 1:22 AM, Dave Chinner  wrote:
> On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
>> Does "xfstests fuzzing infrastructure" use coverage-guidance?
>
> It's guided manually to fuzz a substantial proportion of the fields
> in the on-disk format that are susceptible to fuzzing bqased
> attacks. It's not complete coverage yet, but it's getting better and
> better, and we're finding more problems from it that random bit
> based fuzzing has ever uncovered.
>
> Also, the xfstests fuzzing defeats the CRC protection now built into
> the metadata, which means it can exercise all the new filesystem
> features that random bit fuzzers cannot exercise. That's the problem
> with fuzzers like syzbot - they can only usefully fuzz the legacy
> filesystem format which doesn't have CRC validation, nor many of the
> other protections that the current filesystem format has to detect
> corruption. This will also allow us to test things like online
> repair of fuzzed structures

syzkaller has 2 techniques to deal with checksums, if you are
interested I can go into more detail.

> Random bit perturbation filesystem image fuzzing was state of the
> art ~10 years ago.  They were made redundant by filesystems like
> XFS and ext4 adding metadata CRC checking ~5 years ago. The legacy
> filesystem formats are essentially unfixable, and it's largely a
> waste of time to be trying to make them robust against random bit
> fuzzing because such random bit corruptions (like the syzbot images)
> do not occur in the real world.

Note there are also specifically crafted images that can be
automounted whenever one plugs something into USB.

> IOWs, we've had to advance the "state of the art" ourselves because
> nobody else in the fuzzing world responded to the fact we've
> essentially defeated random bit image fuzzing. Not only that, we
> have our own infrastructure that produces repeatable, understandable
> and debuggable failures, and this is something that many 3rd party
> fuzzing efforts do not provide us with.
>
>> If not, it would be useful to add. Among some solutions there are
>> LibFuzzer (https://llvm.org/docs/LibFuzzer.html), AFL >
>> (http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
>> (https://github.com/oracle/kernel-fuzzing).  I don't know how
>> xfstests fuzzing works, so I can't say what would be more suitable
>> there.
>
> I think only AFL would be a usable infrastructure, but it would
> require being taught about the ondisk format so it could perturb the
> image in ways that are useful to modern filesystem formats. Lots(!)
> of work, and it's not clear it would do any better than what we
> already have.
>
> Given the results we're getting from our own fuzzers, I don't see
> much point in (XFS developers) investing huge amounts of effort to
> make some other fuzzer equivalent to what we already have. If
> someone else starts fuzzing the current format (v5) XFS filesystems
> and finding problems we haven't, then I'm going to be interested in
> their fuzzing tools.  But (guided) random bit perturbation fuzzing
> of legacy filesystem formats is really not that useful or
> interesting to us right now.

Just asked.

Note that coverage-guidance does not necessary mean bit flipping.
syzkaller combines coverage-guidance with grammar-awareness and other
smartness.

Based on our experience with network testing, main advantage of
syzkaller over just feeding blobs as network packets (even if these
blobs are built in a very smart way) is the following. syzkaller can
build complex interactions between syscalls, external inputs and
blobs. For example, handling of external network packets depend on if
there is an open socket on that port, what setsockopts were called, if
there is a pending receive, what flags were passed to that receive,
were some data sent the other way, etc. For filesystems that would be
various filesystem syscalls executed against the mounted image,
concurrent umount, rebind, switch to read-only mode, etc.
But maybe xfstests do this too, I don't know. Do they?


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Dave Chinner
On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
> Does "xfstests fuzzing infrastructure" use coverage-guidance?

It's guided manually to fuzz a substantial proportion of the fields
in the on-disk format that are susceptible to fuzzing bqased
attacks. It's not complete coverage yet, but it's getting better and
better, and we're finding more problems from it that random bit
based fuzzing has ever uncovered.

Also, the xfstests fuzzing defeats the CRC protection now built into
the metadata, which means it can exercise all the new filesystem
features that random bit fuzzers cannot exercise. That's the problem
with fuzzers like syzbot - they can only usefully fuzz the legacy
filesystem format which doesn't have CRC validation, nor many of the
other protections that the current filesystem format has to detect
corruption. This will also allow us to test things like online
repair of fuzzed structures

Random bit perturbation filesystem image fuzzing was state of the
art ~10 years ago.  They were made redundant by filesystems like
XFS and ext4 adding metadata CRC checking ~5 years ago. The legacy
filesystem formats are essentially unfixable, and it's largely a
waste of time to be trying to make them robust against random bit
fuzzing because such random bit corruptions (like the syzbot images)
do not occur in the real world.

IOWs, we've had to advance the "state of the art" ourselves because
nobody else in the fuzzing world responded to the fact we've
essentially defeated random bit image fuzzing. Not only that, we
have our own infrastructure that produces repeatable, understandable
and debuggable failures, and this is something that many 3rd party
fuzzing efforts do not provide us with.

> If not, it would be useful to add. Among some solutions there are
> LibFuzzer (https://llvm.org/docs/LibFuzzer.html), AFL >
> (http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
> (https://github.com/oracle/kernel-fuzzing).  I don't know how
> xfstests fuzzing works, so I can't say what would be more suitable
> there.

I think only AFL would be a usable infrastructure, but it would
require being taught about the ondisk format so it could perturb the
image in ways that are useful to modern filesystem formats. Lots(!)
of work, and it's not clear it would do any better than what we
already have.

Given the results we're getting from our own fuzzers, I don't see
much point in (XFS developers) investing huge amounts of effort to
make some other fuzzer equivalent to what we already have. If
someone else starts fuzzing the current format (v5) XFS filesystems
and finding problems we haven't, then I'm going to be interested in
their fuzzing tools.  But (guided) random bit perturbation fuzzing
of legacy filesystem formats is really not that useful or
interesting to us right now.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Dave Chinner
On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
> Does "xfstests fuzzing infrastructure" use coverage-guidance?

It's guided manually to fuzz a substantial proportion of the fields
in the on-disk format that are susceptible to fuzzing bqased
attacks. It's not complete coverage yet, but it's getting better and
better, and we're finding more problems from it that random bit
based fuzzing has ever uncovered.

Also, the xfstests fuzzing defeats the CRC protection now built into
the metadata, which means it can exercise all the new filesystem
features that random bit fuzzers cannot exercise. That's the problem
with fuzzers like syzbot - they can only usefully fuzz the legacy
filesystem format which doesn't have CRC validation, nor many of the
other protections that the current filesystem format has to detect
corruption. This will also allow us to test things like online
repair of fuzzed structures

Random bit perturbation filesystem image fuzzing was state of the
art ~10 years ago.  They were made redundant by filesystems like
XFS and ext4 adding metadata CRC checking ~5 years ago. The legacy
filesystem formats are essentially unfixable, and it's largely a
waste of time to be trying to make them robust against random bit
fuzzing because such random bit corruptions (like the syzbot images)
do not occur in the real world.

IOWs, we've had to advance the "state of the art" ourselves because
nobody else in the fuzzing world responded to the fact we've
essentially defeated random bit image fuzzing. Not only that, we
have our own infrastructure that produces repeatable, understandable
and debuggable failures, and this is something that many 3rd party
fuzzing efforts do not provide us with.

> If not, it would be useful to add. Among some solutions there are
> LibFuzzer (https://llvm.org/docs/LibFuzzer.html), AFL >
> (http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
> (https://github.com/oracle/kernel-fuzzing).  I don't know how
> xfstests fuzzing works, so I can't say what would be more suitable
> there.

I think only AFL would be a usable infrastructure, but it would
require being taught about the ondisk format so it could perturb the
image in ways that are useful to modern filesystem formats. Lots(!)
of work, and it's not clear it would do any better than what we
already have.

Given the results we're getting from our own fuzzers, I don't see
much point in (XFS developers) investing huge amounts of effort to
make some other fuzzer equivalent to what we already have. If
someone else starts fuzzing the current format (v5) XFS filesystems
and finding problems we haven't, then I'm going to be interested in
their fuzzing tools.  But (guided) random bit perturbation fuzzing
of legacy filesystem formats is really not that useful or
interesting to us right now.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Dmitry Vyukov
On Wed, May 9, 2018 at 3:55 PM, Theodore Y. Ts'o  wrote:
 C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
 syzkaller reproducer:
 https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>
>>> What a mess. A hand built, hopelessly broken filesystem image made
>>> up of hex dumps, written into a mmap()d region of memory, then
>>> copied into a tmpfs file and mounted with the loop device.
>>>
>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>> we are to have any hope of understanding what the hell this test is
>>> doing, the bot needs to supply us with a copy of the built
>>> filesystem image the test uses. We need to be able to point forensic
>>> tools at the image to decode all the structures into human readable
>>> format - if we are forced to do that by hand or jump through hoops
>>> to create our own filesystem image than I'm certainly not going to
>>> waste time looking at these reports...
>>
>> Hi Dave,
>>
>> Here is the image:
>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> A suggestion --- insteading of forcing human beings --- either
> overworked file system developers, or understaffed fuzzing tool teams,
> to have to manually pull out the file system image out from the C
> repro, if it's too hard to add a link where the file system iamge can
> be downloaded from the Syzkaller web application --- how about adding
> an option to the C repro template which causes it to dump the image to
> a file and then immediately exit?

Hi Ted,

That's what I proposed above:
https://groups.google.com/d/msg/syzkaller-bugs/KJNNTgTdg_g/NRxarDcYBgAJ
But I did not get response yet.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Dmitry Vyukov
On Wed, May 9, 2018 at 3:55 PM, Theodore Y. Ts'o  wrote:
 C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
 syzkaller reproducer:
 https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>
>>> What a mess. A hand built, hopelessly broken filesystem image made
>>> up of hex dumps, written into a mmap()d region of memory, then
>>> copied into a tmpfs file and mounted with the loop device.
>>>
>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>> we are to have any hope of understanding what the hell this test is
>>> doing, the bot needs to supply us with a copy of the built
>>> filesystem image the test uses. We need to be able to point forensic
>>> tools at the image to decode all the structures into human readable
>>> format - if we are forced to do that by hand or jump through hoops
>>> to create our own filesystem image than I'm certainly not going to
>>> waste time looking at these reports...
>>
>> Hi Dave,
>>
>> Here is the image:
>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> A suggestion --- insteading of forcing human beings --- either
> overworked file system developers, or understaffed fuzzing tool teams,
> to have to manually pull out the file system image out from the C
> repro, if it's too hard to add a link where the file system iamge can
> be downloaded from the Syzkaller web application --- how about adding
> an option to the C repro template which causes it to dump the image to
> a file and then immediately exit?

Hi Ted,

That's what I proposed above:
https://groups.google.com/d/msg/syzkaller-bugs/KJNNTgTdg_g/NRxarDcYBgAJ
But I did not get response yet.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Theodore Y. Ts'o
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

A suggestion --- insteading of forcing human beings --- either
overworked file system developers, or understaffed fuzzing tool teams,
to have to manually pull out the file system image out from the C
repro, if it's too hard to add a link where the file system iamge can
be downloaded from the Syzkaller web application --- how about adding
an option to the C repro template which causes it to dump the image to
a file and then immediately exit?

- Ted


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Theodore Y. Ts'o
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

A suggestion --- insteading of forcing human beings --- either
overworked file system developers, or understaffed fuzzing tool teams,
to have to manually pull out the file system image out from the C
repro, if it's too hard to add a link where the file system iamge can
be downloaded from the Syzkaller web application --- how about adding
an option to the C repro template which causes it to dump the image to
a file and then immediately exit?

- Ted


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Dmitry Vyukov
On Wed, May 9, 2018 at 4:48 AM, Eric Sandeen  wrote:
>
>
> On 5/8/18 2:52 AM, Dmitry Vyukov wrote:
>>> Or put another way, how did you arrive at the fs image values in the 
>>> reproducer,
>>> i.e.:
>> Currently they are completely random, nobody taught syzkaller about AGFs, 
>> etc.
>
> So you just combine a few megabytes of purely random bits out of thin air 
> until
> you get something that approximates an xfs filesystem?  Google must have more
> computing power than I was aware of.


syzbot uses very few cores for fuzzing of all of the hundreds of
kernel subsystems. But syzkaller (the underlying fuzzer) uses
coverage-guidance and this makes fuzzing exponentially more efficient
than blind techniques. Coverage-guidance is also combined with
grammar-based generation techniques, which gives additional synergy
(though there are no grammar descriptions for filesystem formats at
the moment).

Does "xfstests fuzzing infrastructure" use coverage-guidance? If not,
it would be useful to add. Among some solutions there are LibFuzzer
(https://llvm.org/docs/LibFuzzer.html), AFL
(http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
(https://github.com/oracle/kernel-fuzzing). I don't know how xfstests
fuzzing works, so I can't say what would be more suitable there.



>>> oid loop()
>>> {
>>>   memcpy((void*)0x2000, "xfs", 4);
>>>   memcpy((void*)0x2100, "./file0", 8);
>>>   *(uint64_t*)0x2200 = 0x2001;
>>>   memcpy((void*)0x2001,
>>>  
>>> "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>>>  
>>> "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>>>  
>>> "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>>>  
>>> "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>>>  
>>> "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>>  
>>> "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>>>  204);
>>>
>>> ...
>>>
>>> The in-memory xfs filesystem it constructs is damaged, is that an 
>>> intentional
>>> part of the fuzzing during the test?
>> Yes, invalid inputs is part of testing.
>


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-09 Thread Dmitry Vyukov
On Wed, May 9, 2018 at 4:48 AM, Eric Sandeen  wrote:
>
>
> On 5/8/18 2:52 AM, Dmitry Vyukov wrote:
>>> Or put another way, how did you arrive at the fs image values in the 
>>> reproducer,
>>> i.e.:
>> Currently they are completely random, nobody taught syzkaller about AGFs, 
>> etc.
>
> So you just combine a few megabytes of purely random bits out of thin air 
> until
> you get something that approximates an xfs filesystem?  Google must have more
> computing power than I was aware of.


syzbot uses very few cores for fuzzing of all of the hundreds of
kernel subsystems. But syzkaller (the underlying fuzzer) uses
coverage-guidance and this makes fuzzing exponentially more efficient
than blind techniques. Coverage-guidance is also combined with
grammar-based generation techniques, which gives additional synergy
(though there are no grammar descriptions for filesystem formats at
the moment).

Does "xfstests fuzzing infrastructure" use coverage-guidance? If not,
it would be useful to add. Among some solutions there are LibFuzzer
(https://llvm.org/docs/LibFuzzer.html), AFL
(http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
(https://github.com/oracle/kernel-fuzzing). I don't know how xfstests
fuzzing works, so I can't say what would be more suitable there.



>>> oid loop()
>>> {
>>>   memcpy((void*)0x2000, "xfs", 4);
>>>   memcpy((void*)0x2100, "./file0", 8);
>>>   *(uint64_t*)0x2200 = 0x2001;
>>>   memcpy((void*)0x2001,
>>>  
>>> "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>>>  
>>> "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>>>  
>>> "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>>>  
>>> "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>>>  
>>> "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>>  
>>> "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>>>  
>>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>>>  204);
>>>
>>> ...
>>>
>>> The in-memory xfs filesystem it constructs is damaged, is that an 
>>> intentional
>>> part of the fuzzing during the test?
>> Yes, invalid inputs is part of testing.
>


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Eric Sandeen


On 5/8/18 9:37 PM, Eric Biggers wrote:
> On Wed, May 09, 2018 at 10:50:11AM +1000, Dave Chinner wrote:
>> On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
>>> On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
 Hello,

 syzbot hit the following crash on upstream commit
 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 
 +)
 Merge branch 'ras-core-for-linus' of
 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
 syzbot dashboard link:
 https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba

 C reproducer: 
 https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
 syzkaller reproducer:
 https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>
>>> What a mess. A hand built, hopelessly broken filesystem image made
>>> up of hex dumps, written into a mmap()d region of memory, then
>>> copied into a tmpfs file and mounted with the loop device.
>>>
>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>> we are to have any hope of understanding what the hell this test is
>>> doing, the bot needs to supply us with a copy of the built
>>> filesystem image the test uses. We need to be able to point forensic
>>> tools at the image to decode all the structures into human readable
>>> format - if we are forced to do that by hand or jump through hoops
>>> to create our own filesystem image than I'm certainly not going to
>>> waste time looking at these reports...
>>
>> Hi Dave,
>>
>> Here is the image:
>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> Have anybody looked at the bug and the image yet?

 Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
 kernel here.
>>>
>>> Do you think it is fixed now? What fixed it? The bug was there.
>>
>> We merge fixes for fuzzing issues all the time. IIRC a big batch of
>> them from the xfstests fuzzing infrastructure went into 4.17-rc1.
>>
>> If you want a commit, then do a bisect
>>
> 
> The fix was commit 8241f7f983b9728:
> 
> #syz fix: xfs: don't iunlock the quota ip when quota block

Ah, thanks.  Interestingly that one was sent to the xfs list on 2/22, a couple
months before this bug report.  Took some time to get reviewed and merged
upstream, and it actually landed upstream in Linus' kernel not long after
this report...

-Eric


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Eric Sandeen


On 5/8/18 9:37 PM, Eric Biggers wrote:
> On Wed, May 09, 2018 at 10:50:11AM +1000, Dave Chinner wrote:
>> On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
>>> On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
 Hello,

 syzbot hit the following crash on upstream commit
 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 
 +)
 Merge branch 'ras-core-for-linus' of
 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
 syzbot dashboard link:
 https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba

 C reproducer: 
 https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
 syzkaller reproducer:
 https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>
>>> What a mess. A hand built, hopelessly broken filesystem image made
>>> up of hex dumps, written into a mmap()d region of memory, then
>>> copied into a tmpfs file and mounted with the loop device.
>>>
>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>> we are to have any hope of understanding what the hell this test is
>>> doing, the bot needs to supply us with a copy of the built
>>> filesystem image the test uses. We need to be able to point forensic
>>> tools at the image to decode all the structures into human readable
>>> format - if we are forced to do that by hand or jump through hoops
>>> to create our own filesystem image than I'm certainly not going to
>>> waste time looking at these reports...
>>
>> Hi Dave,
>>
>> Here is the image:
>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> Have anybody looked at the bug and the image yet?

 Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
 kernel here.
>>>
>>> Do you think it is fixed now? What fixed it? The bug was there.
>>
>> We merge fixes for fuzzing issues all the time. IIRC a big batch of
>> them from the xfstests fuzzing infrastructure went into 4.17-rc1.
>>
>> If you want a commit, then do a bisect
>>
> 
> The fix was commit 8241f7f983b9728:
> 
> #syz fix: xfs: don't iunlock the quota ip when quota block

Ah, thanks.  Interestingly that one was sent to the xfs list on 2/22, a couple
months before this bug report.  Took some time to get reviewed and merged
upstream, and it actually landed upstream in Linus' kernel not long after
this report...

-Eric


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Eric Sandeen


On 5/8/18 2:52 AM, Dmitry Vyukov wrote:
>> Or put another way, how did you arrive at the fs image values in the 
>> reproducer,
>> i.e.:
> Currently they are completely random, nobody taught syzkaller about AGFs, etc.

So you just combine a few megabytes of purely random bits out of thin air until
you get something that approximates an xfs filesystem?  Google must have more
computing power than I was aware of.

-Eric

>> oid loop()
>> {
>>   memcpy((void*)0x2000, "xfs", 4);
>>   memcpy((void*)0x2100, "./file0", 8);
>>   *(uint64_t*)0x2200 = 0x2001;
>>   memcpy((void*)0x2001,
>>  
>> "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>>  
>> "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>>  
>> "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>>  
>> "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>>  
>> "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>>  
>> "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>  
>> "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>>  204);
>>
>> ...
>>
>> The in-memory xfs filesystem it constructs is damaged, is that an intentional
>> part of the fuzzing during the test?
> Yes, invalid inputs is part of testing.



Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Eric Sandeen


On 5/8/18 2:52 AM, Dmitry Vyukov wrote:
>> Or put another way, how did you arrive at the fs image values in the 
>> reproducer,
>> i.e.:
> Currently they are completely random, nobody taught syzkaller about AGFs, etc.

So you just combine a few megabytes of purely random bits out of thin air until
you get something that approximates an xfs filesystem?  Google must have more
computing power than I was aware of.

-Eric

>> oid loop()
>> {
>>   memcpy((void*)0x2000, "xfs", 4);
>>   memcpy((void*)0x2100, "./file0", 8);
>>   *(uint64_t*)0x2200 = 0x2001;
>>   memcpy((void*)0x2001,
>>  
>> "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>>  
>> "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>>  
>> "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>>  
>> "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>>  
>> "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>>  
>> "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>  
>> "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>>  
>> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>>  204);
>>
>> ...
>>
>> The in-memory xfs filesystem it constructs is damaged, is that an intentional
>> part of the fuzzing during the test?
> Yes, invalid inputs is part of testing.



Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Eric Biggers
On Wed, May 09, 2018 at 10:50:11AM +1000, Dave Chinner wrote:
> On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
> > On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
> > >> >>> Hello,
> > >> >>>
> > >> >>> syzbot hit the following crash on upstream commit
> > >> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 
> > >> >>> +)
> > >> >>> Merge branch 'ras-core-for-linus' of
> > >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > >> >>> syzbot dashboard link:
> > >> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> > >> >>>
> > >> >>> C reproducer: 
> > >> >>> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> > >> >>> syzkaller reproducer:
> > >> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> > >> >>
> > >> >> What a mess. A hand built, hopelessly broken filesystem image made
> > >> >> up of hex dumps, written into a mmap()d region of memory, then
> > >> >> copied into a tmpfs file and mounted with the loop device.
> > >> >>
> > >> >> Engineers that can debug broken filesystems don't grow on trees.  If
> > >> >> we are to have any hope of understanding what the hell this test is
> > >> >> doing, the bot needs to supply us with a copy of the built
> > >> >> filesystem image the test uses. We need to be able to point forensic
> > >> >> tools at the image to decode all the structures into human readable
> > >> >> format - if we are forced to do that by hand or jump through hoops
> > >> >> to create our own filesystem image than I'm certainly not going to
> > >> >> waste time looking at these reports...
> > >> >
> > >> > Hi Dave,
> > >> >
> > >> > Here is the image:
> > >> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> > >>
> > >> Have anybody looked at the bug and the image yet?
> > >
> > > Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> > > kernel here.
> > 
> > Do you think it is fixed now? What fixed it? The bug was there.
> 
> We merge fixes for fuzzing issues all the time. IIRC a big batch of
> them from the xfstests fuzzing infrastructure went into 4.17-rc1.
> 
> If you want a commit, then do a bisect
> 

The fix was commit 8241f7f983b9728:

#syz fix: xfs: don't iunlock the quota ip when quota block

- Eric


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Eric Biggers
On Wed, May 09, 2018 at 10:50:11AM +1000, Dave Chinner wrote:
> On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
> > On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
> > >> >>> Hello,
> > >> >>>
> > >> >>> syzbot hit the following crash on upstream commit
> > >> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 
> > >> >>> +)
> > >> >>> Merge branch 'ras-core-for-linus' of
> > >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > >> >>> syzbot dashboard link:
> > >> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> > >> >>>
> > >> >>> C reproducer: 
> > >> >>> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> > >> >>> syzkaller reproducer:
> > >> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> > >> >>
> > >> >> What a mess. A hand built, hopelessly broken filesystem image made
> > >> >> up of hex dumps, written into a mmap()d region of memory, then
> > >> >> copied into a tmpfs file and mounted with the loop device.
> > >> >>
> > >> >> Engineers that can debug broken filesystems don't grow on trees.  If
> > >> >> we are to have any hope of understanding what the hell this test is
> > >> >> doing, the bot needs to supply us with a copy of the built
> > >> >> filesystem image the test uses. We need to be able to point forensic
> > >> >> tools at the image to decode all the structures into human readable
> > >> >> format - if we are forced to do that by hand or jump through hoops
> > >> >> to create our own filesystem image than I'm certainly not going to
> > >> >> waste time looking at these reports...
> > >> >
> > >> > Hi Dave,
> > >> >
> > >> > Here is the image:
> > >> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> > >>
> > >> Have anybody looked at the bug and the image yet?
> > >
> > > Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> > > kernel here.
> > 
> > Do you think it is fixed now? What fixed it? The bug was there.
> 
> We merge fixes for fuzzing issues all the time. IIRC a big batch of
> them from the xfstests fuzzing infrastructure went into 4.17-rc1.
> 
> If you want a commit, then do a bisect
> 

The fix was commit 8241f7f983b9728:

#syz fix: xfs: don't iunlock the quota ip when quota block

- Eric


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dave Chinner
On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
> On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
> >> >>> Hello,
> >> >>>
> >> >>> syzbot hit the following crash on upstream commit
> >> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 
> >> >>> +)
> >> >>> Merge branch 'ras-core-for-linus' of
> >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> >>> syzbot dashboard link:
> >> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >> >>>
> >> >>> C reproducer: 
> >> >>> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >> >>> syzkaller reproducer:
> >> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >> >>
> >> >> What a mess. A hand built, hopelessly broken filesystem image made
> >> >> up of hex dumps, written into a mmap()d region of memory, then
> >> >> copied into a tmpfs file and mounted with the loop device.
> >> >>
> >> >> Engineers that can debug broken filesystems don't grow on trees.  If
> >> >> we are to have any hope of understanding what the hell this test is
> >> >> doing, the bot needs to supply us with a copy of the built
> >> >> filesystem image the test uses. We need to be able to point forensic
> >> >> tools at the image to decode all the structures into human readable
> >> >> format - if we are forced to do that by hand or jump through hoops
> >> >> to create our own filesystem image than I'm certainly not going to
> >> >> waste time looking at these reports...
> >> >
> >> > Hi Dave,
> >> >
> >> > Here is the image:
> >> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> >>
> >> Have anybody looked at the bug and the image yet?
> >
> > Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> > kernel here.
> 
> Do you think it is fixed now? What fixed it? The bug was there.

We merge fixes for fuzzing issues all the time. IIRC a big batch of
them from the xfstests fuzzing infrastructure went into 4.17-rc1.

If you want a commit, then do a bisect

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dave Chinner
On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
> On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
> >> >>> Hello,
> >> >>>
> >> >>> syzbot hit the following crash on upstream commit
> >> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 
> >> >>> +)
> >> >>> Merge branch 'ras-core-for-linus' of
> >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> >>> syzbot dashboard link:
> >> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >> >>>
> >> >>> C reproducer: 
> >> >>> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >> >>> syzkaller reproducer:
> >> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >> >>
> >> >> What a mess. A hand built, hopelessly broken filesystem image made
> >> >> up of hex dumps, written into a mmap()d region of memory, then
> >> >> copied into a tmpfs file and mounted with the loop device.
> >> >>
> >> >> Engineers that can debug broken filesystems don't grow on trees.  If
> >> >> we are to have any hope of understanding what the hell this test is
> >> >> doing, the bot needs to supply us with a copy of the built
> >> >> filesystem image the test uses. We need to be able to point forensic
> >> >> tools at the image to decode all the structures into human readable
> >> >> format - if we are forced to do that by hand or jump through hoops
> >> >> to create our own filesystem image than I'm certainly not going to
> >> >> waste time looking at these reports...
> >> >
> >> > Hi Dave,
> >> >
> >> > Here is the image:
> >> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> >>
> >> Have anybody looked at the bug and the image yet?
> >
> > Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> > kernel here.
> 
> Do you think it is fixed now? What fixed it? The bug was there.

We merge fixes for fuzzing issues all the time. IIRC a big batch of
them from the xfstests fuzzing infrastructure went into 4.17-rc1.

If you want a commit, then do a bisect

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dmitry Vyukov
On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
>> >>> Hello,
>> >>>
>> >>> syzbot hit the following crash on upstream commit
>> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> >>> Merge branch 'ras-core-for-linus' of
>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> >>> syzbot dashboard link:
>> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>> >>>
>> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> >>> syzkaller reproducer:
>> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>> >>
>> >> What a mess. A hand built, hopelessly broken filesystem image made
>> >> up of hex dumps, written into a mmap()d region of memory, then
>> >> copied into a tmpfs file and mounted with the loop device.
>> >>
>> >> Engineers that can debug broken filesystems don't grow on trees.  If
>> >> we are to have any hope of understanding what the hell this test is
>> >> doing, the bot needs to supply us with a copy of the built
>> >> filesystem image the test uses. We need to be able to point forensic
>> >> tools at the image to decode all the structures into human readable
>> >> format - if we are forced to do that by hand or jump through hoops
>> >> to create our own filesystem image than I'm certainly not going to
>> >> waste time looking at these reports...
>> >
>> > Hi Dave,
>> >
>> > Here is the image:
>> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>>
>> Have anybody looked at the bug and the image yet?
>
> Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> kernel here.

Do you think it is fixed now? What fixed it? The bug was there.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dmitry Vyukov
On Wed, May 2, 2018 at 12:51 AM, Dave Chinner  wrote:
>> >>> Hello,
>> >>>
>> >>> syzbot hit the following crash on upstream commit
>> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> >>> Merge branch 'ras-core-for-linus' of
>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> >>> syzbot dashboard link:
>> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>> >>>
>> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> >>> syzkaller reproducer:
>> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>> >>
>> >> What a mess. A hand built, hopelessly broken filesystem image made
>> >> up of hex dumps, written into a mmap()d region of memory, then
>> >> copied into a tmpfs file and mounted with the loop device.
>> >>
>> >> Engineers that can debug broken filesystems don't grow on trees.  If
>> >> we are to have any hope of understanding what the hell this test is
>> >> doing, the bot needs to supply us with a copy of the built
>> >> filesystem image the test uses. We need to be able to point forensic
>> >> tools at the image to decode all the structures into human readable
>> >> format - if we are forced to do that by hand or jump through hoops
>> >> to create our own filesystem image than I'm certainly not going to
>> >> waste time looking at these reports...
>> >
>> > Hi Dave,
>> >
>> > Here is the image:
>> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>>
>> Have anybody looked at the bug and the image yet?
>
> Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> kernel here.

Do you think it is fixed now? What fixed it? The bug was there.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dmitry Vyukov
On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen  wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:
>
> ...
>
 It just extracted kernel source file name that looked relevant
 to this crash and run get_maintainers.pl on it.
 Also the image can contain dynamically generated data, which makes it
 impossible to have as a file at all.
>>>
>>> I guess I'm not sure what this means, can you explain?
>>
>> Say, a value that we generally pass to close system call is not static
>> and can't be dumped to a static file. It's whatever a previous open
>> system call has returned. Inside of the program we memorize the return
>> value of open in a variable and then pass it to close. This generally
>> stands for all system calls. Say, an image can contain an uid, and
>> that uid can be obtained from a system call too.
>
> Ok, but that's the syscall side.  You are operating on a static xfs image,
> correct?  We're only asking for the actual filesystem you're operating
> against.
>
> (When I say "image" I am talking only about the filesystem itself, not any
> other syzkaller state)
>
> ...
>
>>> That was not at all clear to me.  I thought when syzkaller was telling us
>>> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
>>> I'm not sure if anyone else made that mistake, but  perhaps you could also 
>>> clarify
>>> the bug report text in this regard?
>>
>> Suggestions are welcome. Currently it says "syzbot hit the following
>> crash on upstream commit SHA1", which was supposed to mean just the
>> state of the source tree when the crash happened. But I am not a
>> native speaker, so perhaps I am saying not what I intend to say.
>>
>> There are also suggestions on report format improvement from +Ted
>> currently in works:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
>> Not sure if they make this distinction 100% clear, though.
>
> Maybe I was the only one who misunderstood, but something like
>
> git tree:   
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> HEAD:   f5c754d63d06 mm/swap_state.c: make bool enable_vma_readahead 
> and swap_vma_readahead()
>
> to make it clear that it has not identified that commit as the culprit, it's
> just the head of the tree you were testing?  (I think I have the correct git
> nomenclature ...)


This is done now, you can see example of new format here:

https://lkml.org/lkml/2018/5/8/36

It says "HEAD commit" and also "syzbot engineers can be reached at ".


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dmitry Vyukov
On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen  wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:
>
> ...
>
 It just extracted kernel source file name that looked relevant
 to this crash and run get_maintainers.pl on it.
 Also the image can contain dynamically generated data, which makes it
 impossible to have as a file at all.
>>>
>>> I guess I'm not sure what this means, can you explain?
>>
>> Say, a value that we generally pass to close system call is not static
>> and can't be dumped to a static file. It's whatever a previous open
>> system call has returned. Inside of the program we memorize the return
>> value of open in a variable and then pass it to close. This generally
>> stands for all system calls. Say, an image can contain an uid, and
>> that uid can be obtained from a system call too.
>
> Ok, but that's the syscall side.  You are operating on a static xfs image,
> correct?  We're only asking for the actual filesystem you're operating
> against.
>
> (When I say "image" I am talking only about the filesystem itself, not any
> other syzkaller state)
>
> ...
>
>>> That was not at all clear to me.  I thought when syzkaller was telling us
>>> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
>>> I'm not sure if anyone else made that mistake, but  perhaps you could also 
>>> clarify
>>> the bug report text in this regard?
>>
>> Suggestions are welcome. Currently it says "syzbot hit the following
>> crash on upstream commit SHA1", which was supposed to mean just the
>> state of the source tree when the crash happened. But I am not a
>> native speaker, so perhaps I am saying not what I intend to say.
>>
>> There are also suggestions on report format improvement from +Ted
>> currently in works:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
>> Not sure if they make this distinction 100% clear, though.
>
> Maybe I was the only one who misunderstood, but something like
>
> git tree:   
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> HEAD:   f5c754d63d06 mm/swap_state.c: make bool enable_vma_readahead 
> and swap_vma_readahead()
>
> to make it clear that it has not identified that commit as the culprit, it's
> just the head of the tree you were testing?  (I think I have the correct git
> nomenclature ...)


This is done now, you can see example of new format here:

https://lkml.org/lkml/2018/5/8/36

It says "HEAD commit" and also "syzbot engineers can be reached at ".


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dmitry Vyukov
On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen  wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:
>
> ...
>
 It just extracted kernel source file name that looked relevant
 to this crash and run get_maintainers.pl on it.
 Also the image can contain dynamically generated data, which makes it
 impossible to have as a file at all.
>>>
>>> I guess I'm not sure what this means, can you explain?
>>
>> Say, a value that we generally pass to close system call is not static
>> and can't be dumped to a static file. It's whatever a previous open
>> system call has returned. Inside of the program we memorize the return
>> value of open in a variable and then pass it to close. This generally
>> stands for all system calls. Say, an image can contain an uid, and
>> that uid can be obtained from a system call too.
>
> Ok, but that's the syscall side.  You are operating on a static xfs image,
> correct?  We're only asking for the actual filesystem you're operating
> against.

Not necessary. Image can be dynamically generate too, all inputs to
kernel are generally dynamically generated.


> (When I say "image" I am talking only about the filesystem itself, not any
> other syzkaller state)

OK, let's do it this way. For the first 10 bugs, ask me, and I will do
it manually.
I am all for automation. And syzbot is already more automated than
most kernel testing systems. But, as I said, this is really
not-trivial, large amount of work, and is specific to one out of
dozens of kernel subsystems.


> Ok, backing up more: When you are testing against an xfs filesystem image, 
> where
> does that image come from?  How is it generated?  A quick look at the 
> syzkaller
> tree didn't make that clear to me.
>
> the xfs.repro file you provided at
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> is strange, it doesn't even contain AGF blocks; they aren't fuzzed or 
> corrupted,
> they are completely zeroed out.  I don't know if that's part of the fuzzing,
> or what - what steps led to that image?
>
> Or put another way, how did you arrive at the fs image values in the 
> reproducer,
> i.e.:

Currently they are completely random, nobody taught syzkaller about AGFs, etc.

> oid loop()
> {
>   memcpy((void*)0x2000, "xfs", 4);
>   memcpy((void*)0x2100, "./file0", 8);
>   *(uint64_t*)0x2200 = 0x2001;
>   memcpy((void*)0x2001,
>  
> "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>  
> "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>  
> "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>  
> "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>  
> "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>  
> "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>  
> "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>  204);
>
> ...
>
> The in-memory xfs filesystem it constructs is damaged, is that an intentional
> part of the fuzzing during the test?

Yes, invalid inputs is part of testing.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-08 Thread Dmitry Vyukov
On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen  wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:
>
> ...
>
 It just extracted kernel source file name that looked relevant
 to this crash and run get_maintainers.pl on it.
 Also the image can contain dynamically generated data, which makes it
 impossible to have as a file at all.
>>>
>>> I guess I'm not sure what this means, can you explain?
>>
>> Say, a value that we generally pass to close system call is not static
>> and can't be dumped to a static file. It's whatever a previous open
>> system call has returned. Inside of the program we memorize the return
>> value of open in a variable and then pass it to close. This generally
>> stands for all system calls. Say, an image can contain an uid, and
>> that uid can be obtained from a system call too.
>
> Ok, but that's the syscall side.  You are operating on a static xfs image,
> correct?  We're only asking for the actual filesystem you're operating
> against.

Not necessary. Image can be dynamically generate too, all inputs to
kernel are generally dynamically generated.


> (When I say "image" I am talking only about the filesystem itself, not any
> other syzkaller state)

OK, let's do it this way. For the first 10 bugs, ask me, and I will do
it manually.
I am all for automation. And syzbot is already more automated than
most kernel testing systems. But, as I said, this is really
not-trivial, large amount of work, and is specific to one out of
dozens of kernel subsystems.


> Ok, backing up more: When you are testing against an xfs filesystem image, 
> where
> does that image come from?  How is it generated?  A quick look at the 
> syzkaller
> tree didn't make that clear to me.
>
> the xfs.repro file you provided at
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> is strange, it doesn't even contain AGF blocks; they aren't fuzzed or 
> corrupted,
> they are completely zeroed out.  I don't know if that's part of the fuzzing,
> or what - what steps led to that image?
>
> Or put another way, how did you arrive at the fs image values in the 
> reproducer,
> i.e.:

Currently they are completely random, nobody taught syzkaller about AGFs, etc.

> oid loop()
> {
>   memcpy((void*)0x2000, "xfs", 4);
>   memcpy((void*)0x2100, "./file0", 8);
>   *(uint64_t*)0x2200 = 0x2001;
>   memcpy((void*)0x2001,
>  
> "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>  
> "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>  
> "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>  
> "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>  
> "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>  
> "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>  
> "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>  
> "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>  204);
>
> ...
>
> The in-memory xfs filesystem it constructs is damaged, is that an intentional
> part of the fuzzing during the test?

Yes, invalid inputs is part of testing.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-02 Thread Jan Tulak
On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen  wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> Suggestions are welcome. Currently it says "syzbot hit the following
>> crash on upstream commit SHA1", which was supposed to mean just the
>> state of the source tree when the crash happened. But I am not a
>> native speaker, so perhaps I am saying not what I intend to say.
>>
>> There are also suggestions on report format improvement from +Ted
>> currently in works:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
>> Not sure if they make this distinction 100% clear, though.
>
> Maybe I was the only one who misunderstood, but something like
>

snip

You are not the only one, I (mis)understand it the same as you.

Jan


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-02 Thread Jan Tulak
On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen  wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> Suggestions are welcome. Currently it says "syzbot hit the following
>> crash on upstream commit SHA1", which was supposed to mean just the
>> state of the source tree when the crash happened. But I am not a
>> native speaker, so perhaps I am saying not what I intend to say.
>>
>> There are also suggestions on report format improvement from +Ted
>> currently in works:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
>> Not sure if they make this distinction 100% clear, though.
>
> Maybe I was the only one who misunderstood, but something like
>

snip

You are not the only one, I (mis)understand it the same as you.

Jan


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-01 Thread Dave Chinner
On Mon, Apr 30, 2018 at 03:24:48PM +0200, Dmitry Vyukov wrote:
> On Thu, Apr 5, 2018 at 8:54 PM, Dmitry Vyukov  wrote:
> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> >> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> >>> Hello,
> >>>
> >>> syzbot hit the following crash on upstream commit
> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> >>> Merge branch 'ras-core-for-linus' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >>> syzbot dashboard link:
> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >>>
> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >>> syzkaller reproducer:
> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >>
> >> What a mess. A hand built, hopelessly broken filesystem image made
> >> up of hex dumps, written into a mmap()d region of memory, then
> >> copied into a tmpfs file and mounted with the loop device.
> >>
> >> Engineers that can debug broken filesystems don't grow on trees.  If
> >> we are to have any hope of understanding what the hell this test is
> >> doing, the bot needs to supply us with a copy of the built
> >> filesystem image the test uses. We need to be able to point forensic
> >> tools at the image to decode all the structures into human readable
> >> format - if we are forced to do that by hand or jump through hoops
> >> to create our own filesystem image than I'm certainly not going to
> >> waste time looking at these reports...
> >
> > Hi Dave,
> >
> > Here is the image:
> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> 
> Have anybody looked at the bug and the image yet?

Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
kernel here.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-05-01 Thread Dave Chinner
On Mon, Apr 30, 2018 at 03:24:48PM +0200, Dmitry Vyukov wrote:
> On Thu, Apr 5, 2018 at 8:54 PM, Dmitry Vyukov  wrote:
> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> >> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> >>> Hello,
> >>>
> >>> syzbot hit the following crash on upstream commit
> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> >>> Merge branch 'ras-core-for-linus' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >>> syzbot dashboard link:
> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >>>
> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >>> syzkaller reproducer:
> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >>
> >> What a mess. A hand built, hopelessly broken filesystem image made
> >> up of hex dumps, written into a mmap()d region of memory, then
> >> copied into a tmpfs file and mounted with the loop device.
> >>
> >> Engineers that can debug broken filesystems don't grow on trees.  If
> >> we are to have any hope of understanding what the hell this test is
> >> doing, the bot needs to supply us with a copy of the built
> >> filesystem image the test uses. We need to be able to point forensic
> >> tools at the image to decode all the structures into human readable
> >> format - if we are forced to do that by hand or jump through hoops
> >> to create our own filesystem image than I'm certainly not going to
> >> waste time looking at these reports...
> >
> > Hi Dave,
> >
> > Here is the image:
> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> 
> Have anybody looked at the bug and the image yet?

Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
kernel here.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Eric Sandeen
On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:

...

>>> It just extracted kernel source file name that looked relevant
>>> to this crash and run get_maintainers.pl on it.
>>> Also the image can contain dynamically generated data, which makes it
>>> impossible to have as a file at all.
>>
>> I guess I'm not sure what this means, can you explain?
> 
> Say, a value that we generally pass to close system call is not static
> and can't be dumped to a static file. It's whatever a previous open
> system call has returned. Inside of the program we memorize the return
> value of open in a variable and then pass it to close. This generally
> stands for all system calls. Say, an image can contain an uid, and
> that uid can be obtained from a system call too.

Ok, but that's the syscall side.  You are operating on a static xfs image,
correct?  We're only asking for the actual filesystem you're operating
against.

(When I say "image" I am talking only about the filesystem itself, not any
other syzkaller state)
 
...

>> That was not at all clear to me.  I thought when syzkaller was telling us
>> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
>> I'm not sure if anyone else made that mistake, but  perhaps you could also 
>> clarify
>> the bug report text in this regard?
> 
> Suggestions are welcome. Currently it says "syzbot hit the following
> crash on upstream commit SHA1", which was supposed to mean just the
> state of the source tree when the crash happened. But I am not a
> native speaker, so perhaps I am saying not what I intend to say.
> 
> There are also suggestions on report format improvement from +Ted
> currently in works:
> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
> Not sure if they make this distinction 100% clear, though.

Maybe I was the only one who misunderstood, but something like

git tree:   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
HEAD:   f5c754d63d06 mm/swap_state.c: make bool enable_vma_readahead 
and swap_vma_readahead()
 
to make it clear that it has not identified that commit as the culprit, it's
just the head of the tree you were testing?  (I think I have the correct git
nomenclature ...)

...

>> If the base image only has one allocation group, it makes it more difficult 
>> for
>> some tools to work with the image, because there is no redundancy.  1 AG is
>> not a supported or recommended geometry for any real-life use of xfs.
>>
>> If I am correct that you start with a base image w/ a certain geometry or
>> set of mkfs options, starting with >= 2 AGs would improve the usefulness of 
>> the
>> filesystem image.
> 
> syzkaller can generate/mutate images based on structured format
> templates, but for now we don't have any templates and these are just
> opaque blobs.

Ok, backing up more: When you are testing against an xfs filesystem image, where
does that image come from?  How is it generated?  A quick look at the syzkaller
tree didn't make that clear to me.

the xfs.repro file you provided at
https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

is strange, it doesn't even contain AGF blocks; they aren't fuzzed or corrupted,
they are completely zeroed out.  I don't know if that's part of the fuzzing,
or what - what steps led to that image?

Or put another way, how did you arrive at the fs image values in the reproducer,
i.e.:

oid loop()
{
  memcpy((void*)0x2000, "xfs", 4);
  memcpy((void*)0x2100, "./file0", 8);
  *(uint64_t*)0x2200 = 0x2001;
  memcpy((void*)0x2001,
 "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
 "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
 "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
 "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
 "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
 "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
 "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
 "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
 204);

... 

The in-memory xfs filesystem it constructs is damaged, is that an intentional
part of the fuzzing during the test?

-Eric


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Eric Sandeen
On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:

...

>>> It just extracted kernel source file name that looked relevant
>>> to this crash and run get_maintainers.pl on it.
>>> Also the image can contain dynamically generated data, which makes it
>>> impossible to have as a file at all.
>>
>> I guess I'm not sure what this means, can you explain?
> 
> Say, a value that we generally pass to close system call is not static
> and can't be dumped to a static file. It's whatever a previous open
> system call has returned. Inside of the program we memorize the return
> value of open in a variable and then pass it to close. This generally
> stands for all system calls. Say, an image can contain an uid, and
> that uid can be obtained from a system call too.

Ok, but that's the syscall side.  You are operating on a static xfs image,
correct?  We're only asking for the actual filesystem you're operating
against.

(When I say "image" I am talking only about the filesystem itself, not any
other syzkaller state)
 
...

>> That was not at all clear to me.  I thought when syzkaller was telling us
>> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
>> I'm not sure if anyone else made that mistake, but  perhaps you could also 
>> clarify
>> the bug report text in this regard?
> 
> Suggestions are welcome. Currently it says "syzbot hit the following
> crash on upstream commit SHA1", which was supposed to mean just the
> state of the source tree when the crash happened. But I am not a
> native speaker, so perhaps I am saying not what I intend to say.
> 
> There are also suggestions on report format improvement from +Ted
> currently in works:
> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
> Not sure if they make this distinction 100% clear, though.

Maybe I was the only one who misunderstood, but something like

git tree:   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
HEAD:   f5c754d63d06 mm/swap_state.c: make bool enable_vma_readahead 
and swap_vma_readahead()
 
to make it clear that it has not identified that commit as the culprit, it's
just the head of the tree you were testing?  (I think I have the correct git
nomenclature ...)

...

>> If the base image only has one allocation group, it makes it more difficult 
>> for
>> some tools to work with the image, because there is no redundancy.  1 AG is
>> not a supported or recommended geometry for any real-life use of xfs.
>>
>> If I am correct that you start with a base image w/ a certain geometry or
>> set of mkfs options, starting with >= 2 AGs would improve the usefulness of 
>> the
>> filesystem image.
> 
> syzkaller can generate/mutate images based on structured format
> templates, but for now we don't have any templates and these are just
> opaque blobs.

Ok, backing up more: When you are testing against an xfs filesystem image, where
does that image come from?  How is it generated?  A quick look at the syzkaller
tree didn't make that clear to me.

the xfs.repro file you provided at
https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

is strange, it doesn't even contain AGF blocks; they aren't fuzzed or corrupted,
they are completely zeroed out.  I don't know if that's part of the fuzzing,
or what - what steps led to that image?

Or put another way, how did you arrive at the fs image values in the reproducer,
i.e.:

oid loop()
{
  memcpy((void*)0x2000, "xfs", 4);
  memcpy((void*)0x2100, "./file0", 8);
  *(uint64_t*)0x2200 = 0x2001;
  memcpy((void*)0x2001,
 "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
 "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
 "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
 "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
 "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
 "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
 "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
 "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
 "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
 204);

... 

The in-memory xfs filesystem it constructs is damaged, is that an intentional
part of the fuzzing during the test?

-Eric


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Dmitry Vyukov
On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:
> On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen  wrote:
>
> ...
>
>>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>>
>>> i.e. you are mounting an image to reproduce the problem, correct?
>>> And the system is "smart" enough to fire off an email to a filesystem list;
>>> if it does so, add a link to the image itself, as you already have already 
>>> done
>>> for the C reproducer.
>>>
>>> Filesystem images are common parlance for filesystem engineers.  When
>>> you engage with them you'll have better results if you  provide them with
>>> inputs they can use directly instead of requiring them to reverse-engineer
>>> your custom test harness.
>>
>>
>> Well, yes and no.
>> syz_mount_image() is the only part of a large system that knows about
>> images. For the rest of the system it's just a syscall like any other
>> syscall. And the part that sends emails is far away from
>> syz_mount_image().
>> syzbot does not know per se that it sends an email to filesystems
>> list.
>
> I am asking it to learn this trick as an enhancement.  The MAINTAINERS
> file contains big clues about which subsystems are filesystems, for example:
>
> $ grep FILESYSTEM$ MAINTAINERS
> AFS FILESYSTEM
> CRAMFS FILESYSTEM
> EFI VARIABLE FILESYSTEM
> EFS FILESYSTEM
> FREEVXFS FILESYSTEM
> ...
>
>> It just extracted kernel source file name that looked relevant
>> to this crash and run get_maintainers.pl on it.
>> Also the image can contain dynamically generated data, which makes it
>> impossible to have as a file at all.
>
> I guess I'm not sure what this means, can you explain?

Say, a value that we generally pass to close system call is not static
and can't be dumped to a static file. It's whatever a previous open
system call has returned. Inside of the program we memorize the return
value of open in a variable and then pass it to close. This generally
stands for all system calls. Say, an image can contain an uid, and
that uid can be obtained from a system call too.


>> Thinking of this, what should be reasonably easy to do and may be a
>> compromise for near future is the following. We could insert code into
>> syz_mount_image() which dump the image if you build a program with a
>> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?
>
> If this is possible, I guess I still don't understand why you can't dump the
> image and provide link.  You have fast, efficient robots.  We have slow,
> busy humans.
>
 Please elaborate re commits. It's a basic rule of any good bug report
 -- communicate exact state of source code when the bug was hit, i.e.
 provide the commit hash.
>>>
>>> Further best practice is to provide the /correct/ commit hash.
>
> 
>
>>> I can't imagine these are right...
>>
>>
>> As I said, bisection is on our plate:
>> https://github.com/google/syzkaller/issues/501
>> Though, we will see how well it works, because it's not trivial (see
>> the issue for details).
>
> Oh I see. I had misunderstood; so:
>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> Merge branch 'ras-core-for-linus' of 
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> doesn't mean you bisected to that commit, or that this was the first bad 
> commit,
> it just means you happened to have a tree at this commit when you hit the 
> problem.
>
> That was not at all clear to me.  I thought when syzkaller was telling us
> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
> I'm not sure if anyone else made that mistake, but  perhaps you could also 
> clarify
> the bug report text in this regard?

Suggestions are welcome. Currently it says "syzbot hit the following
crash on upstream commit SHA1", which was supposed to mean just the
state of the source tree when the crash happened. But I am not a
native speaker, so perhaps I am saying not what I intend to say.

There are also suggestions on report format improvement from +Ted
currently in works:
https://github.com/google/syzkaller/issues/565#issuecomment-380792942
Not sure if they make this distinction 100% clear, though.


>>> I think that in this case, what we are asking for is a fine tuning of the
>>> testing and reporting so that we can more efficiently address these issues.
>>> Off the top of my head, and there may be more items:
>>>
>>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>>two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>>tbh.)
>>
>> There is a human contact at syzkal...@googlegroups.com. Report footer
>> will be improved to make it more clear:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
>>
>>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look 
>>> like
>>>a recent 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Dmitry Vyukov
On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen  wrote:
> On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen  wrote:
>
> ...
>
>>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>>
>>> i.e. you are mounting an image to reproduce the problem, correct?
>>> And the system is "smart" enough to fire off an email to a filesystem list;
>>> if it does so, add a link to the image itself, as you already have already 
>>> done
>>> for the C reproducer.
>>>
>>> Filesystem images are common parlance for filesystem engineers.  When
>>> you engage with them you'll have better results if you  provide them with
>>> inputs they can use directly instead of requiring them to reverse-engineer
>>> your custom test harness.
>>
>>
>> Well, yes and no.
>> syz_mount_image() is the only part of a large system that knows about
>> images. For the rest of the system it's just a syscall like any other
>> syscall. And the part that sends emails is far away from
>> syz_mount_image().
>> syzbot does not know per se that it sends an email to filesystems
>> list.
>
> I am asking it to learn this trick as an enhancement.  The MAINTAINERS
> file contains big clues about which subsystems are filesystems, for example:
>
> $ grep FILESYSTEM$ MAINTAINERS
> AFS FILESYSTEM
> CRAMFS FILESYSTEM
> EFI VARIABLE FILESYSTEM
> EFS FILESYSTEM
> FREEVXFS FILESYSTEM
> ...
>
>> It just extracted kernel source file name that looked relevant
>> to this crash and run get_maintainers.pl on it.
>> Also the image can contain dynamically generated data, which makes it
>> impossible to have as a file at all.
>
> I guess I'm not sure what this means, can you explain?

Say, a value that we generally pass to close system call is not static
and can't be dumped to a static file. It's whatever a previous open
system call has returned. Inside of the program we memorize the return
value of open in a variable and then pass it to close. This generally
stands for all system calls. Say, an image can contain an uid, and
that uid can be obtained from a system call too.


>> Thinking of this, what should be reasonably easy to do and may be a
>> compromise for near future is the following. We could insert code into
>> syz_mount_image() which dump the image if you build a program with a
>> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?
>
> If this is possible, I guess I still don't understand why you can't dump the
> image and provide link.  You have fast, efficient robots.  We have slow,
> busy humans.
>
 Please elaborate re commits. It's a basic rule of any good bug report
 -- communicate exact state of source code when the bug was hit, i.e.
 provide the commit hash.
>>>
>>> Further best practice is to provide the /correct/ commit hash.
>
> 
>
>>> I can't imagine these are right...
>>
>>
>> As I said, bisection is on our plate:
>> https://github.com/google/syzkaller/issues/501
>> Though, we will see how well it works, because it's not trivial (see
>> the issue for details).
>
> Oh I see. I had misunderstood; so:
>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> Merge branch 'ras-core-for-linus' of 
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> doesn't mean you bisected to that commit, or that this was the first bad 
> commit,
> it just means you happened to have a tree at this commit when you hit the 
> problem.
>
> That was not at all clear to me.  I thought when syzkaller was telling us
> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
> I'm not sure if anyone else made that mistake, but  perhaps you could also 
> clarify
> the bug report text in this regard?

Suggestions are welcome. Currently it says "syzbot hit the following
crash on upstream commit SHA1", which was supposed to mean just the
state of the source tree when the crash happened. But I am not a
native speaker, so perhaps I am saying not what I intend to say.

There are also suggestions on report format improvement from +Ted
currently in works:
https://github.com/google/syzkaller/issues/565#issuecomment-380792942
Not sure if they make this distinction 100% clear, though.


>>> I think that in this case, what we are asking for is a fine tuning of the
>>> testing and reporting so that we can more efficiently address these issues.
>>> Off the top of my head, and there may be more items:
>>>
>>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>>two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>>tbh.)
>>
>> There is a human contact at syzkal...@googlegroups.com. Report footer
>> will be improved to make it more clear:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
>>
>>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look 
>>> like
>>>a recent regression, you can state that too.
>>
>> 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Eric Sandeen
On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen  wrote:

...

>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>
>> i.e. you are mounting an image to reproduce the problem, correct?
>> And the system is "smart" enough to fire off an email to a filesystem list;
>> if it does so, add a link to the image itself, as you already have already 
>> done
>> for the C reproducer.
>>
>> Filesystem images are common parlance for filesystem engineers.  When
>> you engage with them you'll have better results if you  provide them with
>> inputs they can use directly instead of requiring them to reverse-engineer
>> your custom test harness.
> 
> 
> Well, yes and no.
> syz_mount_image() is the only part of a large system that knows about
> images. For the rest of the system it's just a syscall like any other
> syscall. And the part that sends emails is far away from
> syz_mount_image().
> syzbot does not know per se that it sends an email to filesystems
> list.

I am asking it to learn this trick as an enhancement.  The MAINTAINERS
file contains big clues about which subsystems are filesystems, for example:

$ grep FILESYSTEM$ MAINTAINERS 
AFS FILESYSTEM
CRAMFS FILESYSTEM
EFI VARIABLE FILESYSTEM
EFS FILESYSTEM
FREEVXFS FILESYSTEM
...

> It just extracted kernel source file name that looked relevant
> to this crash and run get_maintainers.pl on it.
> Also the image can contain dynamically generated data, which makes it
> impossible to have as a file at all.

I guess I'm not sure what this means, can you explain?

> Thinking of this, what should be reasonably easy to do and may be a
> compromise for near future is the following. We could insert code into
> syz_mount_image() which dump the image if you build a program with a
> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?

If this is possible, I guess I still don't understand why you can't dump the
image and provide link.  You have fast, efficient robots.  We have slow,
busy humans.

>>> Please elaborate re commits. It's a basic rule of any good bug report
>>> -- communicate exact state of source code when the bug was hit, i.e.
>>> provide the commit hash.
>>
>> Further best practice is to provide the /correct/ commit hash.



>> I can't imagine these are right...
> 
> 
> As I said, bisection is on our plate:
> https://github.com/google/syzkaller/issues/501
> Though, we will see how well it works, because it's not trivial (see
> the issue for details).

Oh I see. I had misunderstood; so:

> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> Merge branch 'ras-core-for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

doesn't mean you bisected to that commit, or that this was the first bad commit,
it just means you happened to have a tree at this commit when you hit the 
problem.

That was not at all clear to me.  I thought when syzkaller was telling us
"on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
I'm not sure if anyone else made that mistake, but  perhaps you could also 
clarify
the bug report text in this regard?

...

>> I think that in this case, what we are asking for is a fine tuning of the
>> testing and reporting so that we can more efficiently address these issues.
>> Off the top of my head, and there may be more items:
>>
>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>tbh.)
> 
> There is a human contact at syzkal...@googlegroups.com. Report footer
> will be improved to make it more clear:
> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
> 
>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look 
>> like
>>a recent regression, you can state that too.
> 
> Bisection is on our plate.
> 
>> 3) If you're reporting a filesystem bug that arose from using a filesystem
>>image, provide a URL to that filesystem image directly in the report.
> 
> See above. It may not be necessary representable as a static file at all.

Can you explain this?  Do you mean that the mounted image is changing while the
tool runs, while the filesystem is mounted?
 
>> 4) Create a filesystem image that can be more easily debugged by the experts,
>>i.e. one with > 1 allocation group, so standard repair & analysis tools 
>> can
>>be used with it.
> 
> What is "> 1 allocation group"?

Maybe I should back up; how are the xfs images created?  I had assumed that
surely you start with a base image of some sort, and start fuzzing it from 
there.
Is this correct?

If so, allocation groups are a fundamental geometry of the filesystem; from
man mkfs.xfs.8:

   agcount=value
  This is used to specify  the  number  of  allocation
  groups.  

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Eric Sandeen
On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen  wrote:

...

>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>
>> i.e. you are mounting an image to reproduce the problem, correct?
>> And the system is "smart" enough to fire off an email to a filesystem list;
>> if it does so, add a link to the image itself, as you already have already 
>> done
>> for the C reproducer.
>>
>> Filesystem images are common parlance for filesystem engineers.  When
>> you engage with them you'll have better results if you  provide them with
>> inputs they can use directly instead of requiring them to reverse-engineer
>> your custom test harness.
> 
> 
> Well, yes and no.
> syz_mount_image() is the only part of a large system that knows about
> images. For the rest of the system it's just a syscall like any other
> syscall. And the part that sends emails is far away from
> syz_mount_image().
> syzbot does not know per se that it sends an email to filesystems
> list.

I am asking it to learn this trick as an enhancement.  The MAINTAINERS
file contains big clues about which subsystems are filesystems, for example:

$ grep FILESYSTEM$ MAINTAINERS 
AFS FILESYSTEM
CRAMFS FILESYSTEM
EFI VARIABLE FILESYSTEM
EFS FILESYSTEM
FREEVXFS FILESYSTEM
...

> It just extracted kernel source file name that looked relevant
> to this crash and run get_maintainers.pl on it.
> Also the image can contain dynamically generated data, which makes it
> impossible to have as a file at all.

I guess I'm not sure what this means, can you explain?

> Thinking of this, what should be reasonably easy to do and may be a
> compromise for near future is the following. We could insert code into
> syz_mount_image() which dump the image if you build a program with a
> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?

If this is possible, I guess I still don't understand why you can't dump the
image and provide link.  You have fast, efficient robots.  We have slow,
busy humans.

>>> Please elaborate re commits. It's a basic rule of any good bug report
>>> -- communicate exact state of source code when the bug was hit, i.e.
>>> provide the commit hash.
>>
>> Further best practice is to provide the /correct/ commit hash.



>> I can't imagine these are right...
> 
> 
> As I said, bisection is on our plate:
> https://github.com/google/syzkaller/issues/501
> Though, we will see how well it works, because it's not trivial (see
> the issue for details).

Oh I see. I had misunderstood; so:

> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> Merge branch 'ras-core-for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

doesn't mean you bisected to that commit, or that this was the first bad commit,
it just means you happened to have a tree at this commit when you hit the 
problem.

That was not at all clear to me.  I thought when syzkaller was telling us
"on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
I'm not sure if anyone else made that mistake, but  perhaps you could also 
clarify
the bug report text in this regard?

...

>> I think that in this case, what we are asking for is a fine tuning of the
>> testing and reporting so that we can more efficiently address these issues.
>> Off the top of my head, and there may be more items:
>>
>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>tbh.)
> 
> There is a human contact at syzkal...@googlegroups.com. Report footer
> will be improved to make it more clear:
> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
> 
>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look 
>> like
>>a recent regression, you can state that too.
> 
> Bisection is on our plate.
> 
>> 3) If you're reporting a filesystem bug that arose from using a filesystem
>>image, provide a URL to that filesystem image directly in the report.
> 
> See above. It may not be necessary representable as a static file at all.

Can you explain this?  Do you mean that the mounted image is changing while the
tool runs, while the filesystem is mounted?
 
>> 4) Create a filesystem image that can be more easily debugged by the experts,
>>i.e. one with > 1 allocation group, so standard repair & analysis tools 
>> can
>>be used with it.
> 
> What is "> 1 allocation group"?

Maybe I should back up; how are the xfs images created?  I had assumed that
surely you start with a base image of some sort, and start fuzzing it from 
there.
Is this correct?

If so, allocation groups are a fundamental geometry of the filesystem; from
man mkfs.xfs.8:

   agcount=value
  This is used to specify  the  number  of  allocation
  groups.  The  data  section  

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Dmitry Vyukov
On Thu, Apr 5, 2018 at 8:54 PM, Dmitry Vyukov  wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>> Hello,
>>>
>>> syzbot hit the following crash on upstream commit
>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>>> Merge branch 'ras-core-for-linus' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> syzbot dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

Have anybody looked at the bug and the image yet?


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Dmitry Vyukov
On Thu, Apr 5, 2018 at 8:54 PM, Dmitry Vyukov  wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>> Hello,
>>>
>>> syzbot hit the following crash on upstream commit
>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>>> Merge branch 'ras-core-for-linus' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> syzbot dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

Have anybody looked at the bug and the image yet?


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Dmitry Vyukov
On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen  wrote:
> On 4/13/18 5:03 AM, Dmitry Vyukov wrote:
>> On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong  
>> wrote:
>>> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
 On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>> Hello,
>>>
>>> syzbot hit the following crash on upstream commit
>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>>> Merge branch 'ras-core-for-linus' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> syzbot dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>
>>> C reproducer: 
>>> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> (took me about a minute to extract from test by replacing memfd_create
> with open and running the program).

 Says the expert who knows exactly how it's all put together. It took
 me a couple of hours just to understand WTF the syzbot reproducer
 was actually doing
>
> *nod* more on this below.
>
> Then do the following to trigger the bug:
> losetup /dev/loop0 xfs.repro
> mkdir xfs
> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>
> To answer your more general question: syzbot is not a system to test
> solely file systems, it finds bugs in hundreds of kernel subsystems.
> Generating image for file systems, media files for sound and
> FaceDancer programs that crash host when  FaceDancer device is plugged
> into USB is not feasible. And in the end it's not even clear what

 I don't care how syzbot generates the filesystem image it uses.

> kernel subsystem is at fault and even if it somehow figures out that
> it's a filesystem, it's unclear that it's exactly an image that
> provokes the bug. syzbot provides C reproducers which is a reasonable

 It doesn't matter *what subsystem breaks*. If syzbot is generating a
 filesystem image and then mounting it, it needs to provide that
 filesystem image to to people who end up having to debug the
 problem. It's a basic "corrupt filesystem" bug triage step.
>
> *nod*
>
> Some bugs are so involved that only an
> expert in a particular subsystem can figure out what happens there.

 And that's clearly the case here, whether you like it or not.

 You want us to do things that make syzbot more useful as a tool but
 you don't want to do things that make syzbot a useful tool for us.
 It's a two way street
>>
>> Hi Dave, Darrick,
>>
>> It's not that we don't want to make the system more useful, it's just
>> that we don't see what reasonably can be done here. The system does
>> not have notion of images, sound files, USB devices. It feeds data
>> into system calls, and that's what it provides as reproducers. Also
>> see the last paragraph.
>
> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>
> i.e. you are mounting an image to reproduce the problem, correct?
> And the system is "smart" enough to fire off an email to a filesystem list;
> if it does so, add a link to the image itself, as you already have already 
> done
> for the C reproducer.
>
> Filesystem images are common parlance for filesystem engineers.  When
> you engage with them you'll have better results if you  provide them with
> inputs they can use directly instead of requiring them to reverse-engineer
> your custom test harness.


Well, yes and no.
syz_mount_image() is the only part of a large system that knows about
images. For the rest of the system it's just a syscall like any other
syscall. And the part that sends emails is far away from
syz_mount_image().
syzbot does not know per se that it 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-30 Thread Dmitry Vyukov
On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen  wrote:
> On 4/13/18 5:03 AM, Dmitry Vyukov wrote:
>> On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong  
>> wrote:
>>> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
 On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>> Hello,
>>>
>>> syzbot hit the following crash on upstream commit
>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>>> Merge branch 'ras-core-for-linus' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> syzbot dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>
>>> C reproducer: 
>>> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> (took me about a minute to extract from test by replacing memfd_create
> with open and running the program).

 Says the expert who knows exactly how it's all put together. It took
 me a couple of hours just to understand WTF the syzbot reproducer
 was actually doing
>
> *nod* more on this below.
>
> Then do the following to trigger the bug:
> losetup /dev/loop0 xfs.repro
> mkdir xfs
> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>
> To answer your more general question: syzbot is not a system to test
> solely file systems, it finds bugs in hundreds of kernel subsystems.
> Generating image for file systems, media files for sound and
> FaceDancer programs that crash host when  FaceDancer device is plugged
> into USB is not feasible. And in the end it's not even clear what

 I don't care how syzbot generates the filesystem image it uses.

> kernel subsystem is at fault and even if it somehow figures out that
> it's a filesystem, it's unclear that it's exactly an image that
> provokes the bug. syzbot provides C reproducers which is a reasonable

 It doesn't matter *what subsystem breaks*. If syzbot is generating a
 filesystem image and then mounting it, it needs to provide that
 filesystem image to to people who end up having to debug the
 problem. It's a basic "corrupt filesystem" bug triage step.
>
> *nod*
>
> Some bugs are so involved that only an
> expert in a particular subsystem can figure out what happens there.

 And that's clearly the case here, whether you like it or not.

 You want us to do things that make syzbot more useful as a tool but
 you don't want to do things that make syzbot a useful tool for us.
 It's a two way street
>>
>> Hi Dave, Darrick,
>>
>> It's not that we don't want to make the system more useful, it's just
>> that we don't see what reasonably can be done here. The system does
>> not have notion of images, sound files, USB devices. It feeds data
>> into system calls, and that's what it provides as reproducers. Also
>> see the last paragraph.
>
> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>
> i.e. you are mounting an image to reproduce the problem, correct?
> And the system is "smart" enough to fire off an email to a filesystem list;
> if it does so, add a link to the image itself, as you already have already 
> done
> for the C reproducer.
>
> Filesystem images are common parlance for filesystem engineers.  When
> you engage with them you'll have better results if you  provide them with
> inputs they can use directly instead of requiring them to reverse-engineer
> your custom test harness.


Well, yes and no.
syz_mount_image() is the only part of a large system that knows about
images. For the rest of the system it's just a syscall like any other
syscall. And the part that sends emails is far away from
syz_mount_image().
syzbot does not know per se that it sends an email to filesystems
list. It just extracted kernel 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-16 Thread Eric Sandeen
On 4/13/18 5:03 AM, Dmitry Vyukov wrote:
> On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong  
> wrote:
>> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
 On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> Merge branch 'ras-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> syzbot dashboard link:
>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> syzkaller reproducer:
>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>
> What a mess. A hand built, hopelessly broken filesystem image made
> up of hex dumps, written into a mmap()d region of memory, then
> copied into a tmpfs file and mounted with the loop device.
>
> Engineers that can debug broken filesystems don't grow on trees.  If
> we are to have any hope of understanding what the hell this test is
> doing, the bot needs to supply us with a copy of the built
> filesystem image the test uses. We need to be able to point forensic
> tools at the image to decode all the structures into human readable
> format - if we are forced to do that by hand or jump through hoops
> to create our own filesystem image than I'm certainly not going to
> waste time looking at these reports...

 Hi Dave,

 Here is the image:
 https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
 (took me about a minute to extract from test by replacing memfd_create
 with open and running the program).
>>>
>>> Says the expert who knows exactly how it's all put together. It took
>>> me a couple of hours just to understand WTF the syzbot reproducer
>>> was actually doing

*nod* more on this below.

 Then do the following to trigger the bug:
 losetup /dev/loop0 xfs.repro
 mkdir xfs
 mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs

 To answer your more general question: syzbot is not a system to test
 solely file systems, it finds bugs in hundreds of kernel subsystems.
 Generating image for file systems, media files for sound and
 FaceDancer programs that crash host when  FaceDancer device is plugged
 into USB is not feasible. And in the end it's not even clear what
>>>
>>> I don't care how syzbot generates the filesystem image it uses.
>>>
 kernel subsystem is at fault and even if it somehow figures out that
 it's a filesystem, it's unclear that it's exactly an image that
 provokes the bug. syzbot provides C reproducers which is a reasonable
>>>
>>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>>> filesystem image and then mounting it, it needs to provide that
>>> filesystem image to to people who end up having to debug the
>>> problem. It's a basic "corrupt filesystem" bug triage step.

*nod*

 Some bugs are so involved that only an
 expert in a particular subsystem can figure out what happens there.
>>>
>>> And that's clearly the case here, whether you like it or not.
>>>
>>> You want us to do things that make syzbot more useful as a tool but
>>> you don't want to do things that make syzbot a useful tool for us.
>>> It's a two way street
> 
> Hi Dave, Darrick,
> 
> It's not that we don't want to make the system more useful, it's just
> that we don't see what reasonably can be done here. The system does
> not have notion of images, sound files, USB devices. It feeds data
> into system calls, and that's what it provides as reproducers. Also
> see the last paragraph.

It sure /seems/ to have a notion of images: what else is syz_mount_image()?

i.e. you are mounting an image to reproduce the problem, correct?
And the system is "smart" enough to fire off an email to a filesystem list;
if it does so, add a link to the image itself, as you already have already done
for the C reproducer.

Filesystem images are common parlance for filesystem engineers.  When
you engage with them you'll have better results if you  provide them with
inputs they can use directly instead of requiring them to reverse-engineer
your custom test harness.

>> ...here's my take on this whole situation:
>>
>> So far as I can tell, this syzbot daemon grew the ability to fuzz
>> filesystems and started emitting bug report after bug report, with
>> misleading commit ids that have nothing to do with the complaint.
> 
> Please elaborate re commits. It's a basic rule of any good bug report
> -- communicate exact state of source code when the bug was hit, i.e.
> provide the commit hash.

Further best practice is to 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-16 Thread Eric Sandeen
On 4/13/18 5:03 AM, Dmitry Vyukov wrote:
> On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong  
> wrote:
>> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
 On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> Merge branch 'ras-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> syzbot dashboard link:
>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> syzkaller reproducer:
>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>
> What a mess. A hand built, hopelessly broken filesystem image made
> up of hex dumps, written into a mmap()d region of memory, then
> copied into a tmpfs file and mounted with the loop device.
>
> Engineers that can debug broken filesystems don't grow on trees.  If
> we are to have any hope of understanding what the hell this test is
> doing, the bot needs to supply us with a copy of the built
> filesystem image the test uses. We need to be able to point forensic
> tools at the image to decode all the structures into human readable
> format - if we are forced to do that by hand or jump through hoops
> to create our own filesystem image than I'm certainly not going to
> waste time looking at these reports...

 Hi Dave,

 Here is the image:
 https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
 (took me about a minute to extract from test by replacing memfd_create
 with open and running the program).
>>>
>>> Says the expert who knows exactly how it's all put together. It took
>>> me a couple of hours just to understand WTF the syzbot reproducer
>>> was actually doing

*nod* more on this below.

 Then do the following to trigger the bug:
 losetup /dev/loop0 xfs.repro
 mkdir xfs
 mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs

 To answer your more general question: syzbot is not a system to test
 solely file systems, it finds bugs in hundreds of kernel subsystems.
 Generating image for file systems, media files for sound and
 FaceDancer programs that crash host when  FaceDancer device is plugged
 into USB is not feasible. And in the end it's not even clear what
>>>
>>> I don't care how syzbot generates the filesystem image it uses.
>>>
 kernel subsystem is at fault and even if it somehow figures out that
 it's a filesystem, it's unclear that it's exactly an image that
 provokes the bug. syzbot provides C reproducers which is a reasonable
>>>
>>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>>> filesystem image and then mounting it, it needs to provide that
>>> filesystem image to to people who end up having to debug the
>>> problem. It's a basic "corrupt filesystem" bug triage step.

*nod*

 Some bugs are so involved that only an
 expert in a particular subsystem can figure out what happens there.
>>>
>>> And that's clearly the case here, whether you like it or not.
>>>
>>> You want us to do things that make syzbot more useful as a tool but
>>> you don't want to do things that make syzbot a useful tool for us.
>>> It's a two way street
> 
> Hi Dave, Darrick,
> 
> It's not that we don't want to make the system more useful, it's just
> that we don't see what reasonably can be done here. The system does
> not have notion of images, sound files, USB devices. It feeds data
> into system calls, and that's what it provides as reproducers. Also
> see the last paragraph.

It sure /seems/ to have a notion of images: what else is syz_mount_image()?

i.e. you are mounting an image to reproduce the problem, correct?
And the system is "smart" enough to fire off an email to a filesystem list;
if it does so, add a link to the image itself, as you already have already done
for the C reproducer.

Filesystem images are common parlance for filesystem engineers.  When
you engage with them you'll have better results if you  provide them with
inputs they can use directly instead of requiring them to reverse-engineer
your custom test harness.

>> ...here's my take on this whole situation:
>>
>> So far as I can tell, this syzbot daemon grew the ability to fuzz
>> filesystems and started emitting bug report after bug report, with
>> misleading commit ids that have nothing to do with the complaint.
> 
> Please elaborate re commits. It's a basic rule of any good bug report
> -- communicate exact state of source code when the bug was hit, i.e.
> provide the commit hash.

Further best practice is to provide the /correct/ commit hash.

syzbot has 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-13 Thread Dmitry Vyukov
On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong  wrote:
> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
>> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
>> > > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> > >> Hello,
>> > >>
>> > >> syzbot hit the following crash on upstream commit
>> > >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> > >> Merge branch 'ras-core-for-linus' of
>> > >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> > >> syzbot dashboard link:
>> > >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>> > >>
>> > >> C reproducer: 
>> > >> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> > >> syzkaller reproducer:
>> > >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>> > >
>> > > What a mess. A hand built, hopelessly broken filesystem image made
>> > > up of hex dumps, written into a mmap()d region of memory, then
>> > > copied into a tmpfs file and mounted with the loop device.
>> > >
>> > > Engineers that can debug broken filesystems don't grow on trees.  If
>> > > we are to have any hope of understanding what the hell this test is
>> > > doing, the bot needs to supply us with a copy of the built
>> > > filesystem image the test uses. We need to be able to point forensic
>> > > tools at the image to decode all the structures into human readable
>> > > format - if we are forced to do that by hand or jump through hoops
>> > > to create our own filesystem image than I'm certainly not going to
>> > > waste time looking at these reports...
>> >
>> > Hi Dave,
>> >
>> > Here is the image:
>> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>> > (took me about a minute to extract from test by replacing memfd_create
>> > with open and running the program).
>>
>> Says the expert who knows exactly how it's all put together. It took
>> me a couple of hours just to understand WTF the syzbot reproducer
>> was actually doing
>>
>> > Then do the following to trigger the bug:
>> > losetup /dev/loop0 xfs.repro
>> > mkdir xfs
>> > mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>> >
>> > To answer your more general question: syzbot is not a system to test
>> > solely file systems, it finds bugs in hundreds of kernel subsystems.
>> > Generating image for file systems, media files for sound and
>> > FaceDancer programs that crash host when  FaceDancer device is plugged
>> > into USB is not feasible. And in the end it's not even clear what
>>
>> I don't care how syzbot generates the filesystem image it uses.
>>
>> > kernel subsystem is at fault and even if it somehow figures out that
>> > it's a filesystem, it's unclear that it's exactly an image that
>> > provokes the bug. syzbot provides C reproducers which is a reasonable
>>
>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>> filesystem image and then mounting it, it needs to provide that
>> filesystem image to to people who end up having to debug the
>> problem. It's a basic "corrupt filesystem" bug triage step.
>>
>> > Some bugs are so involved that only an
>> > expert in a particular subsystem can figure out what happens there.
>>
>> And that's clearly the case here, whether you like it or not.
>>
>> You want us to do things that make syzbot more useful as a tool but
>> you don't want to do things that make syzbot a useful tool for us.
>> It's a two way street

Hi Dave, Darrick,

It's not that we don't want to make the system more useful, it's just
that we don't see what reasonably can be done here. The system does
not have notion of images, sound files, USB devices. It feeds data
into system calls, and that's what it provides as reproducers. Also
see the last paragraph.

> ...here's my take on this whole situation:
>
> So far as I can tell, this syzbot daemon grew the ability to fuzz
> filesystems and started emitting bug report after bug report, with
> misleading commit ids that have nothing to do with the complaint.

Please elaborate re commits. It's a basic rule of any good bug report
-- communicate exact state of source code when the bug was hit, i.e.
provide the commit hash.

>  Your
> maintainers (Dave, Eric, and me) have spent a few hours trying to figure
> out what's going on, to some frustration.  The bug reports themselves
> miss the public engagement detail of saying something like "Hey XFS
> engineers, if you'd like to talk to the syzbot engineers they can be
> reached at ."  Instead it merely says "direct
> questions to " like this is some press release and the only thing
> on the other end of the line will be some disinterested bureaucracy.
> Or some robot.

This is quite subjective and we hear opinions all possible ways. I
don't think there is one size fits all. E.g. +Ted argued in exactly
the opposite direction: make reports more laconic, 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-13 Thread Dmitry Vyukov
On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong  wrote:
> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
>> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
>> > > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> > >> Hello,
>> > >>
>> > >> syzbot hit the following crash on upstream commit
>> > >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> > >> Merge branch 'ras-core-for-linus' of
>> > >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> > >> syzbot dashboard link:
>> > >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>> > >>
>> > >> C reproducer: 
>> > >> https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> > >> syzkaller reproducer:
>> > >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>> > >
>> > > What a mess. A hand built, hopelessly broken filesystem image made
>> > > up of hex dumps, written into a mmap()d region of memory, then
>> > > copied into a tmpfs file and mounted with the loop device.
>> > >
>> > > Engineers that can debug broken filesystems don't grow on trees.  If
>> > > we are to have any hope of understanding what the hell this test is
>> > > doing, the bot needs to supply us with a copy of the built
>> > > filesystem image the test uses. We need to be able to point forensic
>> > > tools at the image to decode all the structures into human readable
>> > > format - if we are forced to do that by hand or jump through hoops
>> > > to create our own filesystem image than I'm certainly not going to
>> > > waste time looking at these reports...
>> >
>> > Hi Dave,
>> >
>> > Here is the image:
>> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>> > (took me about a minute to extract from test by replacing memfd_create
>> > with open and running the program).
>>
>> Says the expert who knows exactly how it's all put together. It took
>> me a couple of hours just to understand WTF the syzbot reproducer
>> was actually doing
>>
>> > Then do the following to trigger the bug:
>> > losetup /dev/loop0 xfs.repro
>> > mkdir xfs
>> > mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>> >
>> > To answer your more general question: syzbot is not a system to test
>> > solely file systems, it finds bugs in hundreds of kernel subsystems.
>> > Generating image for file systems, media files for sound and
>> > FaceDancer programs that crash host when  FaceDancer device is plugged
>> > into USB is not feasible. And in the end it's not even clear what
>>
>> I don't care how syzbot generates the filesystem image it uses.
>>
>> > kernel subsystem is at fault and even if it somehow figures out that
>> > it's a filesystem, it's unclear that it's exactly an image that
>> > provokes the bug. syzbot provides C reproducers which is a reasonable
>>
>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>> filesystem image and then mounting it, it needs to provide that
>> filesystem image to to people who end up having to debug the
>> problem. It's a basic "corrupt filesystem" bug triage step.
>>
>> > Some bugs are so involved that only an
>> > expert in a particular subsystem can figure out what happens there.
>>
>> And that's clearly the case here, whether you like it or not.
>>
>> You want us to do things that make syzbot more useful as a tool but
>> you don't want to do things that make syzbot a useful tool for us.
>> It's a two way street

Hi Dave, Darrick,

It's not that we don't want to make the system more useful, it's just
that we don't see what reasonably can be done here. The system does
not have notion of images, sound files, USB devices. It feeds data
into system calls, and that's what it provides as reproducers. Also
see the last paragraph.

> ...here's my take on this whole situation:
>
> So far as I can tell, this syzbot daemon grew the ability to fuzz
> filesystems and started emitting bug report after bug report, with
> misleading commit ids that have nothing to do with the complaint.

Please elaborate re commits. It's a basic rule of any good bug report
-- communicate exact state of source code when the bug was hit, i.e.
provide the commit hash.

>  Your
> maintainers (Dave, Eric, and me) have spent a few hours trying to figure
> out what's going on, to some frustration.  The bug reports themselves
> miss the public engagement detail of saying something like "Hey XFS
> engineers, if you'd like to talk to the syzbot engineers they can be
> reached at ."  Instead it merely says "direct
> questions to " like this is some press release and the only thing
> on the other end of the line will be some disinterested bureaucracy.
> Or some robot.

This is quite subjective and we hear opinions all possible ways. I
don't think there is one size fits all. E.g. +Ted argued in exactly
the opposite direction: make reports more laconic, formal,
table-formatted with dry information. There 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-06 Thread Darrick J. Wong
On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> > > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> > >> Hello,
> > >>
> > >> syzbot hit the following crash on upstream commit
> > >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> > >> Merge branch 'ras-core-for-linus' of
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > >> syzbot dashboard link:
> > >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> > >>
> > >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> > >> syzkaller reproducer:
> > >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> > >
> > > What a mess. A hand built, hopelessly broken filesystem image made
> > > up of hex dumps, written into a mmap()d region of memory, then
> > > copied into a tmpfs file and mounted with the loop device.
> > >
> > > Engineers that can debug broken filesystems don't grow on trees.  If
> > > we are to have any hope of understanding what the hell this test is
> > > doing, the bot needs to supply us with a copy of the built
> > > filesystem image the test uses. We need to be able to point forensic
> > > tools at the image to decode all the structures into human readable
> > > format - if we are forced to do that by hand or jump through hoops
> > > to create our own filesystem image than I'm certainly not going to
> > > waste time looking at these reports...
> > 
> > Hi Dave,
> > 
> > Here is the image:
> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> > (took me about a minute to extract from test by replacing memfd_create
> > with open and running the program).
> 
> Says the expert who knows exactly how it's all put together. It took
> me a couple of hours just to understand WTF the syzbot reproducer
> was actually doing
> 
> > Then do the following to trigger the bug:
> > losetup /dev/loop0 xfs.repro
> > mkdir xfs
> > mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
> > 
> > To answer your more general question: syzbot is not a system to test
> > solely file systems, it finds bugs in hundreds of kernel subsystems.
> > Generating image for file systems, media files for sound and
> > FaceDancer programs that crash host when  FaceDancer device is plugged
> > into USB is not feasible. And in the end it's not even clear what
> 
> I don't care how syzbot generates the filesystem image it uses.
> 
> > kernel subsystem is at fault and even if it somehow figures out that
> > it's a filesystem, it's unclear that it's exactly an image that
> > provokes the bug. syzbot provides C reproducers which is a reasonable
> 
> It doesn't matter *what subsystem breaks*. If syzbot is generating a
> filesystem image and then mounting it, it needs to provide that
> filesystem image to to people who end up having to debug the
> problem. It's a basic "corrupt filesystem" bug triage step.
> 
> > Some bugs are so involved that only an
> > expert in a particular subsystem can figure out what happens there.
> 
> And that's clearly the case here, whether you like it or not.
> 
> You want us to do things that make syzbot more useful as a tool but
> you don't want to do things that make syzbot a useful tool for us.
> It's a two way street

...here's my take on this whole situation:

So far as I can tell, this syzbot daemon grew the ability to fuzz
filesystems and started emitting bug report after bug report, with
misleading commit ids that have nothing to do with the complaint.  Your
maintainers (Dave, Eric, and me) have spent a few hours trying to figure
out what's going on, to some frustration.  The bug reports themselves
miss the public engagement detail of saying something like "Hey XFS
engineers, if you'd like to talk to the syzbot engineers they can be
reached at ."  Instead it merely says "direct
questions to " like this is some press release and the only thing
on the other end of the line will be some disinterested bureaucracy.
Or some robot.

We're willing to take random user reports of corruption and misbehavior
and do all the work to turn that into regression tests and patches, but
that's not the situation here.  You work for a well known engineering
company which (I assume) has decided that fuzz hardening the commons
aligns with its goals.  Collective-you (i.e. your company) could realize
that goal sooner and with a lot less community friction by staffing up
engineers to join our community to help us triage and fix the things
reported by syzbot.  It's much /less/ effective to dump a pile of work
on the people in the community.  We each have our own work-piles and
most likely different priorities.

Hardening XFS to the sorts of things fuzzers find is important to me
(and $employer) as well, but the difference here is that I read every
report that gets generated and start the 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-06 Thread Darrick J. Wong
On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> > > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> > >> Hello,
> > >>
> > >> syzbot hit the following crash on upstream commit
> > >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> > >> Merge branch 'ras-core-for-linus' of
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > >> syzbot dashboard link:
> > >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> > >>
> > >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> > >> syzkaller reproducer:
> > >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> > >
> > > What a mess. A hand built, hopelessly broken filesystem image made
> > > up of hex dumps, written into a mmap()d region of memory, then
> > > copied into a tmpfs file and mounted with the loop device.
> > >
> > > Engineers that can debug broken filesystems don't grow on trees.  If
> > > we are to have any hope of understanding what the hell this test is
> > > doing, the bot needs to supply us with a copy of the built
> > > filesystem image the test uses. We need to be able to point forensic
> > > tools at the image to decode all the structures into human readable
> > > format - if we are forced to do that by hand or jump through hoops
> > > to create our own filesystem image than I'm certainly not going to
> > > waste time looking at these reports...
> > 
> > Hi Dave,
> > 
> > Here is the image:
> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> > (took me about a minute to extract from test by replacing memfd_create
> > with open and running the program).
> 
> Says the expert who knows exactly how it's all put together. It took
> me a couple of hours just to understand WTF the syzbot reproducer
> was actually doing
> 
> > Then do the following to trigger the bug:
> > losetup /dev/loop0 xfs.repro
> > mkdir xfs
> > mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
> > 
> > To answer your more general question: syzbot is not a system to test
> > solely file systems, it finds bugs in hundreds of kernel subsystems.
> > Generating image for file systems, media files for sound and
> > FaceDancer programs that crash host when  FaceDancer device is plugged
> > into USB is not feasible. And in the end it's not even clear what
> 
> I don't care how syzbot generates the filesystem image it uses.
> 
> > kernel subsystem is at fault and even if it somehow figures out that
> > it's a filesystem, it's unclear that it's exactly an image that
> > provokes the bug. syzbot provides C reproducers which is a reasonable
> 
> It doesn't matter *what subsystem breaks*. If syzbot is generating a
> filesystem image and then mounting it, it needs to provide that
> filesystem image to to people who end up having to debug the
> problem. It's a basic "corrupt filesystem" bug triage step.
> 
> > Some bugs are so involved that only an
> > expert in a particular subsystem can figure out what happens there.
> 
> And that's clearly the case here, whether you like it or not.
> 
> You want us to do things that make syzbot more useful as a tool but
> you don't want to do things that make syzbot a useful tool for us.
> It's a two way street

...here's my take on this whole situation:

So far as I can tell, this syzbot daemon grew the ability to fuzz
filesystems and started emitting bug report after bug report, with
misleading commit ids that have nothing to do with the complaint.  Your
maintainers (Dave, Eric, and me) have spent a few hours trying to figure
out what's going on, to some frustration.  The bug reports themselves
miss the public engagement detail of saying something like "Hey XFS
engineers, if you'd like to talk to the syzbot engineers they can be
reached at ."  Instead it merely says "direct
questions to " like this is some press release and the only thing
on the other end of the line will be some disinterested bureaucracy.
Or some robot.

We're willing to take random user reports of corruption and misbehavior
and do all the work to turn that into regression tests and patches, but
that's not the situation here.  You work for a well known engineering
company which (I assume) has decided that fuzz hardening the commons
aligns with its goals.  Collective-you (i.e. your company) could realize
that goal sooner and with a lot less community friction by staffing up
engineers to join our community to help us triage and fix the things
reported by syzbot.  It's much /less/ effective to dump a pile of work
on the people in the community.  We each have our own work-piles and
most likely different priorities.

Hardening XFS to the sorts of things fuzzers find is important to me
(and $employer) as well, but the difference here is that I read every
report that gets generated and start the work to figure out a 

Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-05 Thread Dave Chinner
On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> >> Hello,
> >>
> >> syzbot hit the following crash on upstream commit
> >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> >> Merge branch 'ras-core-for-linus' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> syzbot dashboard link:
> >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >>
> >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >> syzkaller reproducer:
> >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >
> > What a mess. A hand built, hopelessly broken filesystem image made
> > up of hex dumps, written into a mmap()d region of memory, then
> > copied into a tmpfs file and mounted with the loop device.
> >
> > Engineers that can debug broken filesystems don't grow on trees.  If
> > we are to have any hope of understanding what the hell this test is
> > doing, the bot needs to supply us with a copy of the built
> > filesystem image the test uses. We need to be able to point forensic
> > tools at the image to decode all the structures into human readable
> > format - if we are forced to do that by hand or jump through hoops
> > to create our own filesystem image than I'm certainly not going to
> > waste time looking at these reports...
> 
> Hi Dave,
> 
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> (took me about a minute to extract from test by replacing memfd_create
> with open and running the program).

Says the expert who knows exactly how it's all put together. It took
me a couple of hours just to understand WTF the syzbot reproducer
was actually doing

> Then do the following to trigger the bug:
> losetup /dev/loop0 xfs.repro
> mkdir xfs
> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
> 
> To answer your more general question: syzbot is not a system to test
> solely file systems, it finds bugs in hundreds of kernel subsystems.
> Generating image for file systems, media files for sound and
> FaceDancer programs that crash host when  FaceDancer device is plugged
> into USB is not feasible. And in the end it's not even clear what

I don't care how syzbot generates the filesystem image it uses.

> kernel subsystem is at fault and even if it somehow figures out that
> it's a filesystem, it's unclear that it's exactly an image that
> provokes the bug. syzbot provides C reproducers which is a reasonable

It doesn't matter *what subsystem breaks*. If syzbot is generating a
filesystem image and then mounting it, it needs to provide that
filesystem image to to people who end up having to debug the
problem. It's a basic "corrupt filesystem" bug triage step.

> Some bugs are so involved that only an
> expert in a particular subsystem can figure out what happens there.

And that's clearly the case here, whether you like it or not.

You want us to do things that make syzbot more useful as a tool but
you don't want to do things that make syzbot a useful tool for us.
It's a two way street

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-05 Thread Dave Chinner
On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> >> Hello,
> >>
> >> syzbot hit the following crash on upstream commit
> >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> >> Merge branch 'ras-core-for-linus' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> syzbot dashboard link:
> >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >>
> >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >> syzkaller reproducer:
> >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >
> > What a mess. A hand built, hopelessly broken filesystem image made
> > up of hex dumps, written into a mmap()d region of memory, then
> > copied into a tmpfs file and mounted with the loop device.
> >
> > Engineers that can debug broken filesystems don't grow on trees.  If
> > we are to have any hope of understanding what the hell this test is
> > doing, the bot needs to supply us with a copy of the built
> > filesystem image the test uses. We need to be able to point forensic
> > tools at the image to decode all the structures into human readable
> > format - if we are forced to do that by hand or jump through hoops
> > to create our own filesystem image than I'm certainly not going to
> > waste time looking at these reports...
> 
> Hi Dave,
> 
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> (took me about a minute to extract from test by replacing memfd_create
> with open and running the program).

Says the expert who knows exactly how it's all put together. It took
me a couple of hours just to understand WTF the syzbot reproducer
was actually doing

> Then do the following to trigger the bug:
> losetup /dev/loop0 xfs.repro
> mkdir xfs
> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
> 
> To answer your more general question: syzbot is not a system to test
> solely file systems, it finds bugs in hundreds of kernel subsystems.
> Generating image for file systems, media files for sound and
> FaceDancer programs that crash host when  FaceDancer device is plugged
> into USB is not feasible. And in the end it's not even clear what

I don't care how syzbot generates the filesystem image it uses.

> kernel subsystem is at fault and even if it somehow figures out that
> it's a filesystem, it's unclear that it's exactly an image that
> provokes the bug. syzbot provides C reproducers which is a reasonable

It doesn't matter *what subsystem breaks*. If syzbot is generating a
filesystem image and then mounting it, it needs to provide that
filesystem image to to people who end up having to debug the
problem. It's a basic "corrupt filesystem" bug triage step.

> Some bugs are so involved that only an
> expert in a particular subsystem can figure out what happens there.

And that's clearly the case here, whether you like it or not.

You want us to do things that make syzbot more useful as a tool but
you don't want to do things that make syzbot a useful tool for us.
It's a two way street

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-05 Thread Dmitry Vyukov
On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> Merge branch 'ras-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> syzbot dashboard link:
>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> syzkaller reproducer:
>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>
> What a mess. A hand built, hopelessly broken filesystem image made
> up of hex dumps, written into a mmap()d region of memory, then
> copied into a tmpfs file and mounted with the loop device.
>
> Engineers that can debug broken filesystems don't grow on trees.  If
> we are to have any hope of understanding what the hell this test is
> doing, the bot needs to supply us with a copy of the built
> filesystem image the test uses. We need to be able to point forensic
> tools at the image to decode all the structures into human readable
> format - if we are forced to do that by hand or jump through hoops
> to create our own filesystem image than I'm certainly not going to
> waste time looking at these reports...

Hi Dave,

Here is the image:
https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
(took me about a minute to extract from test by replacing memfd_create
with open and running the program).
Then do the following to trigger the bug:
losetup /dev/loop0 xfs.repro
mkdir xfs
mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs

To answer your more general question: syzbot is not a system to test
solely file systems, it finds bugs in hundreds of kernel subsystems.
Generating image for file systems, media files for sound and
FaceDancer programs that crash host when  FaceDancer device is plugged
into USB is not feasible. And in the end it's not even clear what
kernel subsystem is at fault and even if it somehow figures out that
it's a filesystem, it's unclear that it's exactly an image that
provokes the bug. syzbot provides C reproducers which is a reasonable
common ground for bug reports. At this point the bug needs human
attention. Some bugs are trivial enough that a developer does not even
need to look at the reproducer. Some bugs are so involved that only an
expert in a particular subsystem can figure out what happens there.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-05 Thread Dmitry Vyukov
On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner  wrote:
> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
>> Merge branch 'ras-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> syzbot dashboard link:
>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> syzkaller reproducer:
>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>
> What a mess. A hand built, hopelessly broken filesystem image made
> up of hex dumps, written into a mmap()d region of memory, then
> copied into a tmpfs file and mounted with the loop device.
>
> Engineers that can debug broken filesystems don't grow on trees.  If
> we are to have any hope of understanding what the hell this test is
> doing, the bot needs to supply us with a copy of the built
> filesystem image the test uses. We need to be able to point forensic
> tools at the image to decode all the structures into human readable
> format - if we are forced to do that by hand or jump through hoops
> to create our own filesystem image than I'm certainly not going to
> waste time looking at these reports...

Hi Dave,

Here is the image:
https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
(took me about a minute to extract from test by replacing memfd_create
with open and running the program).
Then do the following to trigger the bug:
losetup /dev/loop0 xfs.repro
mkdir xfs
mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs

To answer your more general question: syzbot is not a system to test
solely file systems, it finds bugs in hundreds of kernel subsystems.
Generating image for file systems, media files for sound and
FaceDancer programs that crash host when  FaceDancer device is plugged
into USB is not feasible. And in the end it's not even clear what
kernel subsystem is at fault and even if it somehow figures out that
it's a filesystem, it's unclear that it's exactly an image that
provokes the bug. syzbot provides C reproducers which is a reasonable
common ground for bug reports. At this point the bug needs human
attention. Some bugs are trivial enough that a developer does not even
need to look at the reproducer. Some bugs are so involved that only an
expert in a particular subsystem can figure out what happens there.


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-02 Thread Dave Chinner
On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> Merge branch 'ras-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> 
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048

What a mess. A hand built, hopelessly broken filesystem image made
up of hex dumps, written into a mmap()d region of memory, then
copied into a tmpfs file and mounted with the loop device.

Engineers that can debug broken filesystems don't grow on trees.  If
we are to have any hope of understanding what the hell this test is
doing, the bot needs to supply us with a copy of the built
filesystem image the test uses. We need to be able to point forensic
tools at the image to decode all the structures into human readable
format - if we are forced to do that by hand or jump through hoops
to create our own filesystem image than I'm certainly not going to
waste time looking at these reports...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: WARNING: bad unlock balance in xfs_iunlock

2018-04-02 Thread Dave Chinner
On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +)
> Merge branch 'ras-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> 
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048

What a mess. A hand built, hopelessly broken filesystem image made
up of hex dumps, written into a mmap()d region of memory, then
copied into a tmpfs file and mounted with the loop device.

Engineers that can debug broken filesystems don't grow on trees.  If
we are to have any hope of understanding what the hell this test is
doing, the bot needs to supply us with a copy of the built
filesystem image the test uses. We need to be able to point forensic
tools at the image to decode all the structures into human readable
format - if we are forced to do that by hand or jump through hoops
to create our own filesystem image than I'm certainly not going to
waste time looking at these reports...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com