Dear Lasse,

Firstly, thank you so much for your valuable feedback. I am new to the
xz code base and your insights are really useful :-)

Secondly, I would like to agree with you that the original fuzzer
harness was not only simple but pretty much generates the same test
coverage as the version in this thread. So, it makes sense to revert to
the original patch.

I will do this and send an updated patch on this mailing list
incorporating the following feedback:


On 10/25/18 11:34 PM, Lasse Collin wrote:
>> +    // Decode BUFSIZ==8192 bytes of inbuf at a time
> 
> The value of BUFSIZ depends on libc and isn't always 8192.

Agreed. What would you like it to be? Afaiu, we are doing not only
header decoding but also contents of the compressed block itself, right?

As a reference, fuzzers usually generate between 1 byte and a few
hundred KB of compressed data.

Test harnesses for other compression libs such as lzo [1] use an output
buffer size equal to default block size (= 256 KB).

[1]:
https://github.com/bshastry/oss-fuzz/blob/000d9c3e1f1018431028471c250f52003b82697c/projects/lzo/lzo_decompress_target.c#L52

> 
>> +        // TODO: We invoke lzma_code twice when remainlen == 0.
>> +        // Is this okay?
>> +
>> +        if (strm->avail_in == 0 && remainlen != 0) {
>> +            strm->next_in = inbuf;
>> +            strm->avail_in = (remainlen > BUFSIZ) ? BUFSIZ :
>> remainlen;
>> +            remainlen -= strm->avail_in;
>> +
>> +            if (remainlen == 0)
>> +                action = LZMA_FINISH;
>> +        }
> 
> The initial value of remainlen cannot be 0 because the parent function
> handles zero-length input specially and doesn't call this function at
> all. If initial remainlen == 0 were possible, the above code would be
> broken because it would never set action to LZMA_FINISH. Removing the
> check for 0 from the parent function and handling it here would be good.

I feel this comment won't be relevant when I revert to the original
version of the harness, so skipping discussion (at least for now)

> I guess the reason to split the input into BUFSIZ chunks is there to
> emulate reading from a file, but to me that doesn't seem very useful
> from point of view of fuzzing. Splitting the input into smaller chunks
> is good in sense that it tests that liblzma behaves correctly when
> liblzma doesn't get the whole file at once. However, the BUFSIZ is far
> too big to do this for header decoding.

It is my understanding that the harness is doing more than just header
decoding, right? I notice (in the generated html coverage report) that
calls to "block_decode" are also executed while fuzzing.

Sig

 For that, it could be good to
> even feed one byte at a time to get better testing coverage. The same
> can apply to output buffering. One-byte buffers might slow down fuzzing
> though.
> 
>> +        lzma_ret ret = lzma_code(strm, action);
>> +        /* LZMA_PROG_ERROR should be rarely, if ever, happen
>> +         * The assertion codifies this expectation.
>> +         */
>> +        assert(ret != LZMA_PROG_ERROR);
> 
> It should *never* happen if the code calling the liblzma functions is
> correct. It is better to use if() instead of assert() here to catch
> LZMA_PROG_ERROR. It's not good to rely on that assertion checks haven't
> been disabled with #define NDEBUG.

The reason I chose assert() was because fuzzers pick up on signals such
as SIGABRT, SIGSEGV etc. Also, libFuzzer interface reserves non-zero
return codes, the LLVMFuzzerTestOneInput() API is required to always
return 0.

Would replacing "assert" with an "abort" make sense? This way, we don't
need to use a debug build (which may slow down fuzzing).

>> +        // TODO: Is this code trying to overwrite outbuf when outlen
>> +        // is exhausted? If so, is that okay?
>> +        if (strm->avail_out == 0 || ret == LZMA_STREAM_END) {
>> +            strm->next_out = outbuf;
>> +            strm->avail_out = outlen;
>> +        }
> 
> Since the contents of outbuf aren't needed for anything, it's fine to
> reuse it this way.

ack.

>> +    // Init decoder.
>> +    if (!init_decoder(&strm)) {
>> +        // Decoder initialization failed. There's no point
>> +        // retrying, so bail out.
>> +        return 0;
>> +    }
> 
> Returning 0 indicates success, I guess. If initialization fails, the
> actual fuzzing is never done. While it's unlikely that the
> initialization fails, it's still essential to report the error to the
> caller instead of silently skipping the whole fuzzing step.

Would a call to "abort" make sense here?

>> +    uint8_t outbuf[BUFSIZ];
>> +
>> +    if (!decompress(&strm, data, size, outbuf, BUFSIZ)) {
>> +#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
>> +        fprintf(stderr, "Decode failure\n");
>> +#endif
>> +    }
> 
> The value of BUFSIZ depends on the libc and is meant to be a decent
> buffer size for file operations, but the fuzzer doesn't work on files.
> I suggest using a fixed value that doesn't depend on libc.
> 
> Overall I'm still unsure if this version is better than the 37-line
> fuzz.c in [1]. Some issues affect both versions, but overall [1] is
> more straightforward code (but it also tries to do less). I think we
> need to know what features are wanted and then see what kind of code
> does exactly that. Specifically:
> 
>   - Is it enough to pass the whole input to liblzma at once? If not,
>     how small chunks to use? One byte could give most thorough fuzzing
>     but is slower too.

One suggestion is to keep the current output buffer size I guess.

>   - Use a big (a few kilobytes) output buffer or a tiny one, like only
>     one byte? It's the same argument as for the input buffer.
> 
>   - Reusing the same lzma_stream structure between calls would test the
>     re-initialization code, but I understand it's problematic because
>     then results of a round of fuzzing round may depend on what
>     happened on previous rounds, so I suppose this is not an option.
>     (Reusing would be done by making strm static and omitting the
>     lzma_end() call.)
> 
> A final small note: In build.sh in [1], it's worth considering to
> use --enable-debug instead of --disable-debug. --enable-debug enables
> assert() checks which might help in finding bugs.

ack.

Thank you,
Bhargava

Reply via email to