On 2018-10-25 [email protected] wrote:
> From: Bhargava Shastry <[email protected]>

The discussion started in private email based on [1] but different
patches got sent on the list about the same time, so to avoid making
this look weird to other people on xz-devel, let's keep further
discussion on xz-devel.

[1]
https://github.com/pdknsk/oss-fuzz/commit/6265894b4ca049895eaa522ca3964b78ff8a52ef

I commented [1] in a private email and was mostly happy with what I
saw. But the first patch sent on xz-devel had completely different
fuzz.c that was based on an example program from the xz package.
Compared to [1] there were some things I didn't like and the second
patch on xz-devel tries to fix them.

Outside these patches, an essential thing to do is to add #ifdefs to
disable header CRC32 verification in liblzma when
FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION has been #defined. I will do
that.

Hopefully the above is a correct enough summary of what has happened so
far. :-)

> This patch addresses the following comments by Lasse Collin:
>   - Disable CRC check and add memory limit during decoder
>     initialization
>   - Check for LZMA_PROG_ERROR
>   - Remove superflous comments
>   - Use the flag "FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION" provided
>     by oss-fuzz infra to disable debug prints but retain them for
>     offline (non-fuzzing debug) use.

This is better than the first version. :-) I comment the details below.

> +    // Decode BUFSIZ==8192 bytes of inbuf at a time

The value of BUFSIZ depends on libc and isn't always 8192.

> +        // TODO: We invoke lzma_code twice when remainlen == 0.
> +        // Is this okay?
> +
> +        if (strm->avail_in == 0 && remainlen != 0) {
> +            strm->next_in = inbuf;
> +            strm->avail_in = (remainlen > BUFSIZ) ? BUFSIZ :
> remainlen;
> +            remainlen -= strm->avail_in;
> +
> +            if (remainlen == 0)
> +                action = LZMA_FINISH;
> +        }

The initial value of remainlen cannot be 0 because the parent function
handles zero-length input specially and doesn't call this function at
all. If initial remainlen == 0 were possible, the above code would be
broken because it would never set action to LZMA_FINISH. Removing the
check for 0 from the parent function and handling it here would be good.

I guess the reason to split the input into BUFSIZ chunks is there to
emulate reading from a file, but to me that doesn't seem very useful
from point of view of fuzzing. Splitting the input into smaller chunks
is good in sense that it tests that liblzma behaves correctly when
liblzma doesn't get the whole file at once. However, the BUFSIZ is far
too big to do this for header decoding. For that, it could be good to
even feed one byte at a time to get better testing coverage. The same
can apply to output buffering. One-byte buffers might slow down fuzzing
though.

> +        lzma_ret ret = lzma_code(strm, action);
> +        /* LZMA_PROG_ERROR should be rarely, if ever, happen
> +         * The assertion codifies this expectation.
> +         */
> +        assert(ret != LZMA_PROG_ERROR);

It should *never* happen if the code calling the liblzma functions is
correct. It is better to use if() instead of assert() here to catch
LZMA_PROG_ERROR. It's not good to rely on that assertion checks haven't
been disabled with #define NDEBUG.

> +        // TODO: Is this code trying to overwrite outbuf when outlen
> +        // is exhausted? If so, is that okay?
> +        if (strm->avail_out == 0 || ret == LZMA_STREAM_END) {
> +            strm->next_out = outbuf;
> +            strm->avail_out = outlen;
> +        }

Since the contents of outbuf aren't needed for anything, it's fine to
reuse it this way.

> +    // Init decoder.
> +    if (!init_decoder(&strm)) {
> +        // Decoder initialization failed. There's no point
> +        // retrying, so bail out.
> +        return 0;
> +    }

Returning 0 indicates success, I guess. If initialization fails, the
actual fuzzing is never done. While it's unlikely that the
initialization fails, it's still essential to report the error to the
caller instead of silently skipping the whole fuzzing step.

> +    uint8_t outbuf[BUFSIZ];
> +
> +    if (!decompress(&strm, data, size, outbuf, BUFSIZ)) {
> +#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
> +        fprintf(stderr, "Decode failure\n");
> +#endif
> +    }

The value of BUFSIZ depends on the libc and is meant to be a decent
buffer size for file operations, but the fuzzer doesn't work on files.
I suggest using a fixed value that doesn't depend on libc.

Overall I'm still unsure if this version is better than the 37-line
fuzz.c in [1]. Some issues affect both versions, but overall [1] is
more straightforward code (but it also tries to do less). I think we
need to know what features are wanted and then see what kind of code
does exactly that. Specifically:

  - Is it enough to pass the whole input to liblzma at once? If not,
    how small chunks to use? One byte could give most thorough fuzzing
    but is slower too.

  - Use a big (a few kilobytes) output buffer or a tiny one, like only
    one byte? It's the same argument as for the input buffer.

  - Reusing the same lzma_stream structure between calls would test the
    re-initialization code, but I understand it's problematic because
    then results of a round of fuzzing round may depend on what
    happened on previous rounds, so I suppose this is not an option.
    (Reusing would be done by making strm static and omitting the
    lzma_end() call.)

A final small note: In build.sh in [1], it's worth considering to
use --enable-debug instead of --disable-debug. --enable-debug enables
assert() checks which might help in finding bugs.

Thanks for your help!

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Reply via email to