On 2018-10-25 [email protected] wrote: > From: Bhargava Shastry <[email protected]>
The discussion started in private email based on [1] but different patches got sent on the list about the same time, so to avoid making this look weird to other people on xz-devel, let's keep further discussion on xz-devel. [1] https://github.com/pdknsk/oss-fuzz/commit/6265894b4ca049895eaa522ca3964b78ff8a52ef I commented [1] in a private email and was mostly happy with what I saw. But the first patch sent on xz-devel had completely different fuzz.c that was based on an example program from the xz package. Compared to [1] there were some things I didn't like and the second patch on xz-devel tries to fix them. Outside these patches, an essential thing to do is to add #ifdefs to disable header CRC32 verification in liblzma when FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION has been #defined. I will do that. Hopefully the above is a correct enough summary of what has happened so far. :-) > This patch addresses the following comments by Lasse Collin: > - Disable CRC check and add memory limit during decoder > initialization > - Check for LZMA_PROG_ERROR > - Remove superflous comments > - Use the flag "FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION" provided > by oss-fuzz infra to disable debug prints but retain them for > offline (non-fuzzing debug) use. This is better than the first version. :-) I comment the details below. > + // Decode BUFSIZ==8192 bytes of inbuf at a time The value of BUFSIZ depends on libc and isn't always 8192. > + // TODO: We invoke lzma_code twice when remainlen == 0. > + // Is this okay? > + > + if (strm->avail_in == 0 && remainlen != 0) { > + strm->next_in = inbuf; > + strm->avail_in = (remainlen > BUFSIZ) ? BUFSIZ : > remainlen; > + remainlen -= strm->avail_in; > + > + if (remainlen == 0) > + action = LZMA_FINISH; > + } The initial value of remainlen cannot be 0 because the parent function handles zero-length input specially and doesn't call this function at all. If initial remainlen == 0 were possible, the above code would be broken because it would never set action to LZMA_FINISH. Removing the check for 0 from the parent function and handling it here would be good. I guess the reason to split the input into BUFSIZ chunks is there to emulate reading from a file, but to me that doesn't seem very useful from point of view of fuzzing. Splitting the input into smaller chunks is good in sense that it tests that liblzma behaves correctly when liblzma doesn't get the whole file at once. However, the BUFSIZ is far too big to do this for header decoding. For that, it could be good to even feed one byte at a time to get better testing coverage. The same can apply to output buffering. One-byte buffers might slow down fuzzing though. > + lzma_ret ret = lzma_code(strm, action); > + /* LZMA_PROG_ERROR should be rarely, if ever, happen > + * The assertion codifies this expectation. > + */ > + assert(ret != LZMA_PROG_ERROR); It should *never* happen if the code calling the liblzma functions is correct. It is better to use if() instead of assert() here to catch LZMA_PROG_ERROR. It's not good to rely on that assertion checks haven't been disabled with #define NDEBUG. > + // TODO: Is this code trying to overwrite outbuf when outlen > + // is exhausted? If so, is that okay? > + if (strm->avail_out == 0 || ret == LZMA_STREAM_END) { > + strm->next_out = outbuf; > + strm->avail_out = outlen; > + } Since the contents of outbuf aren't needed for anything, it's fine to reuse it this way. > + // Init decoder. > + if (!init_decoder(&strm)) { > + // Decoder initialization failed. There's no point > + // retrying, so bail out. > + return 0; > + } Returning 0 indicates success, I guess. If initialization fails, the actual fuzzing is never done. While it's unlikely that the initialization fails, it's still essential to report the error to the caller instead of silently skipping the whole fuzzing step. > + uint8_t outbuf[BUFSIZ]; > + > + if (!decompress(&strm, data, size, outbuf, BUFSIZ)) { > +#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION > + fprintf(stderr, "Decode failure\n"); > +#endif > + } The value of BUFSIZ depends on the libc and is meant to be a decent buffer size for file operations, but the fuzzer doesn't work on files. I suggest using a fixed value that doesn't depend on libc. Overall I'm still unsure if this version is better than the 37-line fuzz.c in [1]. Some issues affect both versions, but overall [1] is more straightforward code (but it also tries to do less). I think we need to know what features are wanted and then see what kind of code does exactly that. Specifically: - Is it enough to pass the whole input to liblzma at once? If not, how small chunks to use? One byte could give most thorough fuzzing but is slower too. - Use a big (a few kilobytes) output buffer or a tiny one, like only one byte? It's the same argument as for the input buffer. - Reusing the same lzma_stream structure between calls would test the re-initialization code, but I understand it's problematic because then results of a round of fuzzing round may depend on what happened on previous rounds, so I suppose this is not an option. (Reusing would be done by making strm static and omitting the lzma_end() call.) A final small note: In build.sh in [1], it's worth considering to use --enable-debug instead of --disable-debug. --enable-debug enables assert() checks which might help in finding bugs. Thanks for your help! -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
