On 01/12/20 16:00, P J P wrote:
* I was thinking about checking 'elementary_transfer_size' against
'byte_count_limit', but that did not work out. The loop is confusing there,
it first sets elementary_size = size and subtracts the same
Yes, that part is correct.
* 's->lba == -1' did not strike me as wrong, because code explicitly checks it
and gets past it. It does not flag an error when 's->lba == -1'.
The command spec only matters to some extent (it matters more in writing
a fix than in understanding the bug).
The questions to ask yourself after reading the code are:
1) ide_atapi_cmd_reply_end does contain an attempt at resetting an
out-of-bounds io_buffer_index to 0. Why is the reproducer bypassing it?
The answer must be because s->lba == -1.
2) what it means for ide_atapi_cmd_reply_end treats s->lba == -1
differently. The answer is that s->lba == -1 means the command is not a
read (according to the code) but in this case you get there with a read.
* I tested the patch with a reproducer and it helped to fix the crash.
Yes, but fixing the crash is not enough. You need to ensure that the
code makes sense. You need to distinguish between violated
preconditions and the consequences of those violations. Once a
precondition is violated, all bets are off on what happens in the code
below it. So if you don't catch the violated precondition at the
_first_ place where it can happen, you can have other issues elsewhere.
So again the question to ask is, how do you get s->lba == -1 in read
context? Where did things go wrong? Are there other read paths that
can set s->lba == -1? This is where reading the spec (as opposed to the
code) starts to be important.
* My doubt about the patch was that it prematurely ends the while loop ie.
before s->packet_transfer_size reaches zero(0), there may be possible data
loss.
Right, catching the problem above means that you can just raise an ATAPI
command error.
| 3) We also need to ensure that the bug will not happen again. Did you get a
| reproducer from the reporter? If not, how did you trust the report to be
| correct? If so, did you try to include it in the QEMU qtest testsuite?
* I did test it against a reproducer, but did not get to the qtest part for
the time constraints.
qtests are not just helpful. Adding regression tests for bugs is a
*basic* software engineering principle. If you don't have time to write
tests, you (or your organization) should find it.
But even if you don't write tests you need at least to enclose the
reproducer, otherwise you're posting a puzzle not a patch. :)
Paolo