On Tue, Nov 07, 2023 at 07:14:07PM +0100, Kevin Wolf wrote:
> Am 05.10.2023 um 12:04 hat Niklas Cassel geschrieben:
> > From: Niklas Cassel <niklas.cas...@wdc.com>
> > 
> > Legacy software contains a standard mechanism for generating a reset to a
> > Serial ATA device - setting the SRST (software reset) bit in the Device
> > Control register.
> > 
> > Serial ATA has a more robust mechanism called COMRESET, also referred to
> > as port reset. A port reset is the preferred mechanism for error
> > recovery and should be used in place of software reset.
> > 
> > Commit e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > improved the handling of PxCI, such that PxCI gets cleared after handling
> > a non-NCQ, or NCQ command (instead of incorrectly clearing PxCI after
> > receiving an arbitrary FIS).
> > 
> > However, simply clearing PxCI after a non-NCQ, or NCQ command, is not
> > enough, we also need to clear PxCI when receiving a SRST in the Device
> > Control register.
> > 
> > This fixes an issue for FreeBSD where the device would fail to reset.
> > The problem was not noticed in Linux, because Linux uses a COMRESET
> > instead of a legacy software reset by default.
> > 
> > Fixes: e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document PxCI handling")
> > Reported-by: Marcin Juszkiewicz <marcin.juszkiew...@linaro.org>
> > Signed-off-by: Niklas Cassel <niklas.cas...@wdc.com>
> > ---
> > Changes since v1: write the D2H FIS before clearing PxCI.
> > 
> >  hw/ide/ahci.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> > index babdd7b458..7269dabbdb 100644
> > --- a/hw/ide/ahci.c
> > +++ b/hw/ide/ahci.c
> > @@ -1254,10 +1254,26 @@ static void handle_reg_h2d_fis(AHCIState *s, int 
> > port,
> >          case STATE_RUN:
> >              if (cmd_fis[15] & ATA_SRST) {
> >                  s->dev[port].port_state = STATE_RESET;
> > +                /*
> > +                 * When setting SRST in the first H2D FIS in the reset 
> > sequence,
> > +                 * the device does not send a D2H FIS. Host software thus 
> > has to
> > +                 * set the "Clear Busy upon R_OK" bit such that PxCI (and 
> > BUSY)
> > +                 * gets cleared. See AHCI 1.3.1, section 10.4.1 Software 
> > Reset.
> > +                 */
> > +                if (opts & AHCI_CMD_CLR_BUSY) {
> > +                    ahci_clear_cmd_issue(ad, slot);
> > +                }
> 
> I suspect that AHCI_CMD_CLR_BUSY really should be checked in a more
> generic place, but this will do for fixing software reset.

This weirdo option "Clear Busy upon R_OK" is for the HBA itself to clear
the bit for the command in PxCI, when it gets the R_OK that the Host to Device
(H2D) FIS was transmitted successfully to the device.
The normal way is that the device sends back a Device to Host (D2H) FIS
which then causes the bit in PxCI to be cleared in the HBA.

Yes, theoretically you could probably build a FIS that has the
"Clear Busy upon R_OK" even for e.g. a regular non-NCQ I/O command
(where PxCI/BUSY is first supposed to be cleared once the command is done...),
however this would surely cause problems as PxCI would then no longer shadow
the state of the device.

If you search for "Clear Busy upon R_OK", the only usage seems
to be for legacy software reset.

I'm quite sure that the option was specifically created for legacy
software reset, since to do a legacy software reset, you need to
create and issue two FISes.
The first FIS has SRST set (assert reset), the second FIS has SRST
cleared (deassert reset).

For regular I/O non-NCQ commands, PxCI will be cleared when the
I/O has completed (by the device sending back a D2H FIS).
For NCQ commands, PxCI will be cleared when the drive has
successfully queued the command (by the device sending back a D2H FIS).

The spec states that there should NOT be a D2H FIS sent back to
the HBA when getting the first H2D FIS (which asserts reset).
(The device will still send back a R_OK that the FIS was transmitted
successfully.)

For the second H2D FIS (that deasserts reset), the spec says that the
device SHOULD send a D2H FIS back (after the device diagnostic is done).


> 
> >              }
> >              break;
> >          case STATE_RESET:
> >              if (!(cmd_fis[15] & ATA_SRST)) {
> > +                /*
> > +                 * When clearing SRST in the second H2D FIS in the reset
> > +                 * sequence, the device will send a D2H FIS. See SATA 3.5a 
> > Gold,
> > +                 * section 11.4 Software reset protocol.
> > +                 */
> > +                ahci_clear_cmd_issue(ad, slot);
> > +                ahci_write_fis_d2h(ad, false);
> >                  ahci_reset_port(s, port);
> 
> This part isn't mentioned in the commit message at all, and I don't see
> how it's related to commit e2a5d9b3d9c3 either. Is this supposed to be a
> bonus fix?

There are two ways to reset.
COMRESET which is a simple register write.
or
legacy software reset, where you create and issue two H2D FISes
(just like when submitting an I/O command).
For legacy software reset, the first H2D FIS should not send back a
D2H FIS. The second H2D FIS should.

ahci_reset_port() is used by both ways.



Previously, the QEMU model didn't handle a legacy software according
to spec, it simply cleared PxCI blindly without checking "Clear Busy
upon R_OK" for non-I/O commands.


In SATA 3.5a Gold, section 11.4 Software reset protocol.
When SRST goes from 1->0, it should execute a device diagnostic.
At the end of the diagnostic, it should transfer a D2H FIS with good
status.


Reading other chapters again, the execution of device diagnostic,
and sending of good status, is also done when a reset is done via
a COMRESET.

So even while a D2H FIS is usually sent back as a response to a H2D FIS,
it appears that also a COMRESET (which is a register write), triggers
a D2H FIS.


I did know that the device sends a D2H FIS on power up, in order to update
PxSIG. (This is why we have the ahci_init_d2h() function.)

However, it turns out that the D2H FIS that updates PxSIG and the D2H FIS
that is send out after a SRST 1->0 (and a COMRESET), is the same as the
D2H FIS that updates PxSIG.

But if we remove:

> > +                ahci_clear_cmd_issue(ad, slot);
> > +                ahci_write_fis_d2h(ad, false);

Then PxCI for the second H2D FIS (that is executed like) a normal command
will not get cleared here...


So where should it get cleared?
In ahci_init_d2h() is the correct place... ahci_init_d2h() does
ahci_write_fis_d2h(), but is currently the only ahci_write_fis_d2h()
that doesn't also do a ahci_clear_cmd_issue().


However, simply adding a ahci_clear_cmd_issue() in ahci_init_d2h()
is not trivial. The whole ahci_init_d2h() is a big hack, as the comments
in the code already says... And ahci_init_d2h() is not only called from
ahci_reset_port()...

I think I would prefer to keep the ahci_clear_cmd_issue() above,
drop the ahci_write_fis_d2h(ad, false), and modify the comment to
say that the matching D2H FIS is really the ahci_write_fis_d2h()
done by ahci_init_d2h()... which is the _singular_ D2H FIS sent out
after the device diagnostic triggered by SRST 1->0 has completed,
which send the good status (which is what the HBA initializes PxSIG
to), and additionally clears PxCI.

(For a reset triggered by a COMRESET, everything is the same, except
it doesn't clear PxCI, since there was no command created and issued.)


> ahci_reset_port() already calls ahci_init_d2h() -> ahci_write_fis_d2h().
> So I think this new ahci_write_fis_d2h() only sets some state that will
> immediately be overwritten again. Which is good, because we didn't set
> the signature as described in the SATA software reset protocol yet, that
> is only done in ahci_reset_port().
> 
> Am I missing something? Why do we need this ahci_write_fis_d2h() call
> here?
> 
> As for the ahci_clear_cmd_issue(), I'm surprised that ahci_reset_port()
> doesn't already clear the register. Wouldn't it make more sense there
> than just in this one caller?

A start/stop of the engine (PxCMD.ST) does clear PxCI, as I implemented in:
https://github.com/qemu/qemu/commit/d73b84d0b664e60fffb66f46e84d0db4a8e1c713

But according to AHCI 1.3.1, section:
3.3.14 Offset 38h: PxCI – Port x Command Issue

PxCI does also have a reset value of 0h.

So I would assume that it is okay to clear PxCI to zero also in
ahci_reset_port().

By doing so, we could avoid both the ahci_clear_cmd_issue(),
and the ahci_write_fis_d2h(ad, false).

Perhaps you can drop the "bonus" part of my patch and simply
add a pr->cmd_issue to ahci_reset_port()?

Thank you for your review!


Kind regards,
Niklas

Reply via email to