2010/6/21 Artyom Tarasenko <atar4q...@googlemail.com>:
> 2010/5/25 Blue Swirl <blauwir...@gmail.com>:
>>>> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.
>>>
>>> What does indicate it? It happens where the disk sizes are normally
>>> reported, so it could be a scsi/dma/irq/fpu issue as well.
>>
>> IIRC the DVMA address was 0xfc004000, but the mapped entries were for
>> 0xfc000000 to 0xfc003fff.
Under OpenBIOS. And even less with OBP, and much less if the network
card is disabled.

> It looks like we have multiple problems here: they start with
> 0xfc004000 access (which can theoretically be expected on the real
> hardware too) as you pointed out, but what happens afterwards is
> strange too:
>
> - In the current qemu implementation we have a screaming NMI which
> NetBSD can not clear. This happens because NMI in qemu is literally
> non-maskable, while on the real hardware it can be masked with the
> 'mask all' flag. I'll send a patch for it.
>
> - with the masking patch, the NMI is not screaming but still is
> perceived as spurious. This may be ok if NetBSD (1.6-3.1) doesn't have
> a moduleerr_handler set.

Or because a SCSI DMA transfer on real hardware never generates an NMI.

In the current implementation, when "select with attention" is
processed, the SCSI controller initiates a DMA transfer and fetches a
CDB. If the DMA fails (page not mapped, or access not allowed), an NMI
is generated. It is quite a strange design: such an error is an
asynchronous event, and the CPU wouldn't know that the SCSI controller
tried to do DMA at a certain address. It would have been more
consistent to send the error notification to the DMA initiator (the
SCSI controller in this case), not to the CPU.

The offending code in NetBSD 1.6-3.1:

    // Crashes here (under qemu) because the DMA page is not valid yet.
    NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
    // The page would have been made valid by this call.
    NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
    NCRDMA_GO(sc);

In the working versions (before 1.6 and after 4.0) the code looks like
this:

    NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
    //...
    NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
    NCRDMA_GO(sc);

After debugging the code on the real hardware, it looks like qemu has
multiple problems in the scsi/dma/iommu layer. I modified NCRDMA_SETUP
so that it did the DMA transfer without mapping the page.
In this case NetBSD 3.1 shows the following error (on a real SS-20):

    dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
    esp0: DMA error; resetting
    dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>

No NMI. And what is more important, on the real hardware "select with
attention" does not initiate DMA (I put in a delay, waited 2 seconds,
and nothing happened). It has to be done manually.

Any suggestions on how to fix this within the current iommu/dma
architecture? Looks like "select with attention" should register
callbacks? ( Volunteers? ;-) )

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/
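
The behaviour reported above for the real SS-20 can be sketched as a
toy C model. This is only an illustration under stated assumptions,
not qemu code: every identifier (toy_iommu, toy_dma, toy_esp,
TOY_CSR_ERR and the functions) is hypothetical, and the ERR bit value
is a placeholder. In the model, "select with attention" only latches
the command, the transfer starts when the driver explicitly kicks the
DMA engine, and a failed IOMMU lookup sets an error bit in the DMA CSR
instead of raising an NMI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names are hypothetical, none of this is qemu's API. */
#define TOY_CSR_ERR 0x2u /* placeholder for the DMA CSR error bit */

struct toy_iommu { uint32_t first, last; }; /* one mapped DVMA range, inclusive */
struct toy_dma   { uint32_t csr, addr; };
struct toy_esp {
    bool selatn_pending; /* command latched, waiting for the DMA kick */
    int  nmi_count;      /* never incremented: this model raises no NMI */
    int  transfers;      /* number of CDB fetches that succeeded */
};

static bool toy_iommu_mapped(const struct toy_iommu *m, uint32_t addr)
{
    return addr >= m->first && addr <= m->last;
}

/* "Select with attention": only latch the command, do not touch memory. */
static void toy_esp_selatn(struct toy_esp *s)
{
    s->selatn_pending = true;
}

/* Driver programs the DMA address and clears a stale error bit. */
static void toy_dma_setup(struct toy_dma *d, uint32_t addr)
{
    d->addr = addr;
    d->csr &= ~TOY_CSR_ERR;
}

/* Driver kicks the engine. Returns true if the CDB fetch succeeded. */
static bool toy_dma_go(struct toy_esp *s, struct toy_dma *d,
                       const struct toy_iommu *m)
{
    assert(s && d && m);
    if (!s->selatn_pending)
        return false;          /* nothing latched, nothing to transfer */
    s->selatn_pending = false;
    if (!toy_iommu_mapped(m, d->addr)) {
        d->csr |= TOY_CSR_ERR; /* error stays with the DMA side, no NMI */
        return false;
    }
    s->transfers++;
    return true;
}
```

With the mapped range 0xfc000000 to 0xfc003fff from this thread,
toy_dma_setup(&d, 0xfc004000) followed by toy_dma_go() returns false,
sets TOY_CSR_ERR and leaves nmi_count at 0, while either ordering of
toy_esp_selatn() and toy_dma_setup() with a mapped address succeeds.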