On 17.07.2013, at 14:52, Mark Cave-Ayland wrote:

> On 17/07/13 09:16, Kevin Wolf wrote:
>
> Hi Kevin,
>
> Thanks for the reply - CC to qemu-devel as requested.
>
>>> I've been testing some of Alex Graf's patches for running Darwin
>>> under QEMU PPC and have been experiencing some timeout problems on
>>> block devices. My attention is drawn to this commit in particular:
>>> https://github.com/qemu/qemu/commit/80fc95d8bdaf3392106b131a97ca701fd374489a
>>>
>>> The reason for this commit is that Darwin programs the DBDMA
>>> controller to transfer data from the ATA FIFO in chunks that aren't
>>> sector aligned, e.g. the ATA command requests 0x10000 (256 sectors)
>>> but programs the DMA engine to transfer the data to memory as 3
>>> chunks of 0xfffe, 0xfffe and 0x4 bytes.
>>
>> I'm not familiar with how DMA works for the macio IDE device. Do you
>> have any pointers to specs or something?
>
> It works by setting up a DMA descriptor table (which is a list of
> commands) which are then "executed" once the RUN status bit is set,
> until a STOP command is reached. Things are slightly more complicated
> in that commands can have conditional branches set on them.
>
>> The one important point I'm wondering about is why you call
>> dma_bdrv_read() with a single 0xfffe QEMUSGList. Shouldn't it really
>> be called with a QEMUSGList { 0xfffe, 0xfffe, 0x4 }, which should
>> enable dma-helpers.c to do the right thing?
>
> Hmmm, I guess you could perhaps scan down the command list from the
> current position looking for all INPUT/OUTPUT commands until the next
> STOP command, and maybe build up a single QEMUSGList from that? I'm
> not sure exactly how robust that would be with the conditional
> branching though - Alex?
It'd at least be vastly different from how real hardware works, yes.
We'd basically have to throw away the current interpretation code and
instead emulate the device based on assumptions.

>> In any case, it would be good if you could prepare a (yet failing)
>> qtest case that demonstrates how DMA works on this controller and
>> what the problematic requests look like.
>
> I'll see what I can do, however I've not really looked at qtest
> before, so it could take a little time. In the meantime, you can
> easily see these transfers by booting an old Darwin installation ISO.
>
>>> It seems that the DMA API dma_bdrv_read()/dma_bdrv_write() can't
>>> handle unaligned transfers in this way, yet I think there is a
>>> better solution for this that doesn't mix DMA/non-DMA APIs in this
>>> manner. I'd like to try and come up with a better solution, but
>>> there seems to be a mix of synchronous/asynchronous/co-routine
>>> block APIs that could be used.
>>>
>>> So my question is: what do you think is the best way to approach
>>> solving the unaligned data access for MACIO using a DMA-friendly
>>> API?
>>
>> First, as I said above, I'd like to understand why you need to go
>> with unaligned values into the DMA API. Real hardware also only
>> works on a sector level.
>
> The main culprit for these transfers is Darwin, which limits large
> transfers to 0xfffe bytes (see
> http://searchcode.com/codesearch/view/23337208 line 382). Hence most
> large disk transactions get broken down into irregularly-sized
> chunks, which highlights this issue.

The main issue is that we're dealing with two separate pieces of
hardware here: the IDE controller, which works on a sector level, and
the DMA controller, which fetches data from the IDE controller
byte-wise (from what I understand). Both work independently, but we
try to shoehorn both into the same callback.

>
>> The block layer is really designed for working with whole sectors.
>> The only functions dealing with byte-aligned requests are
>> bdrv_pread/pwrite. They are synchronous (i.e. they block the vcpu
>> while running) and writes are slow (because they first read the
>> whole sector, copy in the modified part, and write out the whole
>> sector again), so you want to avoid them.
>
> Yeah, I figured this wasn't the most efficient way of doing it. The
> reason for asking the question was that I'm still struggling with
> some kind of timing/threading issue with Alex's work here, and I'm
> wondering whether making the unaligned DMA requests more "atomic" by
> not mixing DMA/non-DMA/synchronous APIs will help solve the issue or
> not.

I don't think it'll make a difference. But I'd be more than happy to
design this more properly too - the current code is vastly ugly.


Alex