NetBSD 9.0 IPfilter MSS clamp regression
Hello IPfilter on NetBSD 9.0 seems to have issues. On all i386 XEN3PAE_DOMU machines where I use filtering, it crashes (see http://mail-index.netbsd.org/tech-kern/2020/04/18/msg026280.html) Now I have a problem with MSS clamp. /etc/ipf.conf contains pass in from any to any pass out from any to any And /etc/ipnat.conf map xennet0 172.16.0.0/25 -> 0/0 mssclamp 512 Here is what happend when the local machine sends a DNS request: 03:40:01.561169 IP truncated-ip - 3 bytes missing! 192.0.2.14.65439 > 192.0.2.20.53: 15689+[|domain] Depending on the request length, the packet is truncated of 1 to 3 bytes. If I disable ipfilter, everything goes back to normal. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Ipfilter truncates outgoing packets (was: Re: ipfilter crash in latest netbsd-9)
Emmanuel Dreyfus wrote: > After upgrading to 9.0, I experienced crashes when enabling > ipfilter (backtrace below). I tried latest netbsd-9 kernel without > improvement. NB: this is on i386 XEN3PAE_DOMU Another problem: even with no rule loaded (empty ipf.conf and ipnat.conf), ipfilter corrupts outgoing packets: 03:40:01.561169 IP truncated-ip - 3 bytes missing! 192.0.2.14.65439 > 192.0.2.20.53: 15689+[|domain] Depending on the original length, the packet is truncated of 1 to 3 bytes. The behavior vanishes as soon as I run ipf -D Anyone experienced something similar? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: Am I using bus_dma right?
> Let me try to simplify these concepts. Thank you; that would help significantly. >> I'm not doing read/write DMA. [...] > If you are not doing DMA you don't need to do any memory > synchronization (modulo SMP issues with other CPUs, but that's a > completely different topic.) Oh, I'm doing DMA. Just not read/write DMA. (Buffer descriptors are write-direction DMA only, data is read-direction DMA only.) > The problem is many modern CPUs have write-back caches which are not > shared by I/O devices. So when you do a read operation (from device > to CPU) you should: > 1) Do a BUS_DMASYNC_PREREAD to make sure there is no data in the > cache that may be written to DRAM during the I/O operation. > 2) Tell the hardware to do the read operation. > 3) When the transaction completes issue a BUS_DMASYNC_POSTREAD to > make sure the CPU sees the data in DRAM not stale data in the cache. Okay, here's the first problem. There is no clear "transaction completes". The card has a DMA engine on it (a PLX9080, on the off chance you've run into it before) that can DMA into chained buffers. I set it up with a ring of butters - a chain of buffers with the last buffer pointing to the first, none of them with the "end of chain" bit set - and tell it to go. I request an interrupt at completion of each buffer, so I have a buffer-granularity idea of where it's at, modulo interrupt servicing latency. This means that there is no clear "this transfer has completed" moment. What I want to do is inspect the DMA buffer to see how far it's been overwritten, since there is a data value I know cannot be generated by the hardware that's feeding samples to the card (over half the data pins are hardwired to known logic levels). I've been treating it as though my inspection of a given sample in the buffer counts as "transfer completed" for purposes of that sample. > When you do a write operation you should: > 1) Make sure the buffer contains all the data you want to transmit. > 2) Do a BUS_DMASYNC_PREWRITE to make sure any data that may remain in > the CPU writeback cache is flushed to memory. > 3) Tell the hardware to do the write operation. > 4) When the write operation completes... well it shouldn't matter. ...but, according to the 8.0 manpage, I should do a POSTWRITE anyway, and going under the hood (this is all on amd64), I find that PREREAD is a no-op and POSTWRITE might matter because it issues an mfence to avoid memory access reordering issues. > If you have a ring buffer you should try to map it CONSISTENT which > will disable all caching of that memory. CONSISTENT? I don't find that anywhere; do you mean COHERENT? > However, some CPUs will not allow you to disable caching, so you > should put in the appropriate bus_dmamap_sync() operations so the > code will not break on those machines. For my immediate needs, I don't care about anything other than amd64. But I'd prefer to understand the paradigm properly for the benefit of potential future work. > When you set up the mapping for the ring buffer you should do either > a BUS_DMASYNC_PREREAD, or if you need to initialize some structures > in that buffer use BUS_DMASYNC_PREWRITE. One will do a cache > invalidate, the other one will force a writeback operation. I already PREWRITE the whole DMA-accessible area before telling the DMA engine to start. > When you get a device interrupt, you should do a BUS_DMAMEM_POSTREAD > to make sure anything that might have magically migrated into the > cache has been invalidated. There is no interrupt involved, in general. I request interrupts at buffer boundaries, but the buffers are very big compared to most DMAed blocks - a typical DMAed block is about 800 bytes, but the buffers are half a meg. > Then copy the data out of the ring buffer and do another > BUS_DMASYNC_PREREAD or BUS_DMASYNC_PREWRITE as appropriate. Then I think I was already doing everything necessary. And, indeed, I tried making the read routine do POSTREAD|POSTWRITE before and PREREAD|PREWRITE after its read-test-write of the samples, and it didn't help. >> One of the things that confuses me is that I have no write-direction >> DMA going on at all; all the DMA is in the read direction. But >> there is a driver write to the buffer that is, to put it loosely, >> half of a write DMA operation (the "host writes the buffer" half). > When the CPU updates the contents of the ring buffer it *is* a DMA > write, Well, maybe from bus_dma's point of view, but I would not say there is write-direction DMA happening unless something DMAs data out of memory. > even if the device never tries to read the contents, since the update > must be flushed from the cache to DRAM or you may end up reading > stale data later. So I have to treat it like a DMA write even if there is never any write-direction DMA actually going on? Then the problem *probably* is not bus_dma botchery. Someone else wrote me saying it was difficult to tell much without actually seeing the
Re: Am I using bus_dma right?
Let me try to simplify these concepts. On Thu, 23 Apr 2020, Mouse wrote: > I'm not doing read/write DMA. DMA never transfers from memory to the > device. (Well, I suppose it does to a small extent, in that the device > reads buffer descriptors. But the buffer descriptors are set up once > and never touched afterwards; the code snippet I posted is not writing > to them.) If you are not doing DMA you don't need to do any memory synchronization (modulo SMP issues with other CPUs, but that's a completely different topic.) > The hardware is DMAing into the memory, and nothing else. The driver > reads the memory and immediately writes it again, to be read by the > driver some later time, possibly being overwritten by DMA in between. > So an example that says "do write DMA" is not directly applicable. If a (non CPU) device is directly reading or writing DRAM without the CPU having to read a register and then write its contents to memory, then it is doing DMA. The problem is many modern CPUs have write-back caches which are not shared by I/O devices. So when you do a read operation (from device to CPU) you should: 1) Do a BUS_DMASYNC_PREREAD to make sure there is no data in the cache that may be written to DRAM during the I/O operation. 2) Tell the hardware to do the read operation. 3) When the transaction completes issue a BUS_DMASYNC_POSTREAD to make sure the CPU sees the data in DRAM not stale data in the cache. When you do a write operation you should: 1) Make sure the buffer contains all the data you want to transmit. 2) Do a BUS_DMASYNC_PREWRITE to make sure any data that may remain in the CPU writeback cache is flushed to memory. 3) Tell the hardware to do the write operation. 4) When the write operation completes... well it shouldn't matter. If you have a ring buffer you should try to map it CONSISTENT which will disable all caching of that memory. However, some CPUs will not allow you to disable caching, so you should put in the appropriate bus_dmamap_sync() operations so the code will not break on those machines. When you set up the mapping for the ring buffer you should do either a BUS_DMASYNC_PREREAD, or if you need to initialize some structures in that buffer use BUS_DMASYNC_PREWRITE. One will do a cache invalidate, the other one will force a writeback operation. When you get a device interrupt, you should do a BUS_DMAMEM_POSTREAD to make sure anything that might have magically migrated into the cache has been invalidated. Then copy the data out of the ring buffer and do another BUS_DMASYNC_PREREAD or BUS_DMASYNC_PREWRITE as appropriate. > The example makes it look as though read DMA (device->memory) needs to > be bracketed by PREREAD and POSTREAD and write DMA by PREWRITE and > POSTWRITE. If that were what I'm doing, it would be straightforward. > Instead, I have DMA and the driver both writing memory, but only the > driver ever reading it. > > Your placement for PREREAD and POSTREAD confuses me because it doesn't > match the example. The example says > > /* invalidate soon-to-be-stale cache blocks */ > bus_dmamap_sync(..., BUS_DMASYNC_PREREAD); > [ do read DMA ] > /* copy from bounce */ > bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD); > /* read data now in driver-provided buffer */ > [ computation ] > /* data to be written now in driver-provided buffer */ > /* flush write buffers and writeback, copy to bounce */ > bus_dmamap_sync(..., BUS_DMASYNC_PREWRITE); > [ do write DMA ] > /* probably a no-op, but provided for consistency */ > bus_dmamap_sync(..., BUS_DMASYNC_POSTWRITE); > > but what your changes would have my driver doing is > > [read-direction DMA might happen here] > PREREAD > driver reads data from driver-provided buffer > POSTREAD > [read-direction DMA might happen here] > PREWRITE > driver writes data to driver-provided buffer > POSTWRITE > [read-direction DMA might happen here] That bit is not right. > The conceptual paradigm is > > - at attach time: allocate, set up, and load the mapping > Presumably you should do a BUS_DMASYNC_PREWRITE somewhere in here > - at open time: tell hardware to start DMAing and a BUS_DMASYNC_POSTWRUTE arond here. > Here you need a BUS_DMAMEM_POSTREAD. > - at read time (ie, repeatedly): driver reads buffer to see how much >has been overwritten by DMA, copying the overwritten portion out and >immediately resetting it to the pre-overwrite data, to be >overwritten again later If you wrote anything to the ring buffer during this operation you need to insert a BUS_DMASYNC_PREWRITE. > > - at close tiem: tell hardware to stop DMAing > > The map is never unloaded; the driver is not detachable. The system > has no use case for that, so I saw no point in putting time into it. > > The code I quoted is the "at read time" part. My guess based on the > manpage's example and what you've written is that I need > >
Re: Am I using bus_dma right?
>> while (fewer than n samples copied) >> DMASYNC_POSTREAD for sample at offset o > That should be PREREAD (to make sure the dma'd data is visible for > the cpu) >> read sample at offset o > and teh POSTREAD should be here >> if value is "impossible", break > missig PREWRITE here >> set sample at offset o to "impossible" value >> DMASYNC_PREWRITE for sample at offset o > and this should be POSTWRITE > See the example in the -current man page: This looks a lot like the example in the 8.0 manpage, which did not help much because my use case does not match its very well: > An example of using bus_dmamap_sync(), involving > multiple read-write use of a single mapping might look > like this: I'm not doing read/write DMA. DMA never transfers from memory to the device. (Well, I suppose it does to a small extent, in that the device reads buffer descriptors. But the buffer descriptors are set up once and never touched afterwards; the code snippet I posted is not writing to them.) The hardware is DMAing into the memory, and nothing else. The driver reads the memory and immediately writes it again, to be read by the driver some later time, possibly being overwritten by DMA in between. So an example that says "do write DMA" is not directly applicable. The example makes it look as though read DMA (device->memory) needs to be bracketed by PREREAD and POSTREAD and write DMA by PREWRITE and POSTWRITE. If that were what I'm doing, it would be straightforward. Instead, I have DMA and the driver both writing memory, but only the driver ever reading it. Your placement for PREREAD and POSTREAD confuses me because it doesn't match the example. The example says /* invalidate soon-to-be-stale cache blocks */ bus_dmamap_sync(..., BUS_DMASYNC_PREREAD); [ do read DMA ] /* copy from bounce */ bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD); /* read data now in driver-provided buffer */ [ computation ] /* data to be written now in driver-provided buffer */ /* flush write buffers and writeback, copy to bounce */ bus_dmamap_sync(..., BUS_DMASYNC_PREWRITE); [ do write DMA ] /* probably a no-op, but provided for consistency */ bus_dmamap_sync(..., BUS_DMASYNC_POSTWRITE); but what your changes would have my driver doing is [read-direction DMA might happen here] PREREAD driver reads data from driver-provided buffer POSTREAD [read-direction DMA might happen here] PREWRITE driver writes data to driver-provided buffer POSTWRITE [read-direction DMA might happen here] The conceptual paradigm is - at attach time: allocate, set up, and load the mapping - at open time: tell hardware to start DMAing - at read time (ie, repeatedly): driver reads buffer to see how much has been overwritten by DMA, copying the overwritten portion out and immediately resetting it to the pre-overwrite data, to be overwritten again later - at close tiem: tell hardware to stop DMAing The map is never unloaded; the driver is not detachable. The system has no use case for that, so I saw no point in putting time into it. The code I quoted is the "at read time" part. My guess based on the manpage's example and what you've written is that I need while (fewer than n samples copied) POSTWRITE POSTREAD read sample from buffer if sample isn't "impossible" write "impossible" value to buffer PREWRITE PREREAD if sample is "impossible", break because some aspects of "write", and relatively normal "read", are happening outside that code segment. But this is different enough from what you said (and possibly not well-paired - should the PREWRITE be outside the if?) that now I'm possibly even less sure of myself. I could just try different permutations in the hope of finding something that works, but that strikes me as one of the worst possible ways to do it; I would prefer to understand the paradigm enough to get it right. I am not concerned about the race between pushing the driver-written value to the buffer and DMA overwriting it; provided the driver's write gets pushed reasonably promptly, this will happen only in error conditions like userland ignoring the device for too long - it takes the hardware multiple seconds to wrap around the ring buffer. > I always have to look up the direction, but READ is when CPU reads > data provided by the device. Yes: READ corresponds to read() and WRITE to write(). One of the things that confuses me is that I have no write-direction DMA going on at all; all the DMA is in the read direction. But there is a driver write to the buffer that is, to put it loosely, half of a write DMA operation (the "host
Re: Am I using bus_dma right?
On Wed, Apr 22, 2020 at 05:53:46PM -0400, Mouse wrote: > s = splhigh() > while (fewer than n samples copied) > DMASYNC_POSTREAD for sample at offset o That should be PREREAD (to make sure the dma'd data is visible for the cpu) > read sample at offset o and teh POSTREAD should be here > if value is "impossible", break missig PREWRITE here > set sample at offset o to "impossible" value > DMASYNC_PREWRITE for sample at offset o and this should be POSTWRITE > store sample in buffer[] > splx(s) > uiomove from buffer[] > if we found an "impossible" value, break; See the example in the -current man page: An example of using bus_dmamap_sync(), involving multiple read- write use of a single mapping might look like this: bus_dmamap_load(...); while (not done) { /* invalidate soon-to-be-stale cache blocks */ bus_dmamap_sync(..., BUS_DMASYNC_PREREAD); [ do read DMA ] /* copy from bounce */ bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD); /* read data now in driver-provided buffer */ [ computation ] /* data to be written now in driver-provided buffer */ /* flush write buffers and writeback, copy to bounce */ bus_dmamap_sync(..., BUS_DMASYNC_PREWRITE); [ do write DMA ] /* probably a no-op, but provided for consistency */ bus_dmamap_sync(..., BUS_DMASYNC_POSTWRITE); } bus_dmamap_unload(...); I always have to look up the direction, but READ is when CPU reads data provided by the device. Martin