Re: Am I using bus_dma right?

2020-04-28 Thread Mouse
I wrote

> Okay, I've tried some experiments.  [...]

Found it.

Nothing at all to do with bus_dma.

The actual fix is

  bus_space_write_4(sc->lcr_t,sc->lcr_h,PLX9080_DMATHR,
( 7 << PLX9080_DMATHR_C0PLAF_S) |
( 7 << PLX9080_DMATHR_C0PLAE_S) |
( 4 << PLX9080_DMATHR_C0LPAE_S) |
-   (10 << PLX9080_DMATHR_C0LPAF_S) |
+   ( 1 << PLX9080_DMATHR_C0LPAF_S) |
( 4 << PLX9080_DMATHR_C1PLAF_S) |
( 4 << PLX9080_DMATHR_C1PLAE_S) |
( 4 << PLX9080_DMATHR_C1LPAE_S) |
-   ( 4 << PLX9080_DMATHR_C1LPAF_S) );
+   ( 1 << PLX9080_DMATHR_C1LPAF_S) );

That register is DMA thresholds: the 9080 doesn't even request the PCI
bus for DMA until it has at least that many samples pending.  (There
are two numbers changed because there are two channels; this
application doesn't use one of them, but rather than fiddle around with
that I just set it the same for both channels.)  Apparently it's
willing to sit on data indefinitely if the thresholds aren't reached.

My apologies to everyone for the noise and for (spuriously, albeit
implicitly) casting doubt on the bus_dma infrastructure - and my
thanks, again, to everyone who took time and brain cycles to help me
with this.

I just feel really really stupid that it's taken me multiple months to
remember that the hardware even _has_ that register and realize that it
could be relevant.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Am I using bus_dma right?

2020-04-28 Thread Mouse
Okay, I've tried some experiments.

Nothing seems to help. :(  In the latest version, I (a) made all the
PRE calls to bus_dmamap_sync use both PREREAD and PREWRITE, and all the
POST calls use both POSTREAD and POSTWRITE (by my reading of the amd64
implementation, this should not make any difference) and (b) added a
clflush of every written location after writing it, plus an mfence (one
mfence after each store or series of stores).

Still doesn't work.

I've made the latest version of the driver available in case anyone
wants to look at it.  It's on ftp.rodents-montreal.org, as before, in
/mouse/misc/7300/2020-04-28-1/7300a.c, 7300a.h, and 7300a-reg.h (the
other version I've moved into .../misc/7300/2020-04-23-1/).

Priorities have changed at work and I'm now on a different subtask of
this job.  I expect to be back on this one before long, though; I'm
going to be trying to rewrite the driver to use bus-dma more the way
most drivers do, in the hope that that works better (I _think_ I can
compensate for the differences in userland).  What I'm trying to do
here is admittedly an unusual sort of thing; the only thing like it I
can think of in a stock system is the way some audio devices support an
mmap()ped ring buffer, and the audio system is Byzantine enough I'm
having trouble finding where it deals with these issues - I don't find
any BUS_DMASYNC_ references at all, so I'm probably missing something.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Am I using bus_dma right?

2020-04-24 Thread Mouse
> You missed the most important part of my response:
>>> So I have to treat it like a DMA write even if there is never any
>>> write-direction DMA actually going on?
>> Yes.

Actually, I didn't miss that, though now that you mention it I can see
how it would seem like it, because I didn't overtly respond to it.  I
probably should have said something like "Got it!" there.

And, I think this too I didn't say, but thank you very much - you and
everyone who's contributed - for taking the time and effort to help me
with this!

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Am I using bus_dma right?

2020-04-24 Thread Eduardo Horvath


You missed the most important part of my response:

On Fri, 24 Apr 2020, Eduardo Horvath wrote:
> 
> > So I have to treat it like a DMA write even if there is never any
> > write-direction DMA actually going on?
> 
> Yes.
> 
> > Then the problem *probably* is not bus_dma botchery.

Eduardo


Re: Am I using bus_dma right?

2020-04-24 Thread Mouse
>> I've been treating it as though my inspection of a given sample in
>> the buffer counts as "transfer completed" for purposes of that
>> sample.
> Are you inspecting the buffer only after receipt of an interrupt or
> are you polling?

Polling.  (Polls are provoked by userland doing a read() that ends up
in my driver's read routine.)

> [...] POSTWRITE does tell the kernel it can free up any bounce
> buffers it may have allocated if it allocated bounce buffers, but I
> digress.

Someone else asked me (off-list) if bounce buffers were actually in
use.  I don't know; when next I'm back at that job (it's only three
days a week), one thing I intend to do is peek under the hood of the
bus_dma structures and find out.

>> For my immediate needs, I don't care about anything other than
>> amd64.  But I'd prefer to understand the paradigm properly for the
>> benefit of potential future work.
> I believe if you use COHERENT on amd64 none of this matters since it
> turns off caching on those memory regions.  (But I don't have time to
> grep the sources to verify this.)

I do - or, rather, I will.  I don't recall whether I'm using COHERENT,
but it's easy enough to add if I'm not.

>> And, indeed, I tried making the read routine do POSTREAD|POSTWRITE
>> before and PREREAD|PREWRITE after its read-test-write of the
>> samples, and it didn't help.
> Ah now we're getting to something interesting.

> What failure mode are you seeing?

That off-list person asked me that too.  I wrote up a more detailed
explanation, but I saved it in case someone else wanted it.  I'll
include the relevant text below.  The short summary is that I'm seeing
data get _severely_ delayed before reaching CPU visibility - severely
delayed as in multiple seconds.  And here's why I believe that's what's
going on:



Perhaps I should explain why I believe what I do about the behaviour.

The commercial product is a turnkey system involving some heavily
custom application-specific hardware.  It generates blobs of data which
historically it sent up to the host over the 7300 to a DOS program (the
version in use when I came into it was running under DOS and I was
brought in to help move it to something more modern).

Shortly into the project, we learned that the 7300 had been EOLed by
Adlink with no replacement device suggested.  We cast about and ended
up putting another small CPU on the generating end which sends data up
over Ethernet.  The only reason we still care about data over the 7300
is a relatively large installed base that doesn't have Ethernet-capable
data-generating hardware, but which we want to upgrade (the DOS
versions have various problems in addition to feature lack).

But my test hardware does have Ethernet.  And, in the presence of that
hardware, it always sends the data both ways, both as Ethernet packets
and as signals on differential pairs (which get converted to the
single-ended signals the 7300 needs very close to it - the differential
pairs are for better noise immunity over a relatively long cable run in
end-user installations).

For my test runs, I not only ran the application, which I told to read
from the 7300, but also a snoop program, which (a) uses BPF to capture
the Ethernet form of the data and (b) uses a snoop facility I added to
the 7300 driver to record a copy of everything that got returned
through a read() call.  I also added code to the userland application
to record everything it gets from read().  (The driver code I put up
for FTP does not have that.  I can make that version available too if
you want.)

What I'm seeing is an Ethernet packet arriving containing, let us say,
11 22 33 44 55 66 77 88 99 aa bb cc dd ee, but I'm also seeing the 7300
driver returning, say, 11 22 33 44 55 66, to userland, then userland
calling read() many times - enough to burn a full second of time -
getting "no data here" each time (see the next paragraph for what this
means).  Multiple seconds later, after userland has timed out and gone
into its "I'm not getting data" recovery mode, the driver sees the 77
88 99 aa bb cc dd ee part getting passed back to userland.

When userland calls read(), the driver reads the next sample out of the
DMA buffer, looking to see whether it's been overwritten.  If it has,
the samples it finds are passed back to userland and their places in
the buffer written over with a value that cannot correspond to a sample
(23 of the 32 data pins are grounded, so those bits cannot be nonzero
in a sample).  The driver uses interrupts only to deal with the case of
data arriving over the 7300 but userland not reading it.  The driver
wants to track where the hardware is DMAing into, so it knows where to
look for new data.  I configure the hardware to interrupt every
half-meg of data (in a 16M buffer); if the writing is getting too close
to the reading, I push the read point forward, clearing the buffer to
the "impossible" value in the process.  But, in the tests I'm doing, I
doubt that's happening (I can 

Re: Am I using bus_dma right?

2020-04-24 Thread Eduardo Horvath
On Thu, 23 Apr 2020, Mouse wrote:

> Okay, here's the first problem.  There is no clear "transaction
> completes".

Let's clarify that.

> The card has a DMA engine on it (a PLX9080, on the off chance you've
> run into it before) that can DMA into chained buffers.  I set it up
> with a ring of buffers - a chain of buffers with the last buffer
> pointing to the first, none of them with the "end of chain" bit set -
> and tell it to go.  I request an interrupt at completion of each
> buffer, so I have a buffer-granularity idea of where it's at, modulo
> interrupt servicing latency.
> 
> This means that there is no clear "this transfer has completed" moment.
> What I want to do is inspect the DMA buffer to see how far it's been
> overwritten, since there is a data value I know cannot be generated by
> the hardware that's feeding samples to the card (over half the data
> pins are hardwired to known logic levels).
> 
> I've been treating it as though my inspection of a given sample in the
> buffer counts as "transfer completed" for purposes of that sample.

Are you inspecting the buffer only after receipt of an interrupt or are
you polling?  

> 
> > When you do a write operation you should:
> 
> > 1) Make sure the buffer contains all the data you want to transmit.
> 
> > 2) Do a BUS_DMASYNC_PREWRITE to make sure any data that may remain in
> > the CPU writeback cache is flushed to memory.
> 
> > 3) Tell the hardware to do the write operation.
> 
> > 4) When the write operation completes... well it shouldn't matter.
> 
> ...but, according to the 8.0 manpage, I should do a POSTWRITE anyway,
> and going under the hood (this is all on amd64), I find that PREREAD is
> a no-op and POSTWRITE might matter because it issues an mfence to avoid
> memory access reordering issues.

I doubt the mfence does much of anything in this circumstance, but 
POSTWRITE does tell the kernel it can free up any bounce buffers it 
may have allocated if it allocated bounce buffers, but I digress.

> 
> > If you have a ring buffer you should try to map it CONSISTENT which
> > will disable all caching of that memory.
> 
> CONSISTENT?  I don't find that anywhere; do you mean COHERENT?

Yes COHERENT.  (That's what I get for relying on my memory.)

> 
> > However, some CPUs will not allow you to disable caching, so you
> > should put in the appropriate bus_dmamap_sync() operations so the
> > code will not break on those machines.
> 
> For my immediate needs, I don't care about anything other than amd64.
> But I'd prefer to understand the paradigm properly for the benefit of
> potential future work.

I believe if you use COHERENT on amd64 none of this matters since it turns 
off caching on those memory regions.  (But I don't have time to grep the 
sources to verify this.)


> > Then copy the data out of the ring buffer and do another
> > BUS_DMASYNC_PREREAD or BUS_DMASYNC_PREWRITE as appropriate.
> 
> Then I think I was already doing everything necessary.  And, indeed, I
> tried making the read routine do POSTREAD|POSTWRITE before and
> PREREAD|PREWRITE after its read-test-write of the samples, and it
> didn't help.

Ah now we're getting to something interesting.

What failure mode are you seeing?

> >> One of the things that confuses me is that I have no write-direction
> >> DMA going on at all; all the DMA is in the read direction.  But
> >> there is a driver write to the buffer that is, to put it loosely,
> >> half of a write DMA operation (the "host writes the buffer" half).
> > When the CPU updates the contents of the ring buffer it *is* a DMA
> > write,
> 
> Well, maybe from bus_dma's point of view, but I would not say there is
> write-direction DMA happening unless something DMAs data out of memory.
>
> > even if the device never tries to read the contents, since the update
> > must be flushed from the cache to DRAM or you may end up reading
> > stale data later.
> 
> So I have to treat it like a DMA write even if there is never any
> write-direction DMA actually going on?

Yes.

> Then the problem *probably* is not bus_dma botchery.

Eduardo


Re: Am I using bus_dma right?

2020-04-23 Thread Mouse
> Let me try to simplify these concepts.

Thank you; that would help significantly.

>> I'm not doing read/write DMA.  [...]
> If you are not doing DMA you don't need to do any memory
> synchronization (modulo SMP issues with other CPUs, but that's a
> completely different topic.)

Oh, I'm doing DMA.  Just not read/write DMA.  (Buffer descriptors are
write-direction DMA only, data is read-direction DMA only.)

> The problem is many modern CPUs have write-back caches which are not
> shared by I/O devices.  So when you do a read operation (from device
> to CPU) you should:

> 1) Do a BUS_DMASYNC_PREREAD to make sure there is no data in the
> cache that may be written to DRAM during the I/O operation.

> 2) Tell the hardware to do the read operation.

> 3) When the transaction completes issue a BUS_DMASYNC_POSTREAD to
> make sure the CPU sees the data in DRAM not stale data in the cache.

Okay, here's the first problem.  There is no clear "transaction
completes".

The card has a DMA engine on it (a PLX9080, on the off chance you've
run into it before) that can DMA into chained buffers.  I set it up
with a ring of buffers - a chain of buffers with the last buffer
pointing to the first, none of them with the "end of chain" bit set -
and tell it to go.  I request an interrupt at completion of each
buffer, so I have a buffer-granularity idea of where it's at, modulo
interrupt servicing latency.

This means that there is no clear "this transfer has completed" moment.
What I want to do is inspect the DMA buffer to see how far it's been
overwritten, since there is a data value I know cannot be generated by
the hardware that's feeding samples to the card (over half the data
pins are hardwired to known logic levels).

I've been treating it as though my inspection of a given sample in the
buffer counts as "transfer completed" for purposes of that sample.

> When you do a write operation you should:

> 1) Make sure the buffer contains all the data you want to transmit.

> 2) Do a BUS_DMASYNC_PREWRITE to make sure any data that may remain in
> the CPU writeback cache is flushed to memory.

> 3) Tell the hardware to do the write operation.

> 4) When the write operation completes... well it shouldn't matter.

...but, according to the 8.0 manpage, I should do a POSTWRITE anyway,
and going under the hood (this is all on amd64), I find that PREREAD is
a no-op and POSTWRITE might matter because it issues an mfence to avoid
memory access reordering issues.

> If you have a ring buffer you should try to map it CONSISTENT which
> will disable all caching of that memory.

CONSISTENT?  I don't find that anywhere; do you mean COHERENT?

> However, some CPUs will not allow you to disable caching, so you
> should put in the appropriate bus_dmamap_sync() operations so the
> code will not break on those machines.

For my immediate needs, I don't care about anything other than amd64.
But I'd prefer to understand the paradigm properly for the benefit of
potential future work.

> When you set up the mapping for the ring buffer you should do either
> a BUS_DMASYNC_PREREAD, or if you need to initialize some structures
> in that buffer use BUS_DMASYNC_PREWRITE.  One will do a cache
> invalidate, the other one will force a writeback operation.

I already PREWRITE the whole DMA-accessible area before telling the DMA
engine to start.

> When you get a device interrupt, you should do a BUS_DMASYNC_POSTREAD
> to make sure anything that might have magically migrated into the
> cache has been invalidated.

There is no interrupt involved, in general.  I request interrupts at
buffer boundaries, but the buffers are very big compared to most DMAed
blocks - a typical DMAed block is about 800 bytes, but the buffers are
half a meg.

> Then copy the data out of the ring buffer and do another
> BUS_DMASYNC_PREREAD or BUS_DMASYNC_PREWRITE as appropriate.

Then I think I was already doing everything necessary.  And, indeed, I
tried making the read routine do POSTREAD|POSTWRITE before and
PREREAD|PREWRITE after its read-test-write of the samples, and it
didn't help.

>> One of the things that confuses me is that I have no write-direction
>> DMA going on at all; all the DMA is in the read direction.  But
>> there is a driver write to the buffer that is, to put it loosely,
>> half of a write DMA operation (the "host writes the buffer" half).
> When the CPU updates the contents of the ring buffer it *is* a DMA
> write,

Well, maybe from bus_dma's point of view, but I would not say there is
write-direction DMA happening unless something DMAs data out of memory.

> even if the device never tries to read the contents, since the update
> must be flushed from the cache to DRAM or you may end up reading
> stale data later.

So I have to treat it like a DMA write even if there is never any
write-direction DMA actually going on?

Then the problem *probably* is not bus_dma botchery.

Someone else wrote me saying it was difficult to tell much without
actually seeing the 

Re: Am I using bus_dma right?

2020-04-23 Thread Eduardo Horvath



Let me try to simplify these concepts.

On Thu, 23 Apr 2020, Mouse wrote:

> I'm not doing read/write DMA.  DMA never transfers from memory to the
> device.  (Well, I suppose it does to a small extent, in that the device
> reads buffer descriptors.  But the buffer descriptors are set up once
> and never touched afterwards; the code snippet I posted is not writing
> to them.)

If you are not doing DMA you don't need to do any memory synchronization 
(modulo SMP issues with other CPUs, but that's a completely different 
topic.)

> The hardware is DMAing into the memory, and nothing else.  The driver
> reads the memory and immediately writes it again, to be read by the
> driver some later time, possibly being overwritten by DMA in between.
> So an example that says "do write DMA" is not directly applicable.

If a (non CPU) device is directly reading or writing DRAM without the 
CPU having to read a register and then write its contents to memory, then 
it is doing DMA.

The problem is many modern CPUs have write-back caches which are not 
shared by I/O devices.  So when you do a read operation (from device to 
CPU) you should:

1) Do a BUS_DMASYNC_PREREAD to make sure there is no data in the cache 
that may be written to DRAM during the I/O operation.

2) Tell the hardware to do the read operation.

3) When the transaction completes issue a BUS_DMASYNC_POSTREAD to make 
sure the CPU sees the data in DRAM not stale data in the cache.


When you do a write operation you should:

1) Make sure the buffer contains all the data you want to transmit.

2) Do a BUS_DMASYNC_PREWRITE to make sure any data that may remain in the 
CPU writeback cache is flushed to memory.

3) Tell the hardware to do the write operation.

4) When the write operation completes... well it shouldn't matter.


If you have a ring buffer you should try to map it CONSISTENT which will 
disable all caching of that memory.  However, some CPUs will not allow you 
to disable caching, so you should put in the appropriate bus_dmamap_sync() 
operations so the code will not break on those machines.

When you set up the mapping for the ring buffer you should do either a 
BUS_DMASYNC_PREREAD, or if you need to initialize some structures in that 
buffer use BUS_DMASYNC_PREWRITE.  One will do a cache invalidate, the 
other one will force a writeback operation.

When you get a device interrupt, you should do a BUS_DMASYNC_POSTREAD to
make sure anything that might have magically migrated into the cache has 
been invalidated.  Then copy the data out of the ring buffer and do 
another BUS_DMASYNC_PREREAD or BUS_DMASYNC_PREWRITE as appropriate.

> The example makes it look as though read DMA (device->memory) needs to
> be bracketed by PREREAD and POSTREAD and write DMA by PREWRITE and
> POSTWRITE.  If that were what I'm doing, it would be straightforward.
> Instead, I have DMA and the driver both writing memory, but only the
> driver ever reading it.
> 
> Your placement for PREREAD and POSTREAD confuses me because it doesn't
> match the example.  The example says
> 
>   /* invalidate soon-to-be-stale cache blocks */
>   bus_dmamap_sync(..., BUS_DMASYNC_PREREAD);
>   [ do read DMA ]
>   /* copy from bounce */
>   bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD);
>   /* read data now in driver-provided buffer */
>   [ computation ]
>   /* data to be written now in driver-provided buffer */
>   /* flush write buffers and writeback, copy to bounce */
>   bus_dmamap_sync(..., BUS_DMASYNC_PREWRITE);
>   [ do write DMA ]
>   /* probably a no-op, but provided for consistency */
>   bus_dmamap_sync(..., BUS_DMASYNC_POSTWRITE);
> 
> but what your changes would have my driver doing is
> 
> [read-direction DMA might happen here]
> PREREAD
> driver reads data from driver-provided buffer
> POSTREAD
> [read-direction DMA might happen here]
> PREWRITE
> driver writes data to driver-provided buffer
> POSTWRITE
> [read-direction DMA might happen here]

That bit is not right.

> The conceptual paradigm is
> 
> - at attach time: allocate, set up, and load the mapping
> 

Presumably you should do a BUS_DMASYNC_PREWRITE somewhere in here

> - at open time: tell hardware to start DMAing

and a BUS_DMASYNC_POSTWRITE around here.

> 

Here you need a BUS_DMASYNC_POSTREAD.

> - at read time (ie, repeatedly): driver reads buffer to see how much
>has been overwritten by DMA, copying the overwritten portion out and
>immediately resetting it to the pre-overwrite data, to be
>overwritten again later

If you wrote anything to the ring buffer during this operation you need to 
insert a BUS_DMASYNC_PREWRITE.

> 
> - at close time: tell hardware to stop DMAing
> 
> The map is never unloaded; the driver is not detachable.  The system
> has no use case for that, so I saw no point in putting time into it.
> 
> The code I quoted is the "at read time" part.  My guess based on the
> manpage's example and what you've written is that I need
> 
>   

Re: Am I using bus_dma right?

2020-04-23 Thread Mouse
>>  while (fewer than n samples copied)
>>  DMASYNC_POSTREAD for sample at offset o
> That should be PREREAD (to make sure the dma'd data is visible for
> the cpu)
>>  read sample at offset o
> and the POSTREAD should be here

>>  if value is "impossible", break
> missing PREWRITE here
>>  set sample at offset o to "impossible" value
>>  DMASYNC_PREWRITE for sample at offset o
> and this should be POSTWRITE

> See the example in the -current man page:

This looks a lot like the example in the 8.0 manpage, which did not
help much because my use case does not match it very well:

>   An example of using bus_dmamap_sync(), involving
>   multiple read-write use of a single mapping might look
>   like this:

I'm not doing read/write DMA.  DMA never transfers from memory to the
device.  (Well, I suppose it does to a small extent, in that the device
reads buffer descriptors.  But the buffer descriptors are set up once
and never touched afterwards; the code snippet I posted is not writing
to them.)

The hardware is DMAing into the memory, and nothing else.  The driver
reads the memory and immediately writes it again, to be read by the
driver some later time, possibly being overwritten by DMA in between.
So an example that says "do write DMA" is not directly applicable.

The example makes it look as though read DMA (device->memory) needs to
be bracketed by PREREAD and POSTREAD and write DMA by PREWRITE and
POSTWRITE.  If that were what I'm doing, it would be straightforward.
Instead, I have DMA and the driver both writing memory, but only the
driver ever reading it.

Your placement for PREREAD and POSTREAD confuses me because it doesn't
match the example.  The example says

/* invalidate soon-to-be-stale cache blocks */
bus_dmamap_sync(..., BUS_DMASYNC_PREREAD);
[ do read DMA ]
/* copy from bounce */
bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD);
/* read data now in driver-provided buffer */
[ computation ]
/* data to be written now in driver-provided buffer */
/* flush write buffers and writeback, copy to bounce */
bus_dmamap_sync(..., BUS_DMASYNC_PREWRITE);
[ do write DMA ]
/* probably a no-op, but provided for consistency */
bus_dmamap_sync(..., BUS_DMASYNC_POSTWRITE);

but what your changes would have my driver doing is

[read-direction DMA might happen here]
PREREAD
driver reads data from driver-provided buffer
POSTREAD
[read-direction DMA might happen here]
PREWRITE
driver writes data to driver-provided buffer
POSTWRITE
[read-direction DMA might happen here]

The conceptual paradigm is

- at attach time: allocate, set up, and load the mapping

- at open time: tell hardware to start DMAing

- at read time (ie, repeatedly): driver reads buffer to see how much
   has been overwritten by DMA, copying the overwritten portion out and
   immediately resetting it to the pre-overwrite data, to be
   overwritten again later

- at close time: tell hardware to stop DMAing

The map is never unloaded; the driver is not detachable.  The system
has no use case for that, so I saw no point in putting time into it.

The code I quoted is the "at read time" part.  My guess based on the
manpage's example and what you've written is that I need

while (fewer than n samples copied)
POSTWRITE
POSTREAD
read sample from buffer
if sample isn't "impossible"
write "impossible" value to buffer
PREWRITE
PREREAD
if sample is "impossible", break

because some aspects of "write", and relatively normal "read", are
happening outside that code segment.  But this is different enough from
what you said (and possibly not well-paired - should the PREWRITE be
outside the if?) that now I'm possibly even less sure of myself.  I
could just try different permutations in the hope of finding something
that works, but that strikes me as one of the worst possible ways to do
it; I would prefer to understand the paradigm enough to get it right.

I am not concerned about the race between pushing the driver-written
value to the buffer and DMA overwriting it; provided the driver's write
gets pushed reasonably promptly, this will happen only in error
conditions like userland ignoring the device for too long - it takes
the hardware multiple seconds to wrap around the ring buffer.

> I always have to look up the direction, but READ is when CPU reads
> data provided by the device.

Yes: READ corresponds to read() and WRITE to write().  One of the
things that confuses me is that I have no write-direction DMA going on
at all; all the DMA is in the read direction.  But there is a driver
write to the buffer that is, to put it loosely, half of a write DMA
operation (the "host 

Re: Am I using bus_dma right?

2020-04-23 Thread Martin Husemann
On Wed, Apr 22, 2020 at 05:53:46PM -0400, Mouse wrote:
>   s = splhigh()
>   while (fewer than n samples copied)
>   DMASYNC_POSTREAD for sample at offset o

That should be PREREAD (to make sure the dma'd data is visible for the
cpu)

>   read sample at offset o

and the POSTREAD should be here

>   if value is "impossible", break

missing PREWRITE here

>   set sample at offset o to "impossible" value
>   DMASYNC_PREWRITE for sample at offset o

and this should be POSTWRITE

>   store sample in buffer[]
>   splx(s)
>   uiomove from buffer[]
>   if we found an "impossible" value, break;

See the example in the -current man page:

  An example of using bus_dmamap_sync(), involving multiple read-
  write use of a single mapping might look like this:

  bus_dmamap_load(...);

  while (not done) {
  /* invalidate soon-to-be-stale cache blocks */
  bus_dmamap_sync(..., BUS_DMASYNC_PREREAD);

  [ do read DMA ]

  /* copy from bounce */
  bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD);

  /* read data now in driver-provided buffer */

  [ computation ]

  /* data to be written now in driver-provided buffer */

  /* flush write buffers and writeback, copy to bounce */
  bus_dmamap_sync(..., BUS_DMASYNC_PREWRITE);

  [ do write DMA ]

  /* probably a no-op, but provided for consistency */
  bus_dmamap_sync(..., BUS_DMASYNC_POSTWRITE);
  }

  bus_dmamap_unload(...);

I always have to look up the direction, but READ is when CPU reads data
provided by the device.

Martin