> For kernels that are dominantly referring to a single point in the timeline
> (ttl.on(), and also both spi_dac.set(), and dds.set() even though they both
> involve a long sequence of actions), their potential "preparatory" and
> "cleanup" actions should be scheduled such that the "effect" is located at the
> current point on the timeline. And they should apply zero net delay on
> the timeline. That means that DDS and SPI DAC do all bus writing in the past
> before asserting FUD and LDAC in the present. But the deassertion of FUD
> and LDAC happens in the future. An spi_adc.get() would sample at `now` but
> would do the readout in the future.

I think SPI writes should occur in the "past" and SPI reads should occur in the 
"future".  This is based on the notion that the time you care about (and the 
one which is simplest to think about) really has to do with whether or not the 
slave device is "ready" in the appropriate sense.  For writes, the slave is 
"ready" once it has received the information you wanted to write to it, i.e. 
the end of the write.  For reads, the slave must be "ready" at the start of the 
read transaction.  

I would then argue that the SPI functionality should be such that an SPI write 
transaction is completed, by which I mean that the SPI clock at the output of 
the master has returned to its default level, exactly at "now".  For an SPI 
read transaction, the SPI clock should depart from its default level exactly 
at "now".  It may be acceptable to guarantee only that for writes the SPI 
clock has returned to its default level at some time before "now", and for 
reads that the SPI clock will depart from its default level some time after 
"now", as long as the time between either of these actions and "now" is 
explicitly bounded and "small" in the sense of not being comparable to the 
duration of a full atomic SPI transaction as implemented.
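
As a rough sketch of what I mean, using ARTIQ's now_mu()/at_mu() timeline 
functions (the wrapper class, the raw bus accesses and the transaction 
duration are placeholders, not an existing API):

    from artiq.experiment import *

    class PointInTimeSPI:
        # Sketch only: self.bus stands for a hypothetical low-level SPI
        # channel, xfer_mu for the duration of one full transaction in
        # machine units.
        def __init__(self, core, bus, xfer_mu):
            self.core = core
            self.bus = bus
            self.xfer_mu = xfer_mu

        @kernel
        def write(self, data):
            t = now_mu()
            # Schedule the transfer in the "past" so that the SPI clock has
            # returned to its default level exactly at "now"...
            at_mu(t - self.xfer_mu)
            self.bus.write_raw(data)    # hypothetical raw bus access
            # ...and leave the timeline cursor where it was (zero net delay).
            at_mu(t)

        @kernel
        def read(self):
            # The first clock edge departs from its default level exactly at
            # "now"; the readout itself happens in the "future".
            return self.bus.read_raw()  # hypothetical raw bus access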

Issuing a FUD or LDAC should be separate from the SPI transaction itself.  
Some transactions do not want or need a FUD-like signal -- for example, each 
atomic SPI call writes up to 32 bits in the current scheme, so programming a 
DDS may take several SPI calls (to write amplitude, FTW, POW) without issuing 
a FUD until all writes are completed, or one might wish to update registers 
on several DDS chips sharing one SPI bus and then perform a simultaneous FUD 
on all of them at once.  Also, different devices will have different allowed 
latencies between the completion of an SPI transaction and updating internal 
registers with FUD or LDAC, and this should be handled either manually by the 
user or, preferably, by writing higher-level methods specific to the 
individual hardware ("device drivers").  SPI doesn't know about FUD or LDAC, 
nor should it.  This keeps the SPI core simple and puts the complications of 
individual devices onto higher-level code, which is desirable.
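
At the "device driver" level this might look roughly like the sketch below.  
The spi.write() calls are the hypothetical bus writes discussed here and are 
assumed to advance the timeline by their own duration; only the TTLOut pulse 
driving FUD is a real ARTIQ call, and the register words and setup time are 
placeholders.

    from artiq.experiment import *

    class SimpleDDS:
        # Sketch of a device-level driver: several plain SPI writes, then a
        # single FUD pulse.  The SPI core itself knows nothing about FUD.
        def __init__(self, core, spi, fud, fud_setup_mu):
            self.core = core
            self.spi = spi                 # hypothetical SPI bus driver
            self.fud = fud                 # TTLOut driving the FUD line
            self.fud_setup_mu = fud_setup_mu

        @kernel
        def set(self, asf_word, ftw_word, pow_word):
            # Each write is assumed to advance "now" by its own duration.
            self.spi.write(asf_word)       # amplitude scale factor
            self.spi.write(ftw_word)       # frequency tuning word
            self.spi.write(pow_word)       # phase offset word
            # The device-specific latency between the end of the last SPI
            # transaction and the FUD edge lives here, not in the SPI core.
            delay_mu(self.fud_setup_mu)
            self.fud.pulse_mu(100)         # arbitrary FUD pulse width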

For some devices, a read transaction necessarily involves a prior write 
transaction announcing what is desired to be read (e.g. DDS register readback). 
 In this instance, the functionality should be split into two SPI calls, one 
for a write (issuing the command), and one for a read (reading back the 
result), allowing for a delay to be inserted (manually or at a "device driver" 
level) between these two transactions as required by the particular device.  
This delay will then give the exact time required between the last edge of the 
"write" clock and the first edge of the "read" clock, which is generally what 
is specified on datasheets.  
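
In sketch form, with spi.write()/spi.read() again hypothetical and 
t_readback_mu standing for the datasheet delay between the last "write" clock 
edge and the first "read" clock edge:

    from artiq.experiment import *

    class RegisterReadback:
        def __init__(self, core, spi, t_readback_mu):
            self.core = core
            self.spi = spi                     # hypothetical SPI bus driver
            self.t_readback_mu = t_readback_mu

        @kernel
        def read_register(self, read_instruction):
            self.spi.write(read_instruction)   # announce what is to be read
            delay_mu(self.t_readback_mu)       # device-specific turnaround
            return self.spi.read()             # clock out the result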

> Kernels that dominantly refer to a time interval (ttl.pulse(),
> ramsey_pulse_sequence()) or those where a unique point in time can not be
> assigned (spi.write(): some devices use the last bit clocked in, others the
> deassertion of cs_n) 

One solution here might be to add an input parameter to spi.write() and 
spi.read() indicating whether the chip select should be left asserted at the 
end of the transaction.  If so, the point in time for read and write methods 
is the last clock edge as described above; if not, the deassertion time is 
used.  Another wrinkle is that different chips have different requirements on 
the delay between the assertion of cs and the first clock edge, and between 
the last clock edge and the deassertion of cs, so these methods should 
probably also include delays for cs assertion/deassertion relative to the 
first/last clock edges.  This would allow situations like the AD9914, where 
cs must remain asserted from the write of a read instruction through the end 
of the subsequent readback, to be handled with two separate calls to write() 
and read(), with a delay in between, as outlined above.
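
With such parameters, an AD9914-style readback might then look something like 
this (keep_cs, cs_setup_mu and cs_hold_mu are illustrative names for the 
proposed parameters, not an existing API):

    from artiq.experiment import *

    class AD9914Readback:
        def __init__(self, core, spi, t_turnaround_mu):
            self.core = core
            self.spi = spi                     # hypothetical SPI bus driver
            self.t_turnaround_mu = t_turnaround_mu

        @kernel
        def read_register(self, read_instruction):
            # Leave cs asserted after the write: the AD9914 needs cs to stay
            # low from the read instruction through the end of the readback.
            self.spi.write(read_instruction, keep_cs=True, cs_setup_mu=8)
            delay_mu(self.t_turnaround_mu)     # write-to-read clock delay
            # The read deasserts cs at the end of the transaction.
            return self.spi.read(keep_cs=False, cs_hold_mu=8)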

> should schedule their actions between now (at kernel
> entry) and when they are done (kernel return). They should apply the
> appropriate delay to the timeline to ensure the same kernel can immediately
> be called again with an equivalent result. And without fearing
> RTIOSequenceErrors or RTIOCollisionErrors.

> For duration methods you don't get RTIOSequenceErrors but your timeline is
> affected.

> For bus drivers (SPI) RTIO PHY transactions should be complete, protocol
> sequencing and state machines should have returned to their initial state.
> There should not be events in the future (later than `now` at kernel return)
> unless care is taken that this does not lead to RTIOSequenceErrors. The delay
> should account for latency and
> pipelining: if e.g. an RT2WB
> bus transaction can happen every 3 cycles but the PHY's activity then starts
> and lasts for another 30 cycles, the applied delay should be 30 cycles.
> 
> Batching drivers with point-in-time devices (multi channel DACs and ADCs on
> a SPI bus, with external LDAC or SAMPLE as well as the DDS
> bus) should schedule preparatory and cleanup actions around the relevant
> "batched" output/sampling point-in-time. This way duration-like methods
> are abstracted to point-like methods. 

This seems like a good idea.  Would batched transactions be implemented at 
the SPI/DDS bus core level, or at a higher level (e.g. in something like a 
device driver)?  Are there significant costs in terms of the number of clock 
cycles required if one implements batching at this kind of higher level?  If 
one
simply puts two spi.write() commands for a given SPI bus back to back in an 
experiment file, for example, it seems to me that it would make sense for the 
compiler to handle this in a straightforward way, where the "now" after the 
first command and the "now" after the second command are appropriately spaced 
by the duration of the second command.  Then if one wants to insert a delay 
between these commands, the "now" after the second command would be the "now" 
after the first command plus the specified delay plus the duration of the 
second command.  This whole sequence can be packaged into a spi.batch_write() 
method which then would abstract it again to a point-like method with a 
duration that can be determined by the compiler as required.  My question is: 
does implementing it this way cost a lot more time on the core device than 
implementing it directly as a feature in the SPI core gateware?  
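
A sketch of that packaging, done purely at the driver level (spi.write() and 
its write_duration_mu() query are hypothetical; the point is only that the 
batch as a whole can again be treated as point-like with a duration known 
ahead of time):

    from artiq.experiment import *

    class BatchedSPI:
        def __init__(self, core, spi):
            self.core = core
            self.spi = spi              # hypothetical SPI bus driver

        @kernel
        def batch_write(self, words, gap_mu):
            # Each write advances "now" by its own duration; an optional
            # extra gap is inserted between consecutive transactions.
            for w in words:
                self.spi.write(w)
                delay_mu(gap_mu)

        def batch_duration_mu(self, n_words, gap_mu):
            # Total duration of the batch, computable without executing it,
            # so the batch can again be scheduled as a point-like call.
            return n_words * (self.spi.write_duration_mu() + gap_mu)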

> To handle this nicely we
> should probably look into cleanly allowing methods to retrieve the delay
> incurred by another method (without calling it).

These are all very important and subtle considerations.  For SPI (or DDS) 
calls on separate physical buses, there needs to be information on the 
duration required by the core device to push the required information to the 
RTIO PHY.  For calls on the same physical bus, the duration of the PHY's 
activity should be used.  It seems to me unavoidable that one implements this 
sort of method for retrieving such delays, and that SPI calls and DDS bus 
calls need to have both their "point-in-time" and "duration" characteristics 
specified.
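
One possible shape for retrieving these delays without executing the call, 
sketched at the driver level (the split into two durations follows the 
distinction above; the names and numbers are placeholders):

    from artiq.experiment import *

    class SPITimings:
        def __init__(self, core, n_bits, clk_div_mu, issue_mu):
            self.core = core
            self.n_bits = n_bits           # bits per transaction
            self.clk_div_mu = clk_div_mu   # machine units per SPI clock cycle
            self.issue_mu = issue_mu       # cost of pushing the transaction
                                           # from the CPU to the RTIO PHY

        @portable
        def xfer_duration_mu(self):
            # How long the PHY is busy: the delay to respect between two
            # calls on the same physical bus.
            return self.n_bits * self.clk_div_mu

        @portable
        def issue_duration_mu(self):
            # The delay to respect between calls on different buses.
            return self.issue_mu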

This raises the issue of what happens if the number of bits in an SPI read or 
write transaction (effectively, its duration) is calculated by the core device 
on the fly rather than specified at compile time.  In this instance, I would 
suggest that a maximum length for the transaction be required at compile time, 
that the duration be set according to that maximum length, and that an attempt 
by the core device to put in more bits than this maximum length generate an 
appropriate RTIO exception.  One could also disallow runtime determination of 
the number of bits in an SPI transaction -- I am not sure how useful this 
feature may or may not be.
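
A minimal sketch of that suggestion (the exception, the underlying spi.write() 
call, and the assumption that it advances the timeline by the actual transfer 
length are all placeholders):

    from artiq.experiment import *

    class SPITransactionTooLong(Exception):
        pass

    class BoundedSPI:
        def __init__(self, core, spi, max_bits, clk_div_mu):
            self.core = core
            self.spi = spi                 # hypothetical SPI bus driver
            self.max_bits = max_bits       # fixed at compile time
            self.clk_div_mu = clk_div_mu   # machine units per SPI clock cycle

        @kernel
        def write(self, data, n_bits):
            # n_bits may be computed at run time, but never beyond the
            # compile-time maximum that the scheduled duration is based on.
            if n_bits > self.max_bits:
                raise SPITransactionTooLong()
            # Assumed to advance the timeline by n_bits * clk_div_mu.
            self.spi.write(data, n_bits)
            # Pad to the worst-case duration so that the schedule does not
            # depend on the runtime value of n_bits.
            delay_mu((self.max_bits - n_bits) * self.clk_div_mu)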
