Hello Miklos,

>> According to the AT86RF230 datasheet, a transition from P_ON to TRX_OFF
>> has a typical duration of 880µs. The driver uses a value of 510 µs (which
>> can be found in Atmel's AT86RF230 software programming model).
> 
> Actually, the init code waits twice for 510 us: first to get the clock,
> then we wake it up and wait more.

2*510 > 880 is right, but the sequence matters. There is a wait block
before anything else is done, then the chip is reset for 6µs.
Immediately after this reset, the state transition from P_ON to TRX_OFF
is issued, and after 510µs (not 880µs) the driver treats the chip as if
it were in TRX_OFF. The first 510µs wait happens before the command, so
it does not count toward the transition.
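To make the ordering argument concrete, here is a small host-side model of the init sequence as described above (wait, 6µs reset, TRX_OFF command, wait). It is a sketch, not driver code: `init_sequence_t`, `init_total_us` and `trx_off_wait_ok` are hypothetical names, and the constants are just the values quoted in this thread. It shows that the total time exceeds 880µs while the only wait that matters, the one after the command, does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Timing values quoted in this thread, in microseconds. */
#define T_DRIVER_WAIT_US        510u /* wait used by the driver (software programming model) */
#define T_DRIVER_RESET_US         6u /* reset pulse used by the driver */
#define T_PON_TO_TRXOFF_TYP_US  880u /* datasheet: typical P_ON -> TRX_OFF */

/* Simplified (hypothetical) model of the init sequence. */
typedef struct {
    uint32_t wait_before_reset_us;  /* wait before anything else is done */
    uint32_t reset_pulse_us;        /* RST held active */
    uint32_t wait_after_trx_off_us; /* wait after issuing the TRX_OFF command */
} init_sequence_t;

/* Total wall-clock time of the sequence. */
static uint32_t init_total_us(const init_sequence_t *s)
{
    assert(s != NULL);
    return s->wait_before_reset_us + s->reset_pulse_us + s->wait_after_trx_off_us;
}

/* Only the time after the TRX_OFF command counts toward the P_ON -> TRX_OFF
 * transition; waits before the command do not help, however long they are. */
static bool trx_off_wait_ok(const init_sequence_t *s, uint32_t required_us)
{
    assert(s != NULL);
    return s->wait_after_trx_off_us >= required_us;
}
```

With the driver's values, `init_total_us` gives 1026µs, yet `trx_off_wait_ok(..., T_PON_TO_TRXOFF_TYP_US)` is false; moving the 880µs (or a worst-case bound) to after the command makes it true.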

And since I already mentioned reset:
>> The reset timings appear to differ from the datasheet, too (and here
>> again, the software programming model tells something different).
>> According to the datasheet, the device requires a typical value of 120µs
>> after a reset condition until it is operational again.
>>> During the reset procedure the SPI interface shall be inactive ( SEL = H;
>>> SCLK = L).
>> (AT86RF230 Datasheet)
> 
> And I think we wait longer, no?

The timing for RESET->P_ON is not specified in the datasheet, but from a
conservative point of view I would use the 120µs given for RESET->TRX_OFF.
The datasheet only mentions 625ns after releasing RESET; however, the
current implementation does not wait at all.
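A minimal sketch of a conservative reset, assuming RST is active low and reusing the driver's 6µs pulse: keep SPI idle (SEL = H, SCLK = L) during reset, then wait 120µs after release before the first SPI access. The `hal_*` functions are hypothetical stand-ins, stubbed here to record a trace so the order can be checked on a host; on a mote they would drive the real pins and a busy-wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical HAL, stubbed for a host build: each call appends to a trace. */
static char     hal_trace[64];
static uint32_t hal_elapsed_us;

static void trace_append(const char *step)
{
    assert(strlen(hal_trace) + strlen(step) < sizeof hal_trace);
    strcat(hal_trace, step);
}

static void hal_spi_idle(void)        { trace_append("spi_idle;"); } /* SEL = H, SCLK = L */
static void hal_rst(bool high)        { trace_append(high ? "rst_high;" : "rst_low;"); }
static void hal_delay_us(uint32_t us) { hal_elapsed_us += us; trace_append("delay;"); }

#define RESET_PULSE_US       6u /* pulse width the current driver uses */
#define POST_RESET_WAIT_US 120u /* conservative: typical RESET -> TRX_OFF */

/* Reset with the SPI bus idle, then wait before touching the chip again. */
static void radio_hard_reset(void)
{
    hal_spi_idle();                   /* SPI must be inactive during reset */
    hal_rst(false);                   /* assert RST (assumed active low) */
    hal_delay_us(RESET_PULSE_US);
    hal_rst(true);                    /* release reset */
    hal_delay_us(POST_RESET_WAIT_US); /* no SPI traffic before this expires */
}
```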

>> This leads directly to the main issue I would like to discuss: the whole
>> driver behaves very optimistically and treats the typical values found in
>> the datasheet as worst-case values. I have not seen any countermeasures for
>> cases in which the radio device takes longer than this typical value;
>> in those cases the whole radio communication might (and in fact does, if
>> provoked in simulation) lock up.
>> I have not found any line of code where the device state is read
>> before a state change command is issued, and no hard timeouts for cases
>> where the radio is in a different state than expected.
> 
> I would be very interested in learning where such measures could be
> provided. The problem with all this is the following: what happens if
> the hardware never does what you want. How long are you going to wait?
> In general I did not want to set timers, because that consumes
> resources (and all higher level code shares a single alarm with the
> driver).

Worst-case limits: I think table 7-2 (Block settling time) in the
datasheet could provide a foundation for some reasonable timeouts. If the
radio does not meet those specifications, it should be treated as
defective, imho.
I do not know much about TinyOS internals, but I think there is no way
around some timer-based polling. The watchdog you mentioned later is
just a slightly different approach ...
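A minimal sketch of bounded polling: read TRX_STATUS until it reports the expected state or a timeout expires, instead of assuming the typical time was enough. The timeout would come from a worst-case bound such as table 7-2; the values in the usage below are placeholders, not datasheet numbers. `read_trx_status`, `clock_us` and `delay_us` are hypothetical, backed here by a simulated clock and a radio that settles at `sim_settle_us`.

```c
#include <assert.h>
#include <stdint.h>

#define TRX_STATUS_MASK               0x1Fu /* TRX_STATUS subregister, bits 4:0 */
#define STATUS_TRX_OFF                0x08u
#define STATUS_TRANSITION_IN_PROGRESS 0x1Fu

/* Host-side stand-ins (hypothetical): simulated clock and radio. */
static uint32_t sim_now_us;
static uint32_t sim_settle_us;

static uint32_t clock_us(void)        { return sim_now_us; }
static void     delay_us(uint32_t us) { sim_now_us += us; }
static uint8_t  read_trx_status(void)
{
    return sim_now_us >= sim_settle_us ? STATUS_TRX_OFF : STATUS_TRANSITION_IN_PROGRESS;
}

typedef enum { POLL_OK, POLL_TIMEOUT } poll_result_t;

/* Poll every poll_us until TRX_STATUS reads `expected` or timeout_us has
 * elapsed. Unsigned subtraction keeps this correct across a timer wrap. */
static poll_result_t wait_for_status(uint8_t expected, uint32_t timeout_us, uint32_t poll_us)
{
    assert(poll_us > 0);
    const uint32_t start = clock_us();
    for (;;) {
        if ((read_trx_status() & TRX_STATUS_MASK) == expected)
            return POLL_OK;
        if (clock_us() - start >= timeout_us)
            return POLL_TIMEOUT; /* hand over to error handling */
        delay_us(poll_us);
    }
}
```

On a mote the busy-wait would tie up the CPU; in TinyOS one would more likely re-check from an alarm or an existing periodic event, which is where the single shared alarm becomes the constraint.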

>> To sum it up: the driver uses typical timing values as worst-case
>> timings and does not follow the conservative approach described in the
>> datasheet (page 21).
>>> The radio transceiver state is controlled by two signal pins (SLP_TR, RST)
>>> and the register 0x02 (TRX_STATE). A successful state change shall be
>>> confirmed by reading the radio transceiver status from register 0x01
>>> (TRX_STATUS).
>> (AT86RF230 Datasheet)
> 
> That is true.
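A minimal sketch of the confirmation step: after writing a command to TRX_STATE (0x02), classify what TRX_STATUS (0x01) reads back. The command and status codes are the AT86RF230 values as we read them in the datasheet and should be checked against the revision in use. Treating the busy states as "pending" rather than as errors is an assumption, based on the expectation that the chip finishes the current frame first.

```c
#include <assert.h>
#include <stdint.h>

#define REG_TRX_STATUS    0x01u /* read:  TRX_STATUS in bits 4:0 */
#define REG_TRX_STATE     0x02u /* write: TRX_CMD in bits 4:0 */
#define TRX_STATUS_MASK   0x1Fu

#define CMD_FORCE_TRX_OFF 0x03u
#define CMD_RX_ON         0x06u
#define CMD_TRX_OFF       0x08u
#define CMD_PLL_ON        0x09u

#define ST_BUSY_RX        0x01u
#define ST_BUSY_TX        0x02u
#define ST_TRX_OFF        0x08u
#define ST_BUSY_RX_AACK   0x11u
#define ST_BUSY_TX_ARET   0x12u
#define ST_IN_TRANSITION  0x1Fu

typedef enum { CONFIRM_DONE, CONFIRM_PENDING, CONFIRM_MISMATCH } confirm_t;

/* State a command should end in: for RX_ON, TRX_OFF and PLL_ON the status
 * code equals the command code; FORCE_TRX_OFF ends in TRX_OFF. */
static uint8_t expected_status(uint8_t cmd)
{
    assert(cmd == CMD_FORCE_TRX_OFF || cmd == CMD_RX_ON ||
           cmd == CMD_TRX_OFF || cmd == CMD_PLL_ON);
    return cmd == CMD_FORCE_TRX_OFF ? ST_TRX_OFF : cmd;
}

/* Classify a TRX_STATUS read-back taken after writing `cmd` to TRX_STATE. */
static confirm_t confirm_state(uint8_t cmd, uint8_t trx_status_reg)
{
    const uint8_t status = trx_status_reg & TRX_STATUS_MASK; /* drop upper (CCA) bits */
    if (status == expected_status(cmd))
        return CONFIRM_DONE;
    switch (status) {
    case ST_IN_TRANSITION:
    case ST_BUSY_RX: case ST_BUSY_TX:
    case ST_BUSY_RX_AACK: case ST_BUSY_TX_ARET:
        return CONFIRM_PENDING;  /* assumption: may still complete; poll again */
    default:
        return CONFIRM_MISMATCH; /* radio is not where the driver thinks it is */
    }
}
```

`CONFIRM_PENDING` would feed a bounded poll, and `CONFIRM_MISMATCH` is the case the original message points out: the radio is in a different state than expected.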
> 
>> Despite all these issues, the driver appears to work on real devices.
>> I assume the datasheet values include rather large safety margins,
>> but in my opinion, violating those specifications is not good,
>> especially because the datasheet provides the only binding device
>> characteristics for building a simulation.
> 
> I agree that it is not good practice to violate those limits. There
> are other reasons that a radio driver can lock up, and that is what has
> caused a lot of trouble for me. In certain situations the whole chip
> can lock up, especially when you command it to do something, but at
> the very same moment (in around 1 us window !!!) an incoming message
> is received, then the internal state machine of the chip can lock up,
> and will never receive the message, nor complete the transmit. Only a
> reset helps. This seems to be fixed in recent IRIS motes with a new
> hardware revision of the chip (I have reported this to Atmel but did
> not get a reply). Similar issues crop up with the RF212 and RFA1
> drivers.

Strange. The errata for RF230 rev. A list some nasty silicon bugs, but
nothing matching your issue. As far as I understand the datasheet, I
would expect the device to do the transition as soon as the message has
been received.
Just out of curiosity: have you tried FORCE_TRX_OFF, too? What did
the TRX_STATUS subregister indicate?
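For that kind of investigation it helps to log TRX_STATUS by name. A small sketch of a decoder; the codes are the AT86RF230 TRX_STATUS values (register 0x01, bits 4:0) to the best of our reading of the datasheet, anything else prints as UNKNOWN, and `trx_status_name`/`format_status` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Map the TRX_STATUS subregister (bits 4:0) to a readable name. */
static const char *trx_status_name(uint8_t reg)
{
    switch (reg & 0x1Fu) {
    case 0x00: return "P_ON";
    case 0x01: return "BUSY_RX";
    case 0x02: return "BUSY_TX";
    case 0x06: return "RX_ON";
    case 0x08: return "TRX_OFF";
    case 0x09: return "PLL_ON";
    case 0x0F: return "SLEEP";
    case 0x11: return "BUSY_RX_AACK";
    case 0x12: return "BUSY_TX_ARET";
    case 0x16: return "RX_AACK_ON";
    case 0x19: return "TX_ARET_ON";
    case 0x1F: return "STATE_TRANSITION_IN_PROGRESS";
    default:   return "UNKNOWN";
    }
}

/* Format one diagnostic line, e.g. right after a FORCE_TRX_OFF attempt. */
static void format_status(char *buf, size_t len, const char *when, uint8_t reg)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s: TRX_STATUS=0x%02X (%s)",
             when, (unsigned)(reg & 0x1Fu), trx_status_name(reg));
}
```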

> So based on all this, it seems that a watchdog component should be
> installed above all radio drivers (especially in production mode),
> which would trigger a reset of the radio if some timing constraints
> are not met. I did not do this yet. So my plan is to make sure that
> the radio can be reset/restarted at any moment reliably, and then do
> this watchdog component. What do you think?

Sounds reasonable. After you wrote about a watchdog, I got an idea for
a two-and-a-half-strikes approach.

First strike: timeout violation. Kick off some countermeasures in the
driver that try to avoid a full reset and instead just recover from
minor timing issues (like atypical behaviour). For example, this could
be the place for state checking/validation/resynchronisation between
driver and radio, allowing missed commands to be re-triggered.

Second strike: big lockup. Do a hard reset of the radio device and a
(maybe) expensive initialization.

After the second strike ("not really a strike"): the radio is broken.
Make a last effort to disable it and do not touch it any more afterwards.

Maybe this approach is too complex/expensive, but it would allow
fine-grained error handling.
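A minimal sketch of how the watchdog (detection) and the strikes (escalation) could fit together, assuming `guard_check` is called from a periodic tick the application already has, so no extra timer is needed. `radio_guard_t`, `guard_arm`, `guard_done`, `guard_check` and the attempt limits are hypothetical; the actual resync, reset and disable work is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESYNC_ATTEMPTS 2u /* hypothetical: first-strike retries */
#define MAX_RESET_ATTEMPTS  1u /* hypothetical: second-strike retries */

typedef enum {
    ACTION_NONE,       /* nothing to do */
    ACTION_RESYNC,     /* first strike: check state, re-issue missed command */
    ACTION_HARD_RESET, /* second strike: reset + full re-initialization */
    ACTION_DISABLE     /* radio considered broken: shut it down for good */
} recovery_action_t;

typedef struct {
    bool     armed;       /* a radio operation is being supervised */
    uint32_t armed_at_us; /* when it started */
    uint32_t budget_us;   /* worst-case time allowed for it */
    uint8_t  resync_attempts;
    uint8_t  reset_attempts;
    bool     disabled;
} radio_guard_t;

/* Start supervising an operation that must finish within budget_us. */
static void guard_arm(radio_guard_t *g, uint32_t now_us, uint32_t budget_us)
{
    assert(g != NULL);
    g->armed = true;
    g->armed_at_us = now_us;
    g->budget_us = budget_us;
}

/* Operation finished normally: stop supervising and forget earlier strikes. */
static void guard_done(radio_guard_t *g)
{
    assert(g != NULL);
    g->armed = false;
    g->resync_attempts = 0;
    g->reset_attempts = 0;
}

/* Call from a periodic tick. Unsigned subtraction keeps the elapsed-time
 * check correct across a 32-bit timer wrap. */
static recovery_action_t guard_check(radio_guard_t *g, uint32_t now_us)
{
    assert(g != NULL);
    if (g->disabled || !g->armed || now_us - g->armed_at_us < g->budget_us)
        return ACTION_NONE;

    g->armed = false; /* caller re-arms when it retries the operation */
    if (g->resync_attempts < MAX_RESYNC_ATTEMPTS) {
        g->resync_attempts++;
        return ACTION_RESYNC;
    }
    if (g->reset_attempts < MAX_RESET_ATTEMPTS) {
        g->reset_attempts++;
        return ACTION_HARD_RESET;
    }
    g->disabled = true;
    return ACTION_DISABLE;
}
```

Clearing the counters only in `guard_done` means an occasional slow transition costs a single resync, while a radio that keeps failing escalates to a hard reset and finally gets disabled.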

Regards,
Markus
_______________________________________________
Tinyos-help mailing list
[email protected]
https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
