Miklos,

I was thinking the same thing after I sent my last message.  To check
this possibility I added the code below.  On the CC2420Receive.receive
event it makes a local copy of the message_t (this is pretty low in
the radio stack).  Then when Receive.receive is signaled it compares
the data segment to the copy.  I ran through 600k received packets
without seeing any evidence of such corruption, but still got failed
application CRCs with the CC2420 CRC bit set (i.e. the CC2420 thinks the
CRC passed).  This doesn't eliminate the possibility that the payload is
being corrupted during the SPI transfer.  I'm not sure that's something
I can test.

I'm still sticking to my conclusion that we need another CRC besides
the 802.15.4 FCS, because:
1) the cc2420 (possibly) doesn't perform the check properly
2) because the cc2420 removes the FCS value from the packet (and
replaces it with RSSI/LQI), we cannot detect data errors introduced on
the SPI bus (or by a bug in the SPI code)
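
For illustration, here is a minimal sketch in plain C of what such an
application-level CRC could look like.  It is not the code from my
application: the helper names (app_crc16, app_crc_append, app_crc_ok) and
the choice of a CRC-16 with polynomial 0x1021 and initial value 0 are
assumptions made for this example.  The sender appends two CRC bytes to the
payload, and the receiver recomputes the CRC after the payload has crossed
the radio, the SPI bus, and the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16, polynomial 0x1021 (x^16 + x^12 + x^5 + 1), initial value 0,
 * processed most significant bit first.  The parameters are an
 * assumption for this sketch, not taken from the application code. */
static uint16_t app_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    size_t i;
    int bit;

    assert(data != NULL);
    for (i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Sender side: store the CRC of buf[0..data_len) in the two bytes that
 * follow the data (high byte first).  buf must hold data_len + 2 bytes. */
static void app_crc_append(uint8_t *buf, size_t data_len)
{
    uint16_t crc;

    assert(buf != NULL);
    crc = app_crc16(buf, data_len);
    buf[data_len] = (uint8_t)(crc >> 8);
    buf[data_len + 1] = (uint8_t)(crc & 0xff);
}

/* Receiver side: return 1 if the trailing two bytes match the CRC of
 * the preceding data, 0 otherwise. */
static int app_crc_ok(const uint8_t *buf, size_t total_len)
{
    size_t data_len;
    uint16_t stored;

    assert(buf != NULL);
    if (total_len < 2)
        return 0;
    data_len = total_len - 2;
    stored = (uint16_t)(((uint16_t)buf[data_len] << 8) | buf[data_len + 1]);
    return app_crc16(buf, data_len) == stored;
}
```

Because this CRC travels inside the payload, it is still present after the
cc2420 has replaced the FCS with RSSI/LQI, so it covers corruption on the
SPI bus or in software as well as over the air.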


         message_t      lastReceive;

        /* CC2420Receive.receive is signaled just
         * after reading the RXFIFO; this is as
         * early in the stack as we can get the
         * data.  We might not get a Receive.receive
         * for this packet, but every packet that
         * gets a Receive.receive will have a
         * CC2420Receive.receive.
         */
        async event void CC2420Receive.receive(
                        uint8_t type, message_t* m_p_rx_buf ){
                memcpy (&lastReceive,
                                m_p_rx_buf,
                                sizeof(message_t));
        }
        
        /* Receive.receive is at the top of the network stack
         */
        event message_t* Receive.receive(message_t *msg,
                                        void* payload, uint8_t len){            
                // msg should match exactly the
                // contents at lastReceive, just check
                // the data segment, head/foot/meta
                // may be changed later in the stack
                if (memcmp (&msg->data,
                                        &lastReceive.data,
                                        TOSH_DATA_LENGTH) != 0){
                        printf ("NW STACK CORRUPTION!\n");
                        printfflush();
                        
                        dump (&msg->data, TOSH_DATA_LENGTH);
                        dump (&lastReceive.data, TOSH_DATA_LENGTH);
                }
                // ... rest of the handler as in the complete code ...
                return msg;
        }
(complete code is at
http://thor.mines.edu/trac/attachment/wiki/Projects/Crc/crcAppQueueless.nc)
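
If a mismatch ever does show up, it helps to know where the two buffers
start to differ and how many bits are wrong, rather than reading two hex
dumps by eye.  The following is a small sketch in plain C under that
assumption; first_mismatch and bit_errors are hypothetical helper names and
are not part of the linked application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first byte where a and b differ,
 * or -1 if the first len bytes are identical. */
static long first_mismatch(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}

/* Count the number of differing bits between a and b over len bytes. */
static unsigned bit_errors(const uint8_t *a, const uint8_t *b, size_t len)
{
    unsigned count = 0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);
        while (diff) {
            count += diff & 1u;
            diff >>= 1;
        }
    }
    return count;
}
```

These could be called with the received data segment and the saved copy (or
with the known transmitted pattern) before dumping both buffers, for
example first_mismatch(received, expected, TOSH_DATA_LENGTH).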

On Fri, Apr 15, 2011 at 2:51 PM, Miklos Maroti <[email protected]> wrote:
> Hi Alan!
>
> I can imagine another reason: somehow the packet is corrupted AFTER it
> has passed the hardware CRC check. For example, it is partially
> overwritten by another packet (if that can happen), or a software bug.
> Do you have heavy traffic, or can you observe the problem with 1
> message / second rate? If so, then it is most likely a hardware problem.
>
> Miklos
>
> On Fri, Apr 15, 2011 at 10:46 PM, Alan Marchiori <[email protected]> wrote:
>> Good comments... I finally found another instance of what seems like
>> the same problem here:
>> https://www.millennium.berkeley.edu/pipermail/tinyos-devel/2008-January/002363.html
>>
>> It doesn't sound like it was ever resolved, other than "you might get
>> an erroneous packet in 50k".  This still seems too high to me.
>>
>> To continue investigating this I modified CC2420ReceiveP.nc and
>> changed the first line to the second.  This removes the cc2420 CRC
>> check and should pass up any successfully received message regardless
>> of the cc2420 CRC.
>>
>>      //if ( ( buf[ rxFrameLength ] >> 7 ) && rx_buf ) {
>>      if (rx_buf){
>>
>> Then I added code in my application to check the CRC in the packet
>> metadata.  The interesting thing is after about 500k packet receptions
>> I have not seen a single packet where the CC2420 signaled a CRC error.
>>  On the other hand, I did detect about 10 CRC errors in the
>> application-level CRC.
>>
>>                cc2420_metadata_t* metadata =
>>                        call CC2420PacketBody.getMetadata( msg );
>>                if (!metadata->crc){
>>                        printf ("HW CRC ERROR!\n");
>>                        printfflush();
>>                }
>>
>> My conclusion is that the CC2420 is not actually checking the CRC.  I
>> can resolve my problem by simply adding an application-level CRC to my
>> packets, but I think this is a serious issue that needs to be
>> addressed.
>>
>>
>> On Fri, Apr 15, 2011 at 12:23 PM, Michael Schippling <[email protected]> wrote:
>>> Some thoughts....
>>>
>>> First I would make sendPacket() a task rather than calling
>>> it directly from the two events. I suspect that each of those
>>> events is a task as well -- I know Timer.fired() in T1 is --
>>> but it makes the partitioning a bit cleaner.
>>>
>>> Second, I wonder if printf() is corrupting something?
>>> The simplest thing would be to just toggle an LED when you
>>> get a bad CRC just to make sure. With a little more pain,
>>> you could implement a separate serial message for reporting.
>>>
>>> One would expect parts of the header and regular CRC to
>>> get corrupted if it was a radio or hardware problem. And
>>> you have no way of counting how many messages fail at that
>>> level. Perhaps add a sourceID and serial number to each
>>> message so you can keep track?
>>>
>>> Otherwise the code looks ok to me...
>>>
>>> MS
>>>
>>>
>>> Alan Marchiori wrote:
>>>>>
>>>>> So it seems that you have multiple base-stations?
>>>>> Or how are you getting log messages from different motes?
>>>>
>>>> There are no base stations, or every mote is a base station, depending
>>>> on how you view it. It is a testbed configuration where all motes have
>>>> serial forwarders connected and I connect to all of them to receive
>>>> printf messages.  Every mote both transmits and receives.
>>>>
>>>>> I have never checked the internal integrity of messages
>>>>> but have also not noticed any glaring bit errors in
>>>>> "normal" use.
>>>>>
>>>>> Anyway...KISS...I would reduce everything to the simplest
>>>>> configuration with the most basic program, no queuing, and
>>>>> probably just the default payload size. Start with one
>>>>> re-mote and one base and then scale up. Try using known
>>>>> message content as well as random. Ramp up the speed.
>>>>> All the usual... It could be that you have a loose pointer
>>>>> in your program that corrupts the message before transmission.
>>>>
>>>> I created a version without queuing and still see similar error rates.
>>>> I then modified the code to send out a constant bit pattern (all
>>>> 0xa55a) and to print out the received values in the case of an
>>>> error.  Here are the first two I saw after about 60k received packets
>>>> (right around a uint16 rollover....):
>>>>
>>>> 2011-04-14 16:30:59,070 - printf - INFO - (9005): CRC ERROR: 725a b8b8
>>>> 6ab8 9ce9 15ff 8ba9 7b97 b8de  4e7 7847 714e 7d4e cd43 cd42 cde3 cd43
>>>> 2011-04-14 16:31:41,620 - printf - INFO - (9005): CRC ERROR: a55a a55a
>>>> a55a a55a a55a a55a 665a 66cc dec6 6756 464e 6655 6666 c66f ee66 d55d
>>>>
>>>> The first one is total garbage; the second is more interesting in that it
>>>> starts out correct (a55a) and then becomes corrupt at the 7th 16-bit word.
>>>>
>>>> The new code is at:
>>>> http://thor.mines.edu/trac/attachment/wiki/Projects/Crc/crcAppQueueless.nc
>>>>
>>>> If anyone can see anything wrong with this code, I'd be very
>>>> interested to solve/better understand this problem.
>>>>
>>>> Thanks,
>>>> Alan
>>>> _______________________________________________
>>>> Tinyos-help mailing list
>>>> [email protected]
>>>> https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
>>>
>>
>

