Hi Alan,

I see. However, you did not answer the question about the traffic
load. That is what is going to decide whether this is a HW or SW
problem, I think.

Miklos

PS. By the way, I agree that a software CRC check is needed, but 1) it
wastes 2 more bytes, no? and 2) even that can let erroneous packets
through, as packets can be corrupted but still pass the CRC check.
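
To make the two points above concrete, here is a minimal sketch in plain C
(not nesC) of an application-level CRC appended to the payload. Everything
in it is an assumption for illustration: the names crc16, app_payload_t,
app_payload_seal and app_payload_ok are hypothetical, APP_DATA_LEN is an
arbitrary size, and the CRC variant (polynomial 0x1021, initial value 0,
MSB first) is just one common choice, not necessarily what the application
under discussion uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC-16, polynomial 0x1021, initial value 0, MSB first.
 * This variant is an assumption chosen for illustration. */
static uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

#define APP_DATA_LEN 26  /* hypothetical application data size */

/* Hypothetical payload layout: data followed by a 2-byte CRC.
 * The two trailing bytes are the overhead mentioned in point 1. */
typedef struct {
    uint8_t data[APP_DATA_LEN];
    uint8_t crc_hi;
    uint8_t crc_lo;
} app_payload_t;

/* Sender side: compute the CRC over the data and store it. */
static void app_payload_seal(app_payload_t *p)
{
    uint16_t crc = crc16(p->data, APP_DATA_LEN);
    p->crc_hi = (uint8_t)(crc >> 8);
    p->crc_lo = (uint8_t)(crc & 0xff);
}

/* Receiver side: returns 1 if the stored CRC matches the data, else 0.
 * A match does not prove the data is intact (point 2): a randomly
 * corrupted payload still matches with probability of roughly 1 in 65536. */
static int app_payload_ok(const app_payload_t *p)
{
    uint16_t stored = (uint16_t)(((uint16_t)p->crc_hi << 8) | p->crc_lo);
    return crc16(p->data, APP_DATA_LEN) == stored;
}
```

The sender would call app_payload_seal() before sending and the receiver
would call app_payload_ok() in its receive handler. Since the check is
computed end to end, it would also cover corruption that happens after the
radio's own FCS check.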

On Sat, Apr 16, 2011 at 1:12 AM, Alan Marchiori <[email protected]> wrote:
> Miklos,
>
> I was thinking the same thing after I sent my last message.  To check
> this possibility I added the code below.  On the CC2420Receive.receive
> event it makes a local copy of the message_t (this is pretty low in
> the radio stack).  Then when Receive.receive is signaled it compares
> the data segment to the copy.  I ran through 600k received packets
> without seeing any evidence of this, but still got failed application
> CRCs with the CC2420 CRC bit set (i.e. CC2420 thinks the CRC passed).
> This doesn't eliminate the possibility that the payload is being
> corrupted during the SPI transfer.  Not sure that's something I can
> test.
>
> I'm still sticking to my conclusion that we need another CRC besides
> the 802.15.4 FCS, because:
> 1) (possibly) the cc2420 doesn't properly do the check
> 2) because the cc2420 removes the FCS value from the packet (and
> replaces it with RSSI/LQI) we cannot detect data errors over the SPI
> bus.  (or there is a bug in the SPI code causing this corruption)
>
>
>         message_t      lastReceive;
>
>        /* CC2420Receive.receive is signaled just
>         * after reading the RXFIFO; this is as
>         * early in the stack as we can get the
>         * data. We might not get a Receive.receive
>         * for this packet, but every packet that
>         * gets a Receive.receive will have a
>         * CC2420Receive.receive.
>         */
>        async event void CC2420Receive.receive(
>                        uint8_t type, message_t* m_p_rx_buf ){
>                memcpy (&lastReceive,
>                                m_p_rx_buf,
>                                sizeof(message_t));
>        }
>
>        /* Receive.receive is at the top of the network stack
>         */
>        event message_t* Receive.receive(message_t *msg,
>                                        void* payload, uint8_t len){
>                // msg should match exactly the
>                // contents at lastReceive, just check
>                // the data segment, head/foot/meta
>                // may be changed later in the stack
>                if (memcmp (&msg->data,
>                                        &lastReceive.data,
>                                        TOSH_DATA_LENGTH) != 0){
>                        printf ("NW STACK CORRUPTION!\n");
>                        printfflush();
>
>                        dump (&msg->data, TOSH_DATA_LENGTH);
>                        dump (&lastReceive.data, TOSH_DATA_LENGTH);
>                }
>                return msg;
>        }
> (complete code is at
> http://thor.mines.edu/trac/attachment/wiki/Projects/Crc/crcAppQueueless.nc)
>
> On Fri, Apr 15, 2011 at 2:51 PM, Miklos Maroti <[email protected]> wrote:
>> Hi Alan!
>>
>> I can imagine another reason: somehow the packet is corrupted AFTER it
>> has passed the hardware CRC check. For example, it is partially
>> overwritten by another packet (if that can happen), or a software bug.
>> Do you have heavy traffic, or can you observe the problem with 1
>> message / second rate? If so, then it is most likely a hardware problem.
>>
>> Miklos
>>
>> On Fri, Apr 15, 2011 at 10:46 PM, Alan Marchiori <[email protected]> wrote:
>>> Good comments... I finally found another instance of what seems like
>>> the same problem here:
>>> https://www.millennium.berkeley.edu/pipermail/tinyos-devel/2008-January/002363.html
>>>
>>> It doesn't sound like it was ever resolved, other than "you might get
>>> an erroneous packet in 50k".  This still seems too high to me.
>>>
>>> To continue investigating this I modified CC2420ReceiveP.nc and
>>> changed the first line to the second.  This removes the cc2420 CRC
>>> check and should pass up any successfully received message regardless
>>> of the cc2420 CRC.
>>>
>>>      //if ( ( buf[ rxFrameLength ] >> 7 ) && rx_buf ) {
>>>      if (rx_buf){
>>>
>>> Then I added code in my application to check the CRC in the packet
>>> metadata.  The interesting thing is after about 500k packet receptions
>>> I have not seen a single packet where the CC2420 signaled a CRC error.
>>>  On the other hand, I did detect about 10 CRC errors in the
>>> application-level CRC.
>>>
>>>                cc2420_metadata_t* metadata = call CC2420PacketBody.getMetadata( msg );
>>>                if (!metadata->crc){
>>>                        printf ("HW CRC ERROR!\n");
>>>                        printfflush();
>>>                }
>>>
>>> My conclusion is that the CC2420 is not actually checking the CRC.  I
>>> can resolve my problem by simply adding an application-level CRC to my
>>> packets, but I think this is a serious issue that needs to be
>>> addressed.
>>>
>>>
>>> On Fri, Apr 15, 2011 at 12:23 PM, Michael Schippling <[email protected]> wrote:
>>>> Some thoughts....
>>>>
>>>> First I would make sendPacket() a task rather than calling
>>>> it directly from the two events. I suspect that each of those
>>>> events is a task as well -- I know Timer.fired() in T1 is --
>>>> but it makes the partitioning a bit cleaner.
>>>>
>>>> Second, I wonder if printf() is corrupting something?
>>>> The simplest thing would be to just toggle an LED when you
>>>> get a bad CRC just to make sure. With a little more pain,
>>>> you could implement a separate serial message for reporting.
>>>>
>>>> One would expect parts of the header and regular CRC to
>>>> get corrupted if it was a radio or hardware problem. And
>>>> you have no way of counting how many messages fail at that
>>>> level. Perhaps add a sourceID and serial number to each
>>>> message so you can keep track?
>>>>
>>>> Otherwise the code looks ok to me...
>>>>
>>>> MS
>>>>
>>>>
>>>> Alan Marchiori wrote:
>>>>>>
>>>>>> So it seems that you have multiple base-stations?
>>>>>> Or how are you getting log messages from different motes?
>>>>>
>>>>> there are no base stations or every mote is a base station, depending
>>>>> how you view it. It is a testbed configuration where all motes have
>>>>> serial forwarders connected and I connect to all of them to receive
>>>>> printf messages.  Every mote both transmits and receives.
>>>>>
>>>>>> I have never checked the internal integrity of messages
>>>>>> but have also not noticed any glaring bit errors in
>>>>>> "normal" use.
>>>>>>
>>>>>> Anyway...KISS...I would reduce everything to the simplest
>>>>>> configuration with the most basic program, no queuing, and
>>>>>> probably just the default payload size. Start with one
>>>>>> re-mote and one base and then scale up. Try using known
>>>>>> message content as well as random. Ramp up the speed.
>>>>>> All the usual... It could be that you have a loose pointer
>>>>>> in your program that corrupts the message before transmission.
>>>>>
>>>>> I created a version without queuing and still see similar error rates.
>>>>>  I then modified the code to send out a constant bit pattern (all
>>>>> 0xa55a) and print out the received values in the case of an
>>>>> error. Here are the first two I saw after about 60k received packets
>>>>> (right around a uint16 rollover....):
>>>>>
>>>>> 2011-04-14 16:30:59,070 - printf - INFO - (9005): CRC ERROR: 725a b8b8
>>>>> 6ab8 9ce9 15ff 8ba9 7b97 b8de  4e7 7847 714e 7d4e cd43 cd42 cde3 cd43
>>>>> 2011-04-14 16:31:41,620 - printf - INFO - (9005): CRC ERROR: a55a a55a
>>>>> a55a a55a a55a a55a 665a 66cc dec6 6756 464e 6655 6666 c66f ee66 d55d
>>>>>
>>>>> The first one is total garbage; the second is more interesting in that it
>>>>> starts out correct (a55a) and then becomes corrupt at the 7th 16-bit word.
>>>>>
>>>>> The new code is at:
>>>>> http://thor.mines.edu/trac/attachment/wiki/Projects/Crc/crcAppQueueless.nc
>>>>>
>>>>> If anyone can see anything wrong with this code, I'd be very
>>>>> interested to solve/better understand this problem.
>>>>>
>>>>> Thanks,
>>>>> Alan
>>>>> _______________________________________________
>>>>> Tinyos-help mailing list
>>>>> [email protected]
>>>>> https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
>>>>
>>>
>>>
>>
>
