Hi Alan,

I see. However, you did not answer the question about the traffic load. I think that is what is going to decide whether this is a HW or SW problem.
Miklos

PS. By the way, I agree that a software CRC check is needed, but 1) it wastes 2 more bytes, no? and 2) even that can let erroneous packets through, as packets can be corrupted but still pass the CRC check.

On Sat, Apr 16, 2011 at 1:12 AM, Alan Marchiori <[email protected]> wrote:
> Miklos,
>
> I was thinking the same thing after I sent my last message. To check
> this possibility I added the code below. On the CC2420Receive.receive
> event it makes a local copy of the message_t (this is pretty low in
> the radio stack). Then when Receive.receive is signaled it compares
> the data segment to the copy. I ran through 600k received packets
> without seeing any evidence of this, but still got failed application
> CRCs with the CC2420 CRC bit set (i.e., the CC2420 thinks the CRC passed).
> This doesn't eliminate the possibility that the payload is being
> corrupted during the SPI transfer. Not sure that's something I can
> test.
>
> I'm still sticking to my conclusion that we need another CRC besides
> the 802.15.4 FCS, because:
> 1) (possibly) the cc2420 doesn't properly do the check
> 2) because the cc2420 removes the FCS value from the packet (and
> replaces it with RSSI/LQI) we cannot detect data errors over the SPI
> bus.
> (or there is a bug in the SPI code causing this corruption)
>
>
> message_t lastReceive;
>
> /* CC2420Receive.receive is signaled just
>  * after reading the RXFIFO this is as
>  * early in the stack as we can get the
>  * data we might not get a Receive.receive
>  * for this packet but every packet that
>  * gets a Receive.receive will have a
>  * CC2420Receive.receive
>  */
> async event void CC2420Receive.receive(
>     uint8_t type, message_t* m_p_rx_buf ){
>   memcpy (&lastReceive,
>           m_p_rx_buf,
>           sizeof(message_t));
> }
>
> /* Receive.receive is at the top of the network stack
>  */
> event message_t* Receive.receive(message_t *msg,
>     void* payload, uint8_t len){
>   // msg should match exactly the
>   // contents at lastReceive, just check
>   // the data segment, head/foot/meta
>   // may be changed later in the stack
>   if (memcmp (&msg->data,
>               &lastReceive.data,
>               TOSH_DATA_LENGTH) != 0){
>     printf ("NW STACK CORRUPTION!\n");
>     printfflush();
>
>     dump (&msg->data, TOSH_DATA_LENGTH);
>     dump (&lastReceive.data, TOSH_DATA_LENGTH);
>   }
> (complete code is at
> http://thor.mines.edu/trac/attachment/wiki/Projects/Crc/crcAppQueueless.nc)
>
> On Fri, Apr 15, 2011 at 2:51 PM, Miklos Maroti <[email protected]>
> wrote:
>> Hi Alan!
>>
>> I can imagine another reason: somehow the packet is corrupted AFTER it
>> has passed the hardware CRC check. For example, it is partially
>> overwritten by another packet (if that can happen), or a software bug.
>> Do you have heavy traffic, or can you observe the problem with 1
>> message / second rate? If so, then it is most likely a hardware problem.
>>
>> Miklos
>>
>> On Fri, Apr 15, 2011 at 10:46 PM, Alan Marchiori <[email protected]> wrote:
>>> Good comments... I finally found another instance of what seems like
>>> the same problem here:
>>> https://www.millennium.berkeley.edu/pipermail/tinyos-devel/2008-January/002363.html
>>>
>>> It doesn't sound like it was ever resolved. Other than "you might get
>>> an erroneous packet in 50k".
>>> This still seems too high for me.
>>>
>>> To continue investigating this I modified CC2420ReceiveP.nc and
>>> changed the first line to the second. This removes the cc2420 CRC
>>> check and should pass up any successfully received message regardless
>>> of the cc2420 CRC.
>>>
>>> //if ( ( buf[ rxFrameLength ] >> 7 ) && rx_buf ) {
>>> if (rx_buf){
>>>
>>> Then I added code in my application to check the CRC in the packet
>>> metadata. The interesting thing is after about 500k packet receptions
>>> I have not seen a single packet where the CC2420 signaled a CRC error.
>>> On the other hand, I did detect about 10 CRC errors in the
>>> application-level CRC.
>>>
>>> cc2420_metadata_t* metadata = call
>>>     CC2420PacketBody.getMetadata( msg );
>>> if (!metadata->crc){
>>>   printf ("HW CRC ERROR!\n");
>>>   printfflush();
>>> }
>>>
>>> My conclusion is that the CC2420 is not actually checking the CRC. I
>>> can resolve my problem by simply adding an application-level CRC to my
>>> packets, but I think this is a serious issue that needs to be
>>> addressed.
>>>
>>>
>>> On Fri, Apr 15, 2011 at 12:23 PM, Michael Schippling <[email protected]>
>>> wrote:
>>>> Some thoughts....
>>>>
>>>> First I would make sendPacket() a task rather than calling
>>>> it directly from the two events. I suspect that each of those
>>>> events is a task as well -- I know Timer.fired() in T1 is --
>>>> but it makes the partitioning a bit cleaner.
>>>>
>>>> Second, I wonder if printf() is corrupting something?
>>>> The simplest thing would be to just toggle an LED when you
>>>> get a bad CRC just to make sure. With a little more pain,
>>>> you could implement a separate serial message for reporting.
>>>>
>>>> One would expect parts of the header and regular CRC to
>>>> get corrupted if it was a radio or hardware problem. And
>>>> you have no way of counting how many messages fail at that
>>>> level. Perhaps add a sourceID and serial number to each
>>>> message so you can keep track?
>>>>
>>>> Otherwise the code looks ok to me...
>>>>
>>>> MS
>>>>
>>>>
>>>> Alan Marchiori wrote:
>>>>>>
>>>>>> So it seems that you have multiple base-stations?
>>>>>> Or how are you getting log messages from different motes?
>>>>>
>>>>> There are no base stations, or every mote is a base station, depending
>>>>> on how you view it. It is a testbed configuration where all motes have
>>>>> serial forwarders connected and I connect to all of them to receive
>>>>> printf messages. Every mote both transmits and receives.
>>>>>
>>>>>> I have never checked the internal integrity of messages
>>>>>> but have also not noticed any glaring bit errors in
>>>>>> "normal" use.
>>>>>>
>>>>>> Anyway...KISS...I would reduce everything to the simplest
>>>>>> configuration with the most basic program, no queuing, and
>>>>>> probably just the default payload size. Start with one
>>>>>> re-mote and one base and then scale up. Try using known
>>>>>> message content as well as random. Ramp up the speed.
>>>>>> All the usual... It could be that you have a loose pointer
>>>>>> in your program that corrupts the message before transmission.
>>>>>
>>>>> I created a version without queuing and still see similar error rates.
>>>>> I then modified the code to send out a constant bit pattern (all
>>>>> 0xa55a) and then print out the received values in the case of an
>>>>> error. Here are the first two I saw after about 60k received packets
>>>>> (right around a uint16 rollover....):
>>>>>
>>>>> 2011-04-14 16:30:59,070 - printf - INFO - (9005): CRC ERROR: 725a b8b8
>>>>> 6ab8 9ce9 15ff 8ba9 7b97 b8de 4e7 7847 714e 7d4e cd43 cd42 cde3 cd43
>>>>> 2011-04-14 16:31:41,620 - printf - INFO - (9005): CRC ERROR: a55a a55a
>>>>> a55a a55a a55a a55a 665a 66cc dec6 6756 464e 6655 6666 c66f ee66 d55d
>>>>>
>>>>> The first one is total garbage. The second is more interesting in that it
>>>>> starts out correct (a55a) and then becomes corrupt at the 7th 16-bit word.
>>>>>
>>>>> The new code is at:
>>>>> http://thor.mines.edu/trac/attachment/wiki/Projects/Crc/crcAppQueueless.nc
>>>>>
>>>>> If anyone can see anything wrong with this code, I'd be very
>>>>> interested to solve/better understand this problem.
>>>>>
>>>>> Thanks,
>>>>> Alan
>>>>> _______________________________________________
>>>>> Tinyos-help mailing list
>>>>> [email protected]
>>>>> https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
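
For reference, the application-level CRC discussed in this thread could look roughly like the plain C sketch below. It is not the code from crcAppQueueless.nc. The function names (crc16_update, crc16, app_crc_append, app_crc_ok) are hypothetical, and the CRC variant (polynomial 0x1021, initial value 0x0000, MSB first, often called CRC-16/XMODEM) is an assumption chosen for illustration. The sketch appends a 2-byte CRC after the payload, which is the 2-byte cost mentioned in the PS above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed variant: CRC-16 with polynomial 0x1021, initial value 0x0000,
 * processed MSB first. Feeds one byte into the running CRC. */
uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int i = 0; i < 8; i++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ 0x1021u);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over a whole buffer. */
uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update(crc, data[i]);
    return crc;
}

/* Sender side (hypothetical helper): store the CRC, big-endian, in the
 * two bytes that follow the payload. buf must hold payload_len + 2 bytes. */
void app_crc_append(uint8_t *buf, size_t payload_len)
{
    assert(buf != NULL);
    uint16_t crc = crc16(buf, payload_len);
    buf[payload_len]     = (uint8_t)(crc >> 8);
    buf[payload_len + 1] = (uint8_t)(crc & 0xFFu);
}

/* Receiver side (hypothetical helper): recompute the CRC over the payload
 * and compare it with the trailing two bytes. Returns 1 on match. */
int app_crc_ok(const uint8_t *buf, size_t total_len)
{
    assert(buf != NULL);
    if (total_len < 2)
        return 0;
    size_t payload_len = total_len - 2;
    uint16_t stored = (uint16_t)(((uint16_t)buf[payload_len] << 8)
                                 | buf[payload_len + 1]);
    return crc16(buf, payload_len) == stored;
}
```

A sender would call app_crc_append() just before handing the buffer to the radio, and a receiver would drop any packet for which app_crc_ok() returns 0. As noted in the PS above, this is still not airtight: a 16-bit check lets roughly 1 in 65536 randomly corrupted packets through.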
