Hi Xiaohui,

Very nice detective work. I think I understand the problem and I understand your solution. FTSP tries to maintain continuity of time when switching between roots; that is why the new root keeps the previous offset and skew values (and does not update them). You, however, essentially reset the offset to 0 and the skew to 1, so there can be an abrupt change. This becomes problematic if two nodes take turns being the root from time to time (e.g., because one of them has bad connectivity).
I think the original algorithm has its problems, but so does your proposed solution. I think the large skew values are the result of very unstable clocks at startup, no? What if you delay the start of FTSP by a few seconds? An alternative solution would be to slowly decrease the skew of the root when it recalculates its base point. That would gradually eliminate the large skews.

Miklos

On Fri, Jul 13, 2012 at 2:32 PM, Xiaohui Liu <[email protected]> wrote:
> Hi,
>
> Finally, I found out why this can happen.
>
> Suppose A is the node with the smallest id in the network and is going to
> be the sync root eventually. At the beginning, it boots up later than some
> nodes, which become roots prior to A. A few reference points from these
> roots make it into A's regression table. Because FTSP is still in the
> process of convergence, the skew based on these few entries can be
> abnormal. What's worse, the root broadcasts this skew even if the number
> of entries is less than ENTRY_SEND_LIMIT, as shown below.
>
>     if( numEntries < ENTRY_SEND_LIMIT && outgoingMsg->rootID != TOS_NODE_ID ){
>         ++heartBeats;
>         state &= ~STATE_SENDING;
>     }
>     else if( call Send.send(AM_BROADCAST_ADDR, &outgoingMsgBuffer,
>                             TIMESYNCMSG_LEN, localTime ) != SUCCESS ){
>         state &= ~STATE_SENDING;
>         signal TimeSyncNotify.msg_sent();
>     }
>
> Gradually, every node synchronizes with the root and gets the same
> abnormal skew. This is because the local clock rates of all nodes are
> very close, with only a small skew between them (tens of parts per
> million in my tests), and after sync the global clocks keep pace with
> the root's local clock.
>
> I fixed this by making sure the root's global time is always equal to its
> local time, regardless of the skew computed from the regression table:
>
>     async command error_t GlobalTime.local2Global(uint32_t *time)
>     {
>         if (outgoingMsg->rootID != TOS_NODE_ID)
>             *time += offsetAverage
>                      + (int32_t)(skew * (int32_t)(*time - localAverage));
>         return is_synced();
>     }
>
> I repeated my tests 6 times and FTSP runs like a charm now. The abnormal
> skew never shows up, even though the root's skew based on the regression
> table can still be large.
>
> On Thu, Jul 12, 2012 at 10:01 PM, Xiaohui Liu <[email protected]> wrote:
>>
>> Hi,
>>
>> I tested FTSP alone and found some abnormal skews. One instance is a
>> skew of 23%, and the corresponding regression table is:
>>
>>     localtime    offset
>>     5327769      1120379
>>     5982298      1271252
>>     5888794      1249698
>>     5795290      1228145
>>     5722274      1211314
>>     5608281      1185039
>>     5514777      1163486
>>     5421273      1141932
>>
>> The regression figure is here. Almost every one of the 130 nodes is
>> returning a skew of 23%!
>>
>> Another is 183%; regression table:
>>
>>     1474355      450851
>>     1376050      277030
>>     1327295      -9142
>>     1229783      43734
>>
>> Any hint on where the issue may arise from, packet-level synchronization
>> or packet timestamping? Thanks for your attention.
>>
>> On Tue, Jul 10, 2012 at 4:29 PM, Eric Decker <[email protected]> wrote:
>>>
>>> It is very difficult to tell what is going on from the outside.
>>>
>>> What I've done in complicated situations like this is add
>>> instrumentation (logging) of various state in the nodes.
>>>
>>> That is what I would recommend. I don't see any way around it.
>>>
>>> On Tue, Jul 10, 2012 at 1:16 PM, Xiaohui Liu <[email protected]> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm using FTSP at T32khz accuracy (i.e., TimeSync32kC) to synchronize
>>>> a network consisting of 130 TelosB motes. Multiple runs were tested
>>>> and the skews after synchronization were recorded. In some runs, the
>>>> skew is incredibly large, up to 56%.
>>>> In others, it is below 0.0045%, which can be regarded as normal. Does
>>>> anyone know what might be causing this issue? One observation is that
>>>> regardless of the skew in a run, all nodes report almost identical
>>>> skews.
>>>>
>>>> Your reply will be greatly appreciated.
>>>>
>>>> --
>>>> -Xiaohui Liu
>>>>
>>>> _______________________________________________
>>>> Tinyos-help mailing list
>>>> [email protected]
>>>> https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
>>>
>>> --
>>> Eric B. Decker
>>> Senior (over 50 :-) Researcher
>>
>> --
>> -Xiaohui Liu
>
> --
> -Xiaohui Liu
