Hi Xiaohui,

Very nice detective work. I think I understand the problem and I
understand your solution. FTSP tries to maintain the continuity of
time when switching between roots; that is why the new root keeps the
previous offset and skew values (and does not update them). You,
however, essentially reset the offset to 0 and the skew to 1, so there
can be an abrupt change. This becomes problematic if two nodes
alternate as root from time to time (e.g., one of them has bad
connectivity).

I think the original algorithm has its problems, but so does your
proposed solution. I suspect the large skew values are the result of
very unstable clocks at startup, no? What if you delay the start of
FTSP by a few seconds? An alternative solution would be to slowly
decrease the skew of the root whenever it recalculates its base point.
That would gradually eliminate the large skews.

Miklos

On Fri, Jul 13, 2012 at 2:32 PM, Xiaohui Liu <[email protected]> wrote:
> Hi,
>
> Finally, I found out why this can happen.
>
> Suppose A is the node with the smallest id in the network, which will
> eventually become the sync root. At the beginning, it boots up later than
> some nodes, which become roots prior to A. A few reference points from
> these roots make it into A's regression table. Because FTSP is still
> converging, the skew based on these few entries can be abnormal. What's
> worse, the root broadcasts this skew even if the number of entries is less
> than ENTRY_SEND_LIMIT, as shown below.
>     if( numEntries < ENTRY_SEND_LIMIT && outgoingMsg->rootID != TOS_NODE_ID ){
>         ++heartBeats;
>         state &= ~STATE_SENDING;
>     }
>     else if( call Send.send(AM_BROADCAST_ADDR, &outgoingMsgBuffer,
>                             TIMESYNCMSG_LEN, localTime ) != SUCCESS ){
>         state &= ~STATE_SENDING;
>         signal TimeSyncNotify.msg_sent();
>     }
> Gradually, every node synchronizes with the root and gets the same
> abnormal skew. This is because the local clock rates of all nodes are very
> close, differing only by a small skew (tens of parts per million in my
> tests), and after sync the global clocks keep pace with the root's local
> clock.
>
> I fixed this by making sure the root's global time is always equal to its
> local time, regardless of the skew computed from the regression table, as
> follows:
>     async command error_t GlobalTime.local2Global(uint32_t *time)
>     {
>         if (outgoingMsg->rootID != TOS_NODE_ID)
>             *time += offsetAverage
>                    + (int32_t)(skew * (int32_t)(*time - localAverage));
>         return is_synced();
>     }
> I repeated my tests 6 times and FTSP runs like a charm now. The abnormal
> skew never shows up, even though the root's skew based on the regression
> table can still be large.
>
> On Thu, Jul 12, 2012 at 10:01 PM, Xiaohui Liu <[email protected]> wrote:
>>
>> Hi,
>>
>> I tested FTSP alone and found some abnormal skews. One instance is a
>> skew of 23%, and the corresponding regression table is:
>> localtime    offset
>> 5327769      1120379
>> 5982298      1271252
>> 5888794      1249698
>> 5795290      1228145
>> 5722274      1211314
>> 5608281      1185039
>> 5514777      1163486
>> 5421273      1141932
>> The regression figure is here. Almost every one of the 130 nodes is
>> reporting a skew of 23%!
>>
>> Another instance is 183%; its regression table:
>> localtime    offset
>> 1474355      450851
>> 1376050      277030
>> 1327295      -9142
>> 1229783      43734
>>
>> Any hint on where the issue may arise: packet-level synchronization or
>> packet timestamping? Thanks for your attention.
>>
>>
>> On Tue, Jul 10, 2012 at 4:29 PM, Eric Decker <[email protected]> wrote:
>>>
>>> It is very difficult to tell what is going on from the outside.
>>>
>>> What I've done in complicated situations like this is to add
>>> instrumentation (logging) of various state in the nodes.
>>>
>>> That is what I would recommend; I don't see any way around it.
>>>
>>>
>>> On Tue, Jul 10, 2012 at 1:16 PM, Xiaohui Liu <[email protected]> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm using FTSP with T32khz precision (i.e., TimeSync32kC) to synchronize
>>>> a network consisting of 130 TelosB motes. I tested multiple runs and
>>>> recorded the skews after synchronization. In some runs, the skew is
>>>> incredibly large, up to 56%. In others, it is below 0.0045%, which can be
>>>> regarded as normal. Does anyone know what might be causing this issue?
>>>> One observation is that regardless of the skew in a run, all nodes report
>>>> almost identical skews.
>>>>
>>>> Your reply will be greatly appreciated.
>>>>
>>>> --
>>>> -Xiaohui Liu
>>>>
>>>> _______________________________________________
>>>> Tinyos-help mailing list
>>>> [email protected]
>>>> https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
>>>
>>>
>>>
>>>
>>> --
>>> Eric B. Decker
>>> Senior (over 50 :-) Researcher
>>>
>>>
>>
>>
>>
>> --
>> -Xiaohui Liu
>
>
>
>
> --
> -Xiaohui Liu
