Karl Reichert wrote:
> Jan Kiszka wrote:
>> Karl Reichert wrote:
>>> Karl Reichert wrote:
>>>> Karl Reichert wrote:
>>>>> Karl Reichert wrote:
>>>>>> Karl Reichert wrote:
>>>>>>> Karl Reichert wrote:
>>>>>>>> Jan Kiszka wrote:
>>>>>>>>> What I would analyse if I were you:
>>>>>>>>>  - Is the request frame sent in the right slot according to the
>>>>>>>>>    sender?
>>>>>>>> Well, now I have another weird behavior. The slave sends a request
>>>>>>>> calibration frame in cycle no. 45849 and sets the Reply Cycle Number
>>>>>>>> to 97655. As I'm using a cycle length of 5 ms, this means the slave
>>>>>>>> wants the answer more than 4 minutes in the future, which is for
>>>>>>>> sure too far away!
>>>>>>>>
>>>>>>>> This behavior and the one observed before point to a problem when
>>>>>>>> calculating or setting this reply cycle number. How can one
>>>>>>>> configure how big this value is set (i.e. how big the offset between
>>>>>>>> request and reply should be)? If this is done automatically, in
>>>>>>>> which function is it done?
>>>>>>> I dug deeper into the code and here are the results. Please see the
>>>>>>> attached file for my changes. Every change is marked by /* REK debug */.
>>>>>>>
>>>>>>> My first step was to go into module tdma_worker.c, function
>>>>>>> do_request_cal_job(). I printed the values of tdma->current_cycle and
>>>>>>> job->period. job->period is always 1, in all calls of this function.
>>>>>>> tdma->current_cycle of course changes, but it holds an old value, for
>>>>>>> example 1521 cycles old with a 1 ms cycle length.
>>>>>>>
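>>>>>>> The instrumentation is nothing fancy, roughly like this (a sketch
>>>>>>> only; the real patch is in the attachment, and the format specifiers
>>>>>>> are my assumption about the field types):
>>>>>>>
>>>>>>>     /* REK debug: dump the cycle state on every request cal job */
>>>>>>>     printk("[REK debug] tdma->current_cycle = %lu, job->period = %u\n",
>>>>>>>            (unsigned long)tdma->current_cycle,
>>>>>>>            (unsigned int)job->period);
>>>>>>>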
>>>>>>> As a quick hack, my second step was to substitute job->period with a
>>>>>>> constant value, 2500 in my case. Now the synchronisation works, of
>>>>>>> course slowly, but it does.
>>>>>>>
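>>>>>>> In code, the hack boils down to something like this (a sketch of the
>>>>>>> attached change; the name reply_cycle is just for illustration,
>>>>>>> assuming the reply cycle number is derived from current_cycle plus
>>>>>>> the period in do_request_cal_job()):
>>>>>>>
>>>>>>>     /* REK debug: quick hack - request the reply ~2500 cycles ahead
>>>>>>>        instead of job->period cycles (which is always 1 here) */
>>>>>>>     reply_cycle = tdma->current_cycle + 2500 /* was: job->period */;
>>>>>>>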
>>>>>>> The error must be the "wrong" value of one of those two variables!
>>>>>>>
>>>>>>> This leads to a few questions:
>>>>>>> 1) Is it right that job->period is always 1?
>>>>>>> 2a) If yes (which I do not believe), do you have an idea why
>>>>>>>     tdma->current_cycle holds an "old" value?
>>>>>>> 2b) If not, where is this value calculated, and what for?
>>>>>>>
>>>>>> Hello list!
>>>>>>
>>>>>> I did some further testing and this is what I found out:
>>>>>>
>>>>>> 1) tdma->current_cycle holds the right value! I set the tdma cycle
>>>>>> length to 10 s (yes, seconds) and also started Wireshark on the
>>>>>> master. I checked the value of this variable, and when I compare it
>>>>>> to what Wireshark tells me about the sent sync frames, I see that
>>>>>> this value is correct.
>>>>>>
>>>>>> 2) The slave is able to sync with this long cycle length. The reason
>>>>>> for this is that job->period is still always 1, but now there is
>>>>>> enough time (nearly 10 s) to send the req cal frm.
>>>>>>
>>>>>>
>>>>>> This leads me to the following conclusions and questions:
>>>>>>
>>>>>> 1) Again ... why is job->period always 1? Is this correct? I still
>>>>>> don't understand what this variable stands for and why we do not
>>>>>> simply add some other constant value (like 1 or something bigger).
>>>>>>
>>>>>> 2) If it is correct that job->period always contains 1, is it
>>>>>> problematic to set a higher value there (like I did in my quick hack,
>>>>>> see last mail)? If yes, wouldn't it make sense to make this
>>>>>> configurable (via make menuconfig)?
>>>>>>
>>>>>> 3) If it always has to be 1, running tdma with a cycle length of
>>>>>> 1 ms would mean that the slave must be able to send a req cal frm
>>>>>> within less than 1 ms after a sync frm. Is this too much for a
>>>>>> high-end Core 2 Duo? I don't think so ...
>>>>>>
>>>>> After checking the source code again and playing around with the
>>>>> tdmacfg -p parameter, I now see the meaning of this job->period
>>>>> variable. When one provides -p 1/2, this variable holds the value 2,
>>>>> so that the frame is sent only in every second period. So it makes
>>>>> sense that it normally holds 1 in my case.
>>>>>
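>>>>> For illustration (assuming the usual tdmacfg slot syntax with the
>>>>> offset in microseconds and -p taking <phasing>/<period>; adjust the
>>>>> device and slot numbers to your setup):
>>>>>
>>>>>     # attach slot 0 at offset 500 us, transmitting only in every
>>>>>     # second cycle (phasing 1, period 2) => job->period == 2
>>>>>     tdmacfg rteth0 slot 0 500 -p 1/2
>>>>>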
>>>>> So, what I see is the following:
>>>>> Without changing the sources, it isn't possible to set any offset
>>>>> between the received sync frm and the sent req cal frm. The TDMA
>>>>> module in slave mode always sends the req cal frm right after
>>>>> receiving the sync frm (of course in its timeslot). It always sets
>>>>> the "reply cycle number" field in this frame to the next cycle
>>>>> number. This behavior is hard coded and not configurable via make
>>>>> menuconfig.
>>>>>
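>>>>> In other words, the code effectively does something like this (my
>>>>> reading of do_request_cal_job(); the name reply_cycle is again only
>>>>> for illustration):
>>>>>
>>>>>     /* the reply is always requested for the very next cycle, since
>>>>>        job->period is 1 for a plain slot configuration */
>>>>>     reply_cycle = tdma->current_cycle + job->period;
>>>>>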
>>>>> As my slave isn't able to send the req cal frm so fast, it isn't
>>>>> able to synchronize when a short tdma cycle length is configured
>>>>> (1 ms). Is this a problem only of my station?
>>>>>
>>>>> Why is it not possible to configure those offsets?
>>>>>
>>>>> Please advise!
>>>>> Thanks ... Karl
>>>> To keep you up-to-date, just in case anyone cares ;)
>>>>
>>>> With the dirty hack, after synchronization is finished, sending data
>>>> frames every 1 ms works fine. So my machine isn't too slow at all;
>>>> just do_request_cal_job() seems to be problematic. I will dig deeper
>>>> into this now with the I-pipe tracer and give further information
>>>> when available.
>>>>
>>> Please see the attached files from I-pipe debugging! I put an
>>> xntrace_user_freeze(0, 1) before rtdm_task_sleep_until(...) in function
>>> do_request_cal_job().
>>>
>>> Any ideas?
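>>>
>>> The instrumented spot looks roughly like this (a sketch, not the exact
>>> patch; judging from the printk output below, the wakeup time is the
>>> cycle start plus the slot offset):
>>>
>>>     /* REK debug: freeze the I-pipe trace just before going to sleep */
>>>     xntrace_user_freeze(0, 1);
>>>     rtdm_task_sleep_until(tdma->current_cycle_start + job->offset);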
>> That trace per se doesn't help. It would rather be interesting to check
>> how long the caller actually slept in the end (i.e. put the freeze after
>> the sleep), or what was passed to the caller (printk or
>> ipipe_trace_special instrumentation).
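>>
>> For example (a sketch; ipipe_trace_special records an id plus a single
>> value in the trace, the id 0x10 here being arbitrary):
>>
>>     /* tag the configured slot offset (in us) next to the freeze point */
>>     ipipe_trace_special(0x10, (unsigned long)(job->offset / 1000));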
>>
>> Jan
>>
> I put the freeze after the sleep now, please see the attached files.
> This is what printk gives:
> 
> [ 7605.737664] [REK debug] tdma->current_cycle_start = 1191326103544417349
> [ 7605.737708] [REK debug] job->offset = 2300000

So you sleep about 2300 us after the Sync frame reception. But your
frozen backtrace doesn't cover the full period; just look at the
timestamps. Once you can see the full period of 2300 us from falling
asleep until waking up, and maybe also sending the packet (play with
back_trace_points and post_trace_points), you can be sure that the local
timing is correct. Then you will have to look again at the data that is
received and transmitted.
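
For example, you can widen the recorded window via the tracer's /proc
interface (assuming the I-pipe tracer is built in; the values below are
just a starting point):

    # record more trace points before and after the freeze point
    echo 1000 > /proc/ipipe/trace/back_trace_points
    echo 1000 > /proc/ipipe/trace/post_trace_points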

Jan
