Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-28 Thread Merlin He
Hi Vladimir,
That is a great help, thank you very much.
We will consider upgrading the kernel.


On Tue, Mar 28, 2023 at 15:04, Vladimir Oltean wrote:

> On Tue, Mar 28, 2023 at 01:42:05PM +0800, Merlin He wrote:
> > On Tue, Mar 28, 2023 at 11:27, Richard Cochran wrote:
> > > On Tue, Mar 28, 2023 at 10:57:43AM +0800, merlinhe wrote:
> > > > Hi Miroslav,
> > > >
> > > > We use Synopsys's IP, the driver is the same as stmmac
> > >
> > > Yeah, the stmmac driver is crazy bad.
> > >
> > > It is not clear to me whether setting the time when enabling time
> > > stamping is actually required by the IP core or not.
> > >
> > > But even if it were required, still the driver should attempt to keep
> > > the MAC unaltered by reprogramming the current MAC time.
> > >
> >
> > Hi Richard,
> > Thank you, We are going to report this issue to Synopsys
>
> Problem already solved, please use a kernel which tracks linux-stable.
>
> https://github.com/torvalds/linux/commit/a6da2bbb0005e6b4909472962c9d0af29e75dd06
>
___
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users


Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-28 Thread Vladimir Oltean
On Tue, Mar 28, 2023 at 01:42:05PM +0800, Merlin He wrote:
> On Tue, Mar 28, 2023 at 11:27, Richard Cochran wrote:
> > On Tue, Mar 28, 2023 at 10:57:43AM +0800, merlinhe wrote:
> > > Hi Miroslav,
> > >
> > > We use Synopsys's IP, the driver is the same as stmmac
> >
> > Yeah, the stmmac driver is crazy bad.
> >
> > It is not clear to me whether setting the time when enabling time
> > stamping is actually required by the IP core or not.
> >
> > But even if it were required, still the driver should attempt to keep
> > the MAC unaltered by reprogramming the current MAC time.
> >
> 
> Hi Richard,
> Thank you, We are going to report this issue to Synopsys

Problem already solved, please use a kernel which tracks linux-stable.
https://github.com/torvalds/linux/commit/a6da2bbb0005e6b4909472962c9d0af29e75dd06




Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-27 Thread Merlin He
Hi Richard,
Thank you. We are going to report this issue to Synopsys.

On Tue, Mar 28, 2023 at 11:27, Richard Cochran wrote:

> On Tue, Mar 28, 2023 at 10:57:43AM +0800, merlinhe wrote:
> > Hi Miroslav,
> >
> > We use Synopsys's IP, the driver is the same as stmmac
>
> Yeah, the stmmac driver is crazy bad.
>
> It is not clear to me whether setting the time when enabling time
> stamping is actually required by the IP core or not.
>
> But even if it were required, still the driver should attempt to keep
> the MAC unaltered by reprogramming the current MAC time.
>
> Thanks,
> Richard
>


Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-27 Thread Richard Cochran
On Tue, Mar 28, 2023 at 10:57:43AM +0800, merlinhe wrote:
> Hi Miroslav,
> 
> We use Synopsys's IP, the driver is the same as stmmac

Yeah, the stmmac driver is crazy bad.

It is not clear to me whether setting the time when enabling time
stamping is actually required by the IP core or not.

But even if it were required, still the driver should attempt to keep
the MAC unaltered by reprogramming the current MAC time.

Thanks,
Richard
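
The suggestion above (keep the MAC time unaltered when time stamping is
enabled) can be pictured with a small toy model in C. This is only a sketch
and not stmmac code: struct model_mac, enable_tstamp_resetting() and
enable_tstamp_preserving() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MAC's PTP hardware clock (PHC). */
struct model_mac {
	uint64_t time_ns;
};

/* Models the reported behavior: enabling time stamping overwrites
 * the PHC with the system time. */
void enable_tstamp_resetting(struct model_mac *mac, uint64_t system_time_ns)
{
	assert(mac != NULL);
	mac->time_ns = system_time_ns;
}

/* Models the suggestion: if the hardware needs its time programmed when
 * time stamping is enabled, read the current PHC value first and write
 * it back, so that the clock is not stepped. */
void enable_tstamp_preserving(struct model_mac *mac, uint64_t system_time_ns)
{
	uint64_t current_ns;

	assert(mac != NULL);
	(void)system_time_ns;      /* deliberately unused in this variant */
	current_ns = mac->time_ns;
	mac->time_ns = current_ns; /* reprogram the time the MAC already had */
}
```

With a PHC already synchronized to year 2019 and a system clock still in
year 2000, the first variant steps the PHC back to 2000, while the second
leaves it where ptp4l put it.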




Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-27 Thread merlinhe
Hi Miroslav,

We use Synopsys's IP; the driver is the same as stmmac.

file: stmicro/stmmac/stmmac_main.c
function: static int stmmac_hwtstamp_set(struct net_device *dev, struct ifreq *ifr)

	/* initialize system time */
	ktime_get_real_ts64(&now);

	/* lower 32 bits of tv_sec are safe until y2106 */
	stmmac_init_systime(priv, priv->ptpaddr,
			(u32)now.tv_sec, now.tv_nsec);

On Mon, Mar 27, 2023 at 16:20, Miroslav Lichvar wrote:

> On Fri, Mar 24, 2023 at 03:46:49PM +0800, merlinhe wrote:
> >
> > port.port_initialize()->transport_open()->raw_open()->sk_timestamping_init()->hwts_init()->ioctl(fd, SIOCSHWTSTAMP)
> > (the eth driver sets the PHC to SYS (year 2000) in this ioctl)
>
> That ioctl definitely shouldn't cause the PHC to be stepped. What
> HW/driver is it?
>
> --
> Miroslav Lichvar
>
>


Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-27 Thread Miroslav Lichvar
On Fri, Mar 24, 2023 at 03:46:49PM +0800, merlinhe wrote:
>  
> port.port_initialize()->transport_open()->raw_open()->sk_timestamping_init()->hwts_init()->ioctl(fd, SIOCSHWTSTAMP)
> (the eth driver sets the PHC to SYS (year 2000) in this ioctl)

That ioctl definitely shouldn't cause the PHC to be stepped. What
HW/driver is it?

-- 
Miroslav Lichvar





Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-24 Thread merlinhe
Hi Miroslav,

Thank you very much for your great help

After debugging for a few days, I found that it was the Ethernet driver
that set the PHC to SYS when processing the ioctl.
We modified the driver and the problem is fixed.

Some additional details about the problem:

ptp4l slave config: --step_threshold=0.0, --max_frequency=100
Before sync: slave SYS (year 2000), master ts (year 2019)

ptp4l slave side events:
0  process SYNC
1  send PDELAY_REQ
2  process FUP
3  sync PHC to master (PHC is set to year 2019, but SYS is still year 2000);
   clock.servo.state = JUMP
4  set PDELAY_REQ to NULL in port_synchronize()
5  process PDELAY_RESP (bc_event returns -1)
6  set port.state = FAULTY
7  port.port_initialize()->transport_open()->raw_open()->sk_timestamping_init()->hwts_init()->ioctl(fd, SIOCSHWTSTAMP)
   (the eth driver sets the PHC to SYS (year 2000) in this ioctl)
8  the port processes other messages after init, but the servo moves to the LOCK state;
   when the next SYNC/FUP arrives, the PHC can only be adjusted by at most max_frequency (100)


On Tue, Mar 21, 2023 at 15:56, Miroslav Lichvar wrote:

> On Tue, Mar 21, 2023 at 10:17:56AM +0800, merlinhe wrote:
> > 1. after the first clock jump, the PHC jumped to the master's time while SYS
> > kept its original value
> > 2. peer_delay_req is set to NULL
> > 3. when the next peer_delay_resp arrives, the port enters the faulty state,
> > which causes the port to be reinitialized (PHC reset to SYS)
>
> By phc2sys? System clock shouldn't be used at all with HW
> timestamping.
>
> > 4. then the next SYNC and FUP arrive, and the clock.servo enters the LOCK state
> >
> > Do you have any suggestions to fix this problem? Or do I need to change my
> > ptp4l configuration?
>
> It's not very clear to me what is the issue that you are trying to
> avoid.
>
> --
> Miroslav Lichvar
>
>


Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-21 Thread Miroslav Lichvar
On Tue, Mar 21, 2023 at 10:17:56AM +0800, merlinhe wrote:
> 1. after the first clock jump, the PHC jumped to the master's time while SYS
> kept its original value
> 2. peer_delay_req is set to NULL
> 3. when the next peer_delay_resp arrives, the port enters the faulty state,
> which causes the port to be reinitialized (PHC reset to SYS)

By phc2sys? System clock shouldn't be used at all with HW
timestamping.

> 4. then the next SYNC and FUP arrive, and the clock.servo enters the LOCK state
> 
> Do you have any suggestions to fix this problem? Or do I need to change my
> ptp4l configuration?

It's not very clear to me what is the issue that you are trying to
avoid.

-- 
Miroslav Lichvar





Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-20 Thread merlinhe
Hi Miroslav,

Thank you very much for your clear explanation.
I now understand why peer_delay_req needs to be set to NULL when the clock
jumps (to avoid corrupting the delay measurement).

But I have another problem now.
With ptp4l max_frequency limited to 1,000,000 (to avoid big PHC jumps
after the first offset is set), the PHC can only approach the master very
slowly when the following occurs (the clock servo enters the LOCK state):

1. after the first clock jump, the PHC jumped to the master's time while SYS
kept its original value
2. peer_delay_req is set to NULL
3. when the next peer_delay_resp arrives, the port enters the faulty state,
which causes the port to be reinitialized (PHC reset to SYS)
4. then the next SYNC and FUP arrive, and the clock.servo enters the LOCK state

Do you have any suggestions to fix this problem? Or do I need to change my
ptp4l configuration?

thank you
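
To put "very slowly" into numbers, here is a back-of-the-envelope sketch in
C. It assumes the servo slews at the max_frequency limit the whole time
(max_frequency is given in parts per billion), so the result is a lower
bound on the convergence time. slew_seconds() is a hypothetical helper, and
the offset used in the note below is the difference between the t1 and t2
values in the ptp4l log from the first message of this thread.

```c
#include <assert.h>

/* Seconds needed to remove offset_s when the clock frequency can be
 * adjusted by at most max_freq_ppb parts per billion. */
double slew_seconds(double offset_s, double max_freq_ppb)
{
	assert(max_freq_ppb > 0.0);
	return offset_s / (max_freq_ppb * 1e-9);
}
```

The logged offset is 1560257347.787048846 s - 946684803.056837859 s, about
613572544.73 s (roughly 19.4 years). At 1,000,000 ppb the clock gains 1 ms
per second, so removing that offset takes about 1000 times the offset,
which is on the order of 19,000 years. At 100 ppb it takes 10,000 times
longer still. Slewing cannot recover from an offset of this size; the PHC
must not be stepped back in the first place.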

On Wed, Mar 15, 2023 at 16:03, Miroslav Lichvar wrote:

> On Wed, Mar 15, 2023 at 03:48:41PM +0800, merlinhe wrote:
> > Hello Team,
> >
> > I recently encountered a *rogue peer delay response* error, which seems to be
> > caused by port_synchronize() resetting the peer_delay_req pointer.
> > I'd like to know why port_synchronize() needs to reset p->peer_delay_req to
> > NULL. Can I comment this out to avoid the rogue peer delay response error?
>
> The clock jumping in the middle of a peer delay measurement corrupts
> the result. Pretending there was no request is a simple solution to
> avoid accepting the response, although the error message is confusing.
>
> --
> Miroslav Lichvar
>
>


Re: [Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-15 Thread Miroslav Lichvar
On Wed, Mar 15, 2023 at 03:48:41PM +0800, merlinhe wrote:
> Hello Team,
> 
> I recently encountered a *rogue peer delay response* error, which seems to be
> caused by port_synchronize() resetting the peer_delay_req pointer.
> I'd like to know why port_synchronize() needs to reset p->peer_delay_req to
> NULL. Can I comment this out to avoid the rogue peer delay response error?

The clock jumping in the middle of a peer delay measurement corrupts
the result. Pretending there was no request is a simple solution to
avoid accepting the response, although the error message is confusing.

-- 
Miroslav Lichvar
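
A small worked example in C may help show why a clock jump in the middle of
a peer delay measurement corrupts the result. This is a simplified sketch
of the textbook peer delay formula, not linuxptp's actual code: it ignores
correction fields and the neighbor rate ratio, and peer_delay_ns() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified peer delay from the four timestamps, in nanoseconds:
 *   t1: Pdelay_Req sent      (local clock)
 *   t2: Pdelay_Req received  (peer clock)
 *   t3: Pdelay_Resp sent     (peer clock)
 *   t4: Pdelay_Resp received (local clock)
 */
int64_t peer_delay_ns(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
	assert(t3 >= t2); /* the peer's turnaround time cannot be negative */
	return ((t4 - t1) - (t3 - t2)) / 2;
}
```

With t1 = 1000, t2 = 5500, t3 = 6500 and t4 = 2500 the result is 250 ns.
If the local clock is stepped forward by one second after t1 was taken, t4
becomes 1000002500 and the result is 500000250 ns: half of the step ends up
in the "delay". Since t1 and t4 no longer share a timescale, discarding the
outstanding request is the safe option.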





[Linuxptp-users] rogue peer delay response caused by port_synchronize()

2023-03-15 Thread merlinhe
Hello Team,

I recently encountered a *rogue peer delay response* error, which seems to be
caused by port_synchronize() resetting the peer_delay_req pointer.
I'd like to know why port_synchronize() needs to reset p->peer_delay_req to
NULL. Can I comment this out to avoid the rogue peer delay response error?
Can anyone please help me?

The problematic packet timing is as follows:

   1. client sends pdelay_req
   2. client receives follow_up
   3. client enters port_synchronize, client clock servo enters the jump state
   4. port_synchronize resets p->peer_delay_req to NULL
   5. client receives pdelay_resp
   6. client reports rogue peer delay response
   7. client reinitializes the port
   8. client servo enters the lock state

The ptp4l log is as follows:

ptp4l[3.143]: port 1: delay timeout
ptp4l[3.147]: bc_event, 2613, idx(1)
ptp4l[3.147]: process msg seq_id(16945)
ptp4l[3.147]: bc_event, 2761, process FOLLOW_UP
ptp4l[3.147]: port_syfufsm, 1251, sts(1), event(3)
ptp4l[3.147]: port_synchronize, 1176, t1(1560257347787048846), t2(946684803056837859), c1(0), c2(0)
ptp4l[3.147]: port_synchronize, 1192, last_state(0), state(1)
ptp4l[3.147]: port_synchronize, 1212, peer_delay_req cleared
ptp4l[3.147]: bc_event, 2613, idx(0)
ptp4l[3.147]: process msg seq_id(2)
ptp4l[3.147]: bc_event, 2754, process PDELAY_RESP
ptp4l[3.147]: port 1: rogue peer delay response
ptp4l[3.147]: bc_event, 2757, process PDELAY_RESP failed
ptp4l[3.148]: port 1: clearing fault immediately

1158 static void port_synchronize()
1206   case SERVO_JUMP:
1207     port_dispatch(p, EV_SYNCHRONIZATION_FAULT, 0);
1208     flush_delay_req(p);
1209     if (p->peer_delay_req) {
1210       msg_put(p->peer_delay_req);
1211       p->peer_delay_req = NULL;
1212       pr_notice("%s, %d, peer_delay_req cleared", __FUNCTION__, __LINE__);
1214     }

Best Regards
Merlin
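
The sequence reported in this message can be reproduced with a very reduced
model in C. This is only a sketch of the logic, not linuxptp code: struct
model_port and the model_*() functions are hypothetical, and the only state
tracked is whether a peer delay request is outstanding (which stands in for
p->peer_delay_req being non-NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a port. */
struct model_port {
	bool pdelay_req_pending; /* stands in for p->peer_delay_req != NULL */
	bool faulty;
};

void model_send_pdelay_req(struct model_port *p)
{
	assert(p != NULL);
	p->pdelay_req_pending = true;
}

/* The servo reports a jump: the outstanding request is dropped. */
void model_servo_jump(struct model_port *p)
{
	assert(p != NULL);
	p->pdelay_req_pending = false;
}

/* Returns 0 if the response matches an outstanding request, -1 otherwise
 * (the "rogue peer delay response" case, which faults the port). */
int model_recv_pdelay_resp(struct model_port *p)
{
	assert(p != NULL);
	if (!p->pdelay_req_pending) {
		p->faulty = true;
		return -1;
	}
	return 0;
}
```

Calling model_send_pdelay_req(), then model_servo_jump(), then
model_recv_pdelay_resp() returns -1 and marks the port faulty, which matches
steps 1 to 7 of the list above. Without the jump in between, the response is
accepted.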