Hello,

I believe I've figured out what is causing the TDC errors. I (or one of my 
coworkers) will create an "issue" in the UHD GitHub repo, but I wanted to post 
some more info here in case someone else runs into this.

I found that I could reproduce the TDC measurement errors at least somewhat 
consistently with the following command:

while true; do uhd_usrp_probe --args="force_reinit=1,master_clock_rate=200000000"; done

I don't think the master clock rate matters (that is just the value I picked), 
but force_reinit forces the clocks to be set up on every run, and that was the 
important part in reproducing the error. With the UHD 4.1.0.4 (or earlier) 
filesystem installed on an N320, I have never been able to reproduce the TDC 
error this way. With the filesystem from UHD 4.1.0.5-rc1 or later, the same 
command produces occasional TDC errors. They occur randomly, but if I leave it 
running I usually see at least a few per hour. I tended to leave it running 
overnight and check for errors the next morning.
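For an overnight run it helps to have the loop count and timestamp the failures 
instead of scrolling back through the terminal. Below is a minimal Python sketch 
of the same loop. It assumes uhd_usrp_probe is on the host's PATH and that the 
TDC errors show up in its output with the same text as in the log further down 
in this thread (an "[ERROR]" tag plus "TDC measurement"); the function names are 
just for illustration.

```python
import datetime
import subprocess

# Same arguments as the shell loop above; the clock rate is just one valid choice.
PROBE_CMD = ["uhd_usrp_probe",
             "--args=force_reinit=1,master_clock_rate=200000000"]


def is_tdc_failure(output):
    """True if probe output contains a TDC error like the ones in this thread."""
    return "ERROR" in output and "TDC measurement" in output


def run_stress(cmd=PROBE_CMD, iterations=None):
    """Run cmd repeatedly and print a timestamp for each TDC failure.

    iterations=None loops until Ctrl-C. Returns (runs, failures).
    """
    runs = failures = 0
    try:
        while iterations is None or runs < iterations:
            try:
                # Merge stderr into stdout so all log lines are searched.
                result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT,
                                        universal_newlines=True)
            except FileNotFoundError:
                print("%s not found on PATH" % cmd[0])
                break
            runs += 1
            if is_tdc_failure(result.stdout):
                failures += 1
                print("%s TDC failure (%d of %d runs)"
                      % (datetime.datetime.now().isoformat(timespec="seconds"),
                         failures, runs))
    except KeyboardInterrupt:
        pass
    print("%d TDC failures in %d runs" % (failures, runs))
    return runs, failures


if __name__ == "__main__":
    run_stress()
```

Start it with python3 and stop it with Ctrl-C; it prints the totals on the way 
out, which gives a rough failures-per-run rate to compare filesystem versions.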

The problem appears to be related to a change in the LMK04828 configuration in 
MPM. UHD commit d7ee3dcf4a7478a17a094a1be2cba37b98843963 changed some register 
writes to decrease PLL lock time. These registers appear to set how long (in 
clock cycles) the phase detector error must stay within a certain window before 
Lock Detect is asserted. My guess is that the reduced number of cycles required 
to declare lock is too aggressive: it works most of the time, but not always.
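To get a feel for the size of the change, here is a small Python sketch of the 
register arithmetic. Per the comment on 0x15C, that register holds bits [13:8] of 
a 14-bit PLL1 DLD count; the low 8 bits live in another register that isn't shown 
in this thread (assumed here to be 0x15D; check the LMK04828 datasheet), so LOW 
below is a placeholder. Assuming the counter advances once per PLL1 phase 
detector cycle while the error is inside the window, the minimum time to declare 
lock is count / f_PFD; the PLL1 PFD frequency isn't stated here, so it is a 
parameter.

```python
def dld_count(high, low):
    """Combine PLL1 DLD Count bits [13:8] (register 0x15C) with bits [7:0]."""
    return ((high & 0x3F) << 8) | (low & 0xFF)


def lock_time_s(count, f_pfd_hz):
    """Shortest lock-declaration time if the counter advances once per PFD cycle."""
    return count / f_pfd_hz


def changed_bits(old, new):
    """Bit positions (0 = LSB) that differ between two 8-bit register values."""
    diff = (old ^ new) & 0xFF
    return [bit for bit in range(8) if (diff >> bit) & 1]


# Register values from the Replace/With lines in this message.
LOW = 0x00  # placeholder: the real low byte is not shown in this thread
OLD_HIGH, NEW_HIGH = 0x10, 0x0F  # 0x15C before / after commit d7ee3dcf

print(changed_bits(0x27, 0xC7))                            # [5, 6, 7]
print(dld_count(OLD_HIGH, LOW) - dld_count(NEW_HIGH, LOW))  # 256
```

So the commit shortened the DLD count by exactly 256 cycles regardless of the low 
byte, and separately changed bits 7:5 of 0x15B. What those 0x15B bits control 
should be confirmed in the datasheet before picking an in-between value.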

Making the following edits to 
/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/lmk_rh.py seems to fix 
the issue. Note that this file must be edited on the N320.

Replace:
            (0x15B, 0xC7), # PLL1 PFD: negative slope for active filter / CP = 750 uA
            (0x15C, 0x0F), # PLL1 DLD Count [13:8]

With:
            (0x15B, 0x27), # PLL1 PFD: negative slope for active filter / CP = 750 uA
            (0x15C, 0x10), # PLL1 DLD Count [13:8]

This simply undoes the change made in the commit mentioned above, so the 
LMK04828 needs more time before it declares lock. Some value in between might be 
a better choice, but I'm leaving it this way for now. I haven't seen any TDC 
errors so far.
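If you have several N320s to patch, a script is less error-prone than hand 
editing. Here is a minimal Python sketch meant to be run on the N320 itself. It 
assumes the two tuples appear in lmk_rh.py with exactly the spacing quoted above; 
the path is the one from this message and may differ on other filesystem 
versions. Rebooting the N320 afterwards is the simplest way to make sure MPM 
picks up the edited module.

```python
import os
import shutil
import sys

# Path quoted in this thread (N320 filesystem); it may differ on other releases.
LMK_RH_PATH = "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/lmk_rh.py"

# (old, new) pairs from the Replace/With lines above. Assumes the tuples are
# written in lmk_rh.py with exactly this spacing and capitalization.
REPLACEMENTS = [
    ("(0x15B, 0xC7)", "(0x15B, 0x27)"),  # PLL1 PFD / CP register
    ("(0x15C, 0x0F)", "(0x15C, 0x10)"),  # PLL1 DLD Count [13:8]
]


def revert_lock_settings(source):
    """Return source with both register writes reverted.

    Raises ValueError unless each old tuple occurs exactly once, so a file
    that doesn't match expectations is never written.
    """
    for old, new in REPLACEMENTS:
        found = source.count(old)
        if found != 1:
            raise ValueError("expected 1 occurrence of %s, found %d" % (old, found))
        source = source.replace(old, new)
    return source


def main(path=LMK_RH_PATH):
    if not os.path.isfile(path):
        print("not found:", path)
        return
    with open(path) as f:
        patched = revert_lock_settings(f.read())
    shutil.copy2(path, path + ".orig")  # backup of the unmodified file
    with open(path, "w") as f:
        f.write(patched)
    print("patched", path)


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else LMK_RH_PATH)
```

Because the check runs before anything is written, running it a second time 
raises ValueError and leaves both the patched file and the .orig backup alone.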

Thanks,
Jim

________________________________
From: Jim Palladino <j...@gardettoengineering.com>
Sent: Tuesday, May 10, 2022 2:02 PM
To: Marcus D. Leech <patchvonbr...@gmail.com>; USRP-users@lists.ettus.com 
<usrp-users@lists.ettus.com>
Subject: [USRP-users] Re: N320 TDC measurement errors

Just passing on that I updated an N320 to UHD 4.2.0.0 and ran into the TDC 
error pretty quickly. I have since reverted that radio to 4.1.0.2 and have not 
seen the error "yet".

Thanks,
Jim

________________________________
From: Jim Palladino <j...@gardettoengineering.com>
Sent: Monday, May 9, 2022 1:08 PM
To: Marcus D. Leech <patchvonbr...@gmail.com>; usrp-users@lists.ettus.com 
<usrp-users@lists.ettus.com>
Subject: [USRP-users] Re: N320 TDC measurement errors

Thanks, Marcus. I cannot say with 100% certainty, but we had most radios on UHD 
4.1.0.2 before and nobody here remembers seeing those errors (ever) until we 
updated all of them to 4.1.0.5. There have always been issues (according to the 
others I talked to) with radios not starting properly with some odd error or 
another that would magically go away with the next attempt. It could be that 
some of those errors were related to this problem and were presented to the 
user differently, but I can't say for sure.  If I get a free N320 at some 
point, I might try reverting it to 4.1.0.2 and keep an eye on its behavior.

Thanks
Jim

________________________________
From: Marcus D. Leech <patchvonbr...@gmail.com>
Sent: Monday, May 9, 2022 12:04 PM
To: usrp-users@lists.ettus.com <usrp-users@lists.ettus.com>
Subject: [USRP-users] Re: N320 TDC measurement errors

On 2022-05-09 11:32, Jim Palladino wrote:
Sorry to bring it up again, but this is really becoming a problem for us: we 
can't seem to use our N320 radios reliably while this TDC measurement error 
keeps happening. When the TDC error occurs, our program, or even 
uhd_usrp_probe, immediately errors out and exits. If anyone has seen this or 
has any thoughts on why this might be happening or how to fix it, that would be 
greatly appreciated.

Thanks,
Jim
Jim:

I'm sorry this is happening to your N320s.   Can you confirm that it DOES NOT 
happen on previous releases?  I don't have an N320 here to test with.

I've rattled some internal Ettus/NI cages, but I cannot offer a concrete 
response time.



________________________________
From: Jim Palladino <j...@gardettoengineering.com>
Sent: Monday, May 2, 2022 12:59 PM
To: USRP-users@lists.ettus.com <usrp-users@lists.ettus.com>
Subject: [USRP-users] N320 TDC measurement errors

Hello,

Ever since updating to UHD 4.1.0.5 (including updating the filesystem and FPGA 
image on our six N320 USRPs), we occasionally get TDC measurement errors when 
trying to interact with the radio via UHD. It isn't easily reproducible, but it 
does happen on different radios maybe once a day or so. I've seen it when using 
either external time and clock sources or internal (doesn't seem to matter 
which).

Here is an example of the output of a uhd_usrp_probe when this occurs.
----------------------
[INFO] [UHD] linux; GNU C++ version 7.5.0; Boost_106501; UHD_4.1.0.HEAD-0-g6bd0be9c
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.40.2,type=n3xx,product=n320,serial=31EDED4,fpga=XG,claimed=False,addr=192.168.40.2
[DEBUG] [MPMD] Claiming mboard 0
[DEBUG] [MPMD] Device args: `mgmt_addr=192.168.40.2,type=n3xx,product=n320,serial=31EDED4,fpga=XG,claimed=False,addr=192.168.40.2'. RPC address: 192.168.40.2
[DEBUG] [MPMD] MPM reports device info: addr=192.168.30.2,claimed=True,connection=remote,dboard_0_pid=338,dboard_0_serial=31EBB6F,dboard_1_pid=338,dboard_1_serial=31EBB94,description=N300-Series Device,eeprom_version=3,fpga=XG,fpga_version=8.0,fpga_version_hash=6bd0be9.clean,fs_version=20211215135436,mender_artifact=v4.1.0.5_n3xx,mpm_sw_version=4.1.0.5-g6bd0be9c,mpm_version=4.0,name=ni-n3xx-31EDED4,pid=16962,product=n320,rev=10,rpc_connection=remote,second_addr=192.168.40.2,serial=31EDED4,type=n3xx
[DEBUG] [MPMD] Found 8 motherboard sensors.
[DEBUG] [MPMD] Initializing mboard 0
[INFO] [MPM.PeriphManager] init() called with device args `fpga=XG,mgmt_addr=192.168.40.2,product=n320,clock_source=internal,time_source=internal'.
[INFO] [MPM.Rhodium-0] init() called with args `fpga=XG,mgmt_addr=192.168.40.2,product=n320,clock_source=internal,time_source=internal'
[INFO] [MPM.Rhodium-1] init() called with args `fpga=XG,mgmt_addr=192.168.40.2,product=n320,clock_source=internal,time_source=internal'
[INFO] [MPM.Rhodium-0.init.LMK04828] LMK initialized and locked!
[ERROR] [MPM.Sync-0] TDC measurements show a wide range of values! Check your clock rates for incompatibilities.
[INFO] [MPM.Rhodium-1.init.LMK04828] LMK initialized and locked!
[ERROR] [RPC] TDC measurement out of expected range!
[INFO] [MPM.Rhodium-1.DAC37J82] DAC PLL Locked!
[INFO] [MPM.Rhodium-1.AD9695] ADC PLL Locked!
[INFO] [MPM.Rhodium-1.init] JESD204B Link Initialization & Training Complete
[ERROR] [MPM.RPCServer] init() failed with error: TDC measurement out of expected range!
Error: RuntimeError: Error during RPC call to `init'. Error message: TDC measurement out of expected range!
----------------------

If I run uhd_usrp_probe again immediately, it always seems to work fine. I 
don't think this is specific to any of the three valid master clock rates; I've 
also seen it happen on the first uhd_usrp_probe after a fresh reboot of an N320, 
when the radio should still have been at its default parameters. I also have 
the impression that it tends to happen after a radio hasn't been in use for a 
while, but I'm not sure that is always the case.
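Since a second attempt always seems to succeed, an application can paper over 
the problem (not fix it) by retrying only on this specific error. A minimal 
Python sketch, assuming the failure reaches Python as a RuntimeError whose 
message contains "TDC measurement" (the log above shows a RuntimeError with that 
text). The helper open_with_retry is hypothetical and not part of UHD.

```python
import time


def open_with_retry(factory, attempts=3, delay_s=1.0, marker="TDC measurement"):
    """Call factory(), retrying only when it fails with a TDC error.

    factory: zero-argument callable that opens the radio, e.g. (hypothetically)
    lambda: uhd.usrp.MultiUSRP("type=n3xx"). Assumes the failure is raised as
    a Python RuntimeError whose message contains `marker`.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except RuntimeError as err:
            # Anything that isn't a TDC error, or the last attempt, is re-raised.
            if marker not in str(err) or attempt == attempts:
                raise
            print("TDC error on attempt %d of %d; retrying" % (attempt, attempts))
            time.sleep(delay_s)
```

Other errors are re-raised immediately, so genuine configuration problems are 
not hidden behind retries.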

Does anyone have any idea what might cause this?

Thanks,
Jim




_______________________________________________
USRP-users mailing list -- usrp-users@lists.ettus.com
To unsubscribe send an email to usrp-users-le...@lists.ettus.com

