Re: [USRP-users] N3xx operational questions

Martin Braun via USRP-users Fri, 02 Nov 2018 14:54:37 -0700

On Wed, Oct 24, 2018 at 3:02 PM EJ Kreinar via USRP-users <
usrp-users@lists.ettus.com> wrote:


> Hi All,
>
> I've been working with the N3xx series for a week or so and I've hit a few
> issues in the "operational" side of things that are either not addressed in
> the manual or work differently than expected. I'll just handle this as a
> "laundry list" of items/issues...
>
> To start, I've been directly testing on the N300 so far. I reflashed the
> N300 SD card with version UHD-3.13.02 last week.
>
> 1. As a heads up, and I'm sure you're aware, I've had a fair bit of
> trouble coordinating the uhd_images_downloader with the correct images...
> First, when I originally built UHD-3.13.0.0 as described in the manual,
> there's no images provided for 3.13.0.0. I just noticed today that if I
> update to UHD-3.13 branch (currently 3.13.0.3) and run the downloader, it
> tries to pull images for 3.13.0.2, which is likely incompatible with the
> 3.13.0.3 head (noc shell seems to be updated to version 3 in the most
> recent 3.13.0.3).
>

EJ, the FPGA images tagged in the manifest in the UHD repo should always
work with that version of UHD, meaning that uhd_images_downloader should
always pull the correct images. The latest filesystem might not contain the
latest FPGA images -- then you simply run uhd_images_downloader. This gives
us the option to update our filesystems (and your downloads) more
incrementally, without having to update all the gigabytes of
filesystem-related files. Note that we removed the 3.13.0.0 images after
the following release, because they had some issues with the newest hardware
revision (which have since been resolved).

A combo of

    uhd_images_downloader -t n3xx -t sdimg
    uhd_images_downloader -t n3xx -t fpga

will always give you up-to-date images.
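
If you want to script that, here is a minimal Python sketch. It only wraps the two commands above. The helper names (`downloader_commands`, `fetch_images`) are made up for illustration, and it defaults to a dry run that just prints what it would execute:

```python
import subprocess

# Image types taken from the two commands above.
IMAGE_TYPES = ("sdimg", "fpga")


def downloader_commands(device="n3xx", image_types=IMAGE_TYPES):
    """Build one uhd_images_downloader argv list per image type."""
    return [
        ["uhd_images_downloader", "-t", device, "-t", image_type]
        for image_type in image_types
    ]


def fetch_images(dry_run=True):
    """Print each command; also run it unless dry_run is set."""
    for argv in downloader_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, check=True)


fetch_images(dry_run=True)
```

Calling `fetch_images(dry_run=False)` on a machine with UHD installed would actually run both downloads, one after the other.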


>
> 2. I keep running into a number of issues trying to download new FPGA
> images. Could someone explain the mechanics of FPGA loading for N3xx? I'm
> assuming this is similar to X310, however I would have expected that a
> zynq-based platform would simply program to /dev/xdevcfg like the E310.
>

Yep, it's the same! The N3xx manual has some notes (see Section "Updating
the FPGA"). You can omit --fpga-path for a default image.

>
> More tactically, I have a few issues when trying to download FPGA images.
> If I run a "host mode update", then I get an unexpected error:
>
> ```
> $ uhd_image_loader --args "type=n3xx,addr=10.1.151.245" --fpga-path=n3xx.bit
> [INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501;
> UHD_3.13.0.3-14-gd1555232
> [INFO] [MPMD] Initializing 1 device(s) in parallel with args:
> mgmt_addr=10.1.151.245,type=n3xx,product=n300,serial=3145FF4,claimed=False,skip_init=1
> [INFO] [MPMD] Claimed device without full initialization.
> [WARNING] [MPMD IMAGE LOADER] RuntimeError: Component file does not exist:
> /home/ejk/fpga-build/he360-fpga-builder/src/uhd-fpga/usrp3/top/n3xx/build-N300_RFNOC_HG/n3xx.dts
> [INFO] [MPMD IMAGE LOADER] Starting update. This may take a while.
> [ERROR] [UHD] An unexpected exception was caught in a task loop.The task
> loop will now exit, things may not work.rpc::timeout: Timeout of 5000ms
> while calling RPC function 'get_log_buf'
> [INFO] [MPM.PeriphManager] Device serial number: 3145FF4
> [INFO] [MPM.PeriphManager] Initialized 1 daughterboard(s).
> [INFO] [MPM.PeriphManager] init() called with device args `'.
> Error: rpc::timeout: Timeout of 20000ms while calling RPC function
> 'update_component'
> ```
>
> Fortunately, this error doesn't appear to materially impact anything,
> and I can probe the FPGA successfully afterwards.
>

Hm, that shouldn't happen -- but it's not a big deal. What's happening is
that reloading the FPGA image will take down the network device, and then
the connection to the RPC server goes away, which causes a timeout.
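
One way to live with this in a script is to treat the timeout as expected and wait for the device to come back before touching it again. A minimal sketch, assuming MPM's RPC server listens on TCP port 49601 (adjust if yours differs); `wait_for_tcp` is a hypothetical helper, not part of UHD:

```python
import socket
import time

# Assumption: default MPM RPC port. Change this if your setup differs.
MPM_RPC_PORT = 49601


def wait_for_tcp(host, port=MPM_RPC_PORT, timeout=120.0, interval=2.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if
    `timeout` seconds pass without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the device is not back yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

After an image load you could call `wait_for_tcp("10.1.151.245")` and only run `uhd_usrp_probe` once it returns True.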

>
> When trying to load FPGA images in embedded mode, first, I get an error
> when running uhd_images_downloader that suggests argparse is not installed
> into the rootfs:
>
> ```
> root@ni-n3xx-3145FF4:~# uhd_images_downloader
> Traceback (most recent call last):
>   File "/usr/bin/uhd_images_downloader", line 11, in <module>
>     import argparse
> ImportError: No module named argparse
> ```
>
> Oddly, pip3 seems to be installed, and I can run `pip3 install argparse`
> but there's no pip or python2 argparse :(
>

Hm, that's a bug. Everything should be using Python3. You can do `python3
/usr/bin/uhd_images_downloader` to skip the error. However, are you saying
you're also getting the timeout? Which address are you supplying? localhost?


>
> Anyway, if I scp a relevant image onto the N300 PS, I get a similar issue
> as over network mode.
>
> At one point, I found that FPGA programming in embedded mode crashed the N300
> and required a hard reboot, but I can't recreate that right now, so I'll leave
> that off my list.
>

> 3. Embedded mode vs host/network mode: Ideally, I would like to run the
> N3xx using both a high-rate Ethernet connection through the SFP ports, and a
> connection to the Zynq PS over the RJ45. (not at the same time from the
> same program, but both ports physically connected)... However, I cannot
> succeed in switching between embedded mode and host mode without 1)
> physically unplugging the ethernet cables, or 2) taking down the IP
> interface that I'm not using at the moment. Is there a way to do this??
>
> To give a specific example of behavior I see, I've set up the N300 to
> boot with 1GigE connection on sfp0 and RJ45 on the eth0. From a host
> device, in network mode, I fail to probe the N300:
>
> ```
> $ uhd_usrp_probe --args "type=n3xx"
> [INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501;
> UHD_3.13.0.HEAD-0-g0ddc19e5
> [INFO] [MPMD] Initializing 1 device(s) in parallel with args:
> mgmt_addr=10.1.151.60,type=n3xx,product=n300,serial=3145FF4,claimed=False,addr=10.1.151.245
> [INFO] [MPM.PeriphManager] init() called with device args
> `mgmt_addr=10.1.151.60,product=n300'.
> [ERROR] [UHD] Exception caught in safe-call.
>   in ctrl_iface_impl<_endianness>::~ctrl_iface_impl() [with
> uhd::endianness_t _endianness = (uhd::endianness_t)0]
>   at
> /home/ejk/prefix/gnuradio-default/src/uhd/host/lib/rfnoc/ctrl_iface.cpp:60
> this->send_cmd_pkt(0, 0, true); -> EnvironmentError: IOError: Block ctrl
> (CE_00_Port_30) no response packet - AssertionError: bool(buff)
>   in uint64_t ctrl_iface_impl<_endianness>::wait_for_ack(bool, double)
> [with uhd::endianness_t _endianness = (uhd::endianness_t)0; uint64_t = long
> unsigned int]
>   at
> /home/ejk/prefix/gnuradio-default/src/uhd/host/lib/rfnoc/ctrl_iface.cpp:154
>
> [ERROR] [MPMD] Failure during block enumeration: EnvironmentError:
> IOError: recv error on socket: Connection refused
> Error: RuntimeError: Failed to run enumerate_rfnoc_blocks()
> ```
>
> If I then ssh onto the N300 and *disable* eth0, uhd_usrp_probe works
> successfully.. success!
>
> ```
> $ ssh root@10.1.151.245
> root@ni-n3xx-3145FF4:~# ip link set eth0 down
> root@ni-n3xx-3145FF4:~# exit
> logout
> Connection to 10.1.151.245 closed.
> $ uhd_usrp_probe --args "type=n3xx"
> [INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501;
> UHD_3.13.0.HEAD-0-g0ddc19e5
> [INFO] [MPMD] Initializing 1 device(s) in parallel with args:
> mgmt_addr=10.1.151.245,type=n3xx,product=n300,serial=3145FF4,claimed=False,addr=10.1.151.245
> [INFO] [0/DmaFIFO_0] Initializing block control (NOC ID:
> 0xF1F0D00000000004)
> [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1324 MB/s)
> [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1327 MB/s)
> [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1320 MB/s)
> [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1322 MB/s)
> [INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD100000011312)
> [INFO] [MPM.PeriphManager] init() called with device args
> `mgmt_addr=10.1.151.245,product=n300'.
> [WARNING] [RFNOC] Can't find a block controller for key FIFO, using
> default block controller!
> [INFO] [0/FIFO_0] Initializing block control (NOC ID: 0xF1F0000000000000)
> [INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0000000000001)
> [WARNING] [DEVICE3] No block definition found, using default block
> configuration for block with NOC ID: 0xD0C0000000000001
> [etc...]
> ```
>
> I thought this might have something to do with the mgmt_addr parameter, but
> I'm not able to override the mgmt_addr parameter if I have both network
> interfaces enabled... it always reverts to the mgmt_addr of the host
> PS, rather than the sfp0 port.
>
> I see similar behavior (but reversed) when I'm running in embedded
> mode; I need to run `ip link set sfp0 down` in order to run uhd_usrp_probe
> in embedded mode.
>
> In general, I'm okay with disabling unused interfaces, but I could not
> find any mention of the desired "embedded mode vs host mode" behavior in
> the manual, so I wanted to confirm if this is intended operation.
>

It is not. UHD/MPM will always derive and fill out mgmt_addr, but you're
supposed to be able to override it with `addr=...,mgmt_addr=...`.
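
As an illustration, here is a small Python sketch that builds such an argument string and the matching probe command. `device_args` is a hypothetical helper. The addresses are the ones from your logs, with both keys pinned to the sfp0 address, which mirrors the combination that worked once eth0 was down. Whether the override takes effect with both interfaces up is exactly what is in question here, so treat this as something to try, not a confirmed fix:

```python
def device_args(**kwargs):
    """Join key/value pairs into a UHD-style 'key=value,key=value' string."""
    return ",".join(f"{key}={value}" for key, value in kwargs.items())


# Pin both the streaming address and the management address explicitly.
args = device_args(type="n3xx", addr="10.1.151.245", mgmt_addr="10.1.151.245")

# argv list suitable for subprocess.run(); printed here instead of executed.
probe_cmd = ["uhd_usrp_probe", "--args", args]
print(" ".join(probe_cmd))
```

Passing the argv list to `subprocess.run(probe_cmd, check=True)` avoids any shell quoting issues with the comma-separated string.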

>
> 4. Sometimes (every 10-20 runs or so), I'll get the N300 into one of two
> bad states. 1) MPM appears to have crashed or has gotten into some
> condition that stops a new host from connecting. In this situation, I'll
> typically try to restart the mpm service: `systemctl restart
> usrp-hwd.service` and it often recovers. Is there another way I should try
> to recover the device? 2) The other error case seems to manifest as an FPGA
> issue, like the ctrl packets cannot get through to the rfnoc blocks. A
> restart of mpm does not resolve this issue, and I've resorted to a N300
> reboot. Is there another way to reset the FPGA live without a reboot?
>

> (Sorry for neglecting logs on this item -- I don't have the output handy
> and it takes some time to recreate. I'll keep a record of the failures and
> send them over if it's helpful...)
>

Some logs would be helpful. We sometimes get (rare) reports of MPM crashes,
but we are having a hard time reproducing them. To resolve stuck FPGA
issues, reloading the FPGA image might also help. However, neither of these
failure states is intended behavior. Is this on custom images you've built?
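
In the meantime, the two recovery steps mentioned in this thread (restarting MPM, then reloading the FPGA image) could be lined up roughly like this. It is only a sketch: `recovery_steps` is a hypothetical helper, it assumes the device is reachable over ssh as root at the given address, and it prints the commands instead of running them:

```python
def recovery_steps(addr, fpga_path=None):
    """Return argv lists for the two recovery steps, mildest first."""
    loader = ["uhd_image_loader", "--args", f"type=n3xx,addr={addr}"]
    if fpga_path is not None:
        # Without --fpga-path, the loader falls back to a default image.
        loader.append(f"--fpga-path={fpga_path}")
    return [
        # Step 1: restart the MPM service on the device itself.
        ["ssh", f"root@{addr}", "systemctl", "restart", "usrp-hwd.service"],
        # Step 2: reload the FPGA image from the host.
        loader,
    ]


for step in recovery_steps("10.1.151.245"):
    print(" ".join(step))
```

The idea is to try the steps in order and only fall back to a full reboot if neither of them brings the device back.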

Thanks for the detailed report,

-- M

>
>
> I think those are the major items for now. Thanks for the support and I
> appreciate any info you can share,
> EJ
> _______________________________________________
> USRP-users mailing list
> USRP-users@lists.ettus.com
> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>
