Hi Nate and Marcus.

I didn't receive Marcus' mail, but could read it on the archive.

I am having problems just receiving the 56e6 samples per second from the device.
I am not writing them to disk (see the '--null' argument that avoids
write calls to disk), nor processing them in any way.

I am using UHD (maint branch). Images are fetched using the
supplied populate_images.py script.

I feel that 100 instructions per sample is way too high, since the CPU
has single instruction, multiple data (SIMD) instructions, which seem
to be in use (seen in the CPU profiler).
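
For reference, the first-order arithmetic behind that figure can be sketched as below. This is only a back-of-the-envelope check: the 100 instructions/sample and the 3e9 Hz clock are the values from Marcus' estimate, not measurements, and the function names are made up for illustration.

```python
def required_ipc(sample_rate_sps, instr_per_sample, clock_hz):
    """Average instructions per clock cycle needed to keep up with the stream."""
    return sample_rate_sps * instr_per_sample / clock_hz


def instr_budget_per_sample(sample_rate_sps, clock_hz, ipc):
    """Instructions available per sample if the CPU sustains `ipc` per cycle."""
    return clock_hz * ipc / sample_rate_sps


if __name__ == "__main__":
    # Values taken from the estimate in the thread (assumed, not measured).
    print(round(required_ipc(5e7, 100, 3e9), 2))   # the ~5e7 sps case
    print(round(required_ipc(56e6, 100, 3e9), 2))  # the full 56 Msps case
    # Budget per sample if one core sustained exactly 1 instruction per cycle.
    print(round(instr_budget_per_sample(56e6, 3e9, 1.0), 1))
```

With these inputs the required rate comes out at about 1.67 instructions/cycle for 5e7 sps and about 1.87 for 56e6 sps. If SIMD handles several samples per instruction, the effective instructions-per-sample figure would be lower than 100, which is the point made above.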

I totally agree that writing these samples to disk is a tall order,
which we are not interested in. The rx_samples_to_file program was used
as a benchmark program, but as Nate suggested, the benchmark_rate is
better suited for this.

Meltdown is not patched on the Linux host. I changed the CPU governor
to 'performance', and there is no throttling.

Thank you for suggesting the sc12 wire format. It was known that the
precision was only 12 bits, but it should of course be used for benchmarking.
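
As a rough illustration of why the wire format matters, the payload rate over USB for each over-the-wire format can be estimated as below. This is a sketch under simple assumptions: each complex sample carries an I and a Q component of 16, 12 or 8 bits, and packet headers and USB protocol overhead are ignored. The table and function names are hypothetical, not part of UHD.

```python
# Bits per I or Q component for each over-the-wire format.
BITS_PER_COMPONENT = {"sc16": 16, "sc12": 12, "sc8": 8}


def wire_rate_bytes_per_s(sample_rate_sps, otw_format):
    """Payload bytes per second for complex samples in the given OTW format."""
    bits_per_sample = 2 * BITS_PER_COMPONENT[otw_format]  # I + Q
    return sample_rate_sps * bits_per_sample / 8


if __name__ == "__main__":
    for fmt in ("sc16", "sc12", "sc8"):
        rate = wire_rate_bytes_per_s(56e6, fmt)
        print(f"{fmt}: {rate / 1e6:.0f} MB/s")
```

At 56 Msps this gives 224 MB/s for sc16, 168 MB/s for sc12 and 112 MB/s for sc8, so sc12 cuts the USB payload by a quarter without discarding ADC bits.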

Following Marcus' suggestion of altering num_recv_frames only, we have
been able to receive 56 Msps reliably on the i5-3210M Linux host. CPU
usage is still ~70%.

We cannot reach 56 Msps on Windows, not even close. Even at 40 Msps,
running
  start /B /wait /realtime ./benchmark_rate.exe --rx_otw sc12 --rx_rate 40e6 --args="num_recv_frames=43"
still fails with occasional overflows.
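
To get a feel for how little slack such a buffer setting provides, the time covered by the receive frames can be estimated as below. This is a simplified sketch, not a description of UHD internals: it assumes the host-side buffering is simply num_recv_frames times recv_frame_size bytes, ignores packet headers, and uses a frame size of 8192 (the value Nate mentions below) purely as an example. The function name is hypothetical.

```python
def buffered_time_s(num_recv_frames, recv_frame_size, sample_rate_sps, bytes_per_sample):
    """Seconds of stream that fit in the receive frames (headers ignored)."""
    total_bytes = num_recv_frames * recv_frame_size
    return total_bytes / (sample_rate_sps * bytes_per_sample)


if __name__ == "__main__":
    # 43 frames of 8192 bytes (assumed size), 40 Msps, sc12 = 3 bytes per sample.
    t = buffered_time_s(43, 8192, 40e6, 3)
    print(f"{t * 1e3:.2f} ms of samples buffered")
```

Under these assumptions the frames hold only about 3 ms of samples, so any scheduling delay longer than that on the host could be enough to cause an overflow.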

It seems the only viable solution is to run on Linux if 56 Msps is required.


On 13-02-2018 at 21:39, Nate Temple wrote:
> Hi Kasper,
> There are several caveats/issues/topics to consider with regards to
> running at higher sample rates with the B2xx. Generally speaking,
> Linux will offer better performance than Windows. 
> What version of UHD are you using? If you're not using UHD,
> can you please try upgrading? UHD includes a commit [1] and
> updated firmware, which optimizes the FX3 performance. 
> The i5-3210M may be slightly under-powered for the task, however you
> can try all the following performance tuning adjustments, and it may
> be able to support 56 MS/s.
> It is worth noting that the recent KPTI patches and other related
> workarounds [2] for Intel CPUs to protect against Meltdown/Spectre
> attacks [3], may cause a considerable overhead. Here [4] are
> instructions on how to check to see if KPTI is enabled for Ubuntu. You
> may want to disable KPTI, if it is enabled on your system, then test
> to see how much additional overhead it creates, running your application. 
> Adjust your CPU governor to "performance". This can be done with the
> cpufrequtils utility [5]. ( sudo cpufreq-set -g performance )
> Ensure your CPU is not throttling due to overheating ( sudo cat
> /var/log/syslog | grep throttled ). This is very common in laptops,
> especially older devices where the thermal grease is in need of
> replacement.
> You can test using sc8 and sc12 OTW (over the wire) [6] sample sizes
> with the benchmark_rate [7] utility. Using sc12 will not drop any
> information, as the ADC/DAC on the B2xx is 12 bits.
> ./benchmark_rate --rx_otw sc12 --rx_rate 40e6
> ./benchmark_rate --tx_otw sc8 --tx_rate 40e6
> Some USB controllers can be problematic. Intel Series 7/8/9 USB
> controllers usually offer the best performance. 
> Using Thinkpads (T430s, T470p) I've found that a recv/send frame size
> of 8192 tends to work the best at higher sample rates.
> As Marcus mentioned, the UHD examples are provided as an API reference
> and not tuned for performance. Case in point is rx_samples_to_file
> being single threaded. GNU Radio will by default offer a
> multi-threaded architecture, which can be useful to test. You may need
> to adjust the min buffer sizes to handle the higher sample rates
> however within the GR Blocks.
> I've attached an example of rx_samples_to_file.cpp which is
> multi-threaded and has additional buffering.
> Without an SSD or NVMe drive, sustaining a high sample rate to
> disk can be difficult. Depending upon your system configuration, you
> may want to consider using a ram disk. I would recommend leaving at
> least 2-8 GB of ram for your host OS (this is dependent upon your
> application etc). This will however limit the length of time you can
> save to disk (as limited by the ram in the machine). Below is an
> example to create a 24GB ramdisk:
> mkdir -p ~/ramfs
> mount -t tmpfs -o size=24G tmpfs ~/ramfs
> [1] - https://github.com/EttusResearch/uhd/commit/d95613152da3e7c7f41c71acca65101ed0896893
> [2] - https://en.wikipedia.org/wiki/Kernel_page-table_isolation
> [3] - https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
> [4] - https://askubuntu.com/questions/992137/how-to-check-that-kpti-is-enabled-on-my-ubuntu
> [5] - http://www.thinkwiki.org/wiki/How_to_use_cpufrequtils
> [6] - https://files.ettus.com/manual/page_stream.html#stream_datatypes_otw
> [7] - https://github.com/EttusResearch/uhd/blob/maint/host/examples/benchmark_rate.cpp
> Regards,
> Nate Temple
> On Tue, Feb 13, 2018 at 11:59 AM, Marcus D. Leech via USRP-users
> <usrp-users@lists.ettus.com> wrote:
>     On 02/13/2018 04:52 AM, Kasper Føns via USRP-users wrote:
>         Hi.
>         We have bought a B200 board and are having issues simply
>         receiving the
>         samples and would like some support in the matter.
>         Running the command
>            ./rx_samples_to_file --null --rate 56000000
>         on a Sony Vaio Z with an I5-3210M running Ubuntu Server 17.10
>         shows a
>         high CPU usage of ~78%.
>         Is such a high CPU usage expected?
>         Switching terminal windows (ALT + F1 or ALT + F2) is enough to
>         cause an
>         overflow on the Linux host.
>         There is also a high CPU usage on a Windows 10 machine (ThinkPad,
>         i7-4810MQ).
>         Running
>            ./rx_samples_to_file --null --rate 56000000
>         results in an infinite stream of overflows.
>         Running
>            ./rx_samples_to_file --null --rate 32000000
>         utilizes 22% CPU and still overflows once in a while. Moving a
>         calculator window around the screen results in overflows.
>         We have tried increasing buffers using --args="recv_frame_size=X,
>         num_recv_frames=Y"
>         However, we haven't been able to increase X to higher values
>         than ~16000
>         (16384 fails with lots of overflows).
>         The same applies to Y, where 300 fails with an error.
>         The software was compiled in release mode and ran over a USB 3
>         connection.
>         Thus, for USB transfers using the B200:
>           - On Vaio: Is ~78% CPU usage expected for 56 Msps ?
>           - On Win10: Is it not possible to receive 56 Msps?
>           - On Win10: Is 22% CPU usage expected for 32 Msps?
>           - Is there some limit to recv_frame_size? A value of 16384
>         fails with
>         infinite overflows.
>           - Is there some way of tuning the framework for lower CPU?
>         We hope you are able to help!
>         _______________________________________________
>         USRP-users mailing list
>         USRP-users@lists.ettus.com
>         http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>     Let's do some first-order math.
>     You're bringing in ~5e7 samples/second
>     If we optimistically assume a mean instructions-per-sample
>     (including both kernel and user-space code) of 100
>     instructions/sample, then we're talking a
>     requirement for 5e9 instructions/second.  If your CPU is running
>     at 3e9 Hz, then it'll need to, on average, issue about 1.7
>     instructions/cycle.
>     You'll generally get more "mileage" out of num_recv_frames than
>     the frame size.  On any given system, my understanding is that
>     this is a shared resource
>       (across a given USB controller, I think, but don't quote me).
>     Now, the rx_samples_to_file application is single-threaded, so
>     it's trying to service the data-stream from the USRP at the same
>     time as it's making filesystem
>       calls to write the data (even if to /dev/null). That's a tall
>     order for a single-threaded application running at 5.6e7sps. 
>      These applications, provided with
>       UHD, are generally intended as *coding examples*, and no
>     guarantees exist with respect to performance on any given system.
>     Furthermore, some
>       USB3 controllers are better at handling bulk high-data-rate
>     applications than others, and the controller landscape changes so
>     quickly that it's next to
>       impossible to provide up-to-date recommendations in that department.
>     If you install GNU Radio, there's an application called
>     "uhd_rx_cfile" that takes advantage of the multi-threaded nature
>     of GNU Radio, and does better.
>     But keep *firmly in mind* that once you migrate from writing to
>     /dev/null to writing to real disk hardware, 5e7 samples/second is
>     going to result in
>       a LOT of disk I/O--more than most ordinary single-disk, non-RAID
>     disk systems can usefully sustain.
