[USRP-users] Core dump after disconnecting radio

2018-10-25 Thread Brophy, William via USRP-users
If a USRP N210 (and possibly other radios) is disconnected while in use (i.e. 
while streaming IQ data), the program will end with an exception thrown from 
within UHD. This can be VERY easily reproduced using the unmodified UHD example 
program rx_samples_to_file. Simply run:

/usr/lib64/uhd/examples/rx_samples_to_file --args addr=192.168.10.2 --duration 20

And wait for the streaming to start. Then disconnect the Ethernet cord from 
the radio. The program will recognize the error and begin to exit gracefully, 
and then abort with the following backtrace:

#0  0x7f8d46c1e1f7 in raise () from /lib64/libc.so.6
#1  0x7f8d46c1f8e8 in abort () from /lib64/libc.so.6
#2  0x7f8d47524ac5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x7f8d47522a36 in ?? () from /lib64/libstdc++.so.6
#4  0x7f8d475219e9 in ?? () from /lib64/libstdc++.so.6
#5  0x7f8d47522654 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
#6  0x7f8d46fbb903 in ?? () from /lib64/libgcc_s.so.1
#7  0x7f8d46fbbc9b in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
#8  0x7f8d47522c76 in __cxa_throw () from /lib64/libstdc++.so.6
#9  0x7f8d4879f3de in udp_zero_copy_asio_msb::release (this=)
    at /usr/src/debug/uhd-3.12.0.0/host/lib/transport/udp_zero_copy.cpp:125
#10 0x7f8d485e142e in intrusive_ptr_release (p=)
    at /usr/src/debug/uhd-3.12.0.0/host/include/uhd/transport/zero_copy.hpp:104
#11 ~intrusive_ptr (this=0x7ffe4921da80, __in_chrg=)
    at /usr/local/upe/buildtools/include/boost-1_67/boost/smart_ptr/intrusive_ptr.hpp:98
#12 send_pkt (cmd=256, data=, addr=, this=0x856890)
    at /usr/src/debug/uhd-3.12.0.0/host/lib/usrp/usrp2/usrp2_fifo_ctrl.cpp:165
#13 usrp2_fifo_ctrl_impl::poke32 (this=0x856890, addr=, data=)
    at /usr/src/debug/uhd-3.12.0.0/host/lib/usrp/usrp2/usrp2_fifo_ctrl.cpp:63
#14 0x7f8d4830e245 in rx_dsp_core_200_impl::issue_stream_command (this=0x859200, stream_cmd=...)
    at /usr/src/debug/uhd-3.12.0.0/host/lib/usrp/cores/rx_dsp_core_200.cpp:130
#15 0x7f8d48505f0b in operator() (a0=..., this=)
    at /usr/local/upe/buildtools/include/boost-1_67/boost/function/function_template.hpp:768
#16 issue_stream_cmd (stream_cmd=..., this=0x873718)
    at /usr/src/debug/uhd-3.12.0.0/host/lib/transport/super_recv_packet_handler.hpp:223
#17 uhd::transport::sph::recv_packet_streamer::issue_stream_cmd (this=0x873710, stream_cmd=...)
    at /usr/src/debug/uhd-3.12.0.0/host/lib/transport/super_recv_packet_handler.hpp:842
#18 0x0042e246 in recv_to_file > (usrp=..., cpu_format="sc16", wire_format="sc16", channel="0",
    file="usrp_samples.dat", samps_per_buff=samps_per_buff@entry=1, num_requested_samples=0,
    time_requested=time_requested@entry=20, bw_summary=false, stats=false, null=false,
    enable_size_map=false, continue_on_bad_packet=false)
    at /usr/src/debug/uhd-3.12.0.0/host/examples/rx_samples_to_file.cpp:152
#19 0x0041cbe2 in _main (argc=, argv=)
    at /usr/src/debug/uhd-3.12.0.0/host/examples/rx_samples_to_file.cpp:387
#20 0x0041a1bb in main (argc=, argv=)
    at /usr/src/debug/uhd-3.12.0.0/host/examples/rx_samples_to_file.cpp:227


This wouldn't be such a huge problem except that the exception is thrown from a 
destructor (~intrusive_ptr()), which means it cannot be caught in a C++ 
program. Has anyone found a solution or workaround to this problem?
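
For readers unfamiliar with why a throw from a destructor is fatal, here is a 
minimal sketch of the language rule involved. It assumes C++11 or later, where 
a destructor is implicitly noexcept unless declared otherwise, so an exception 
escaping it calls std::terminate (the same happens if a destructor throws while 
another exception is already unwinding the stack). The two handle types are 
hypothetical stand-ins, not UHD or Boost classes, and whether the destructor in 
the backtrace above is subject to exactly this rule is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a buffer handle with an ordinary destructor.
// Nothing is declared, so the destructor is implicitly noexcept.
struct implicit_noexcept_handle {
    ~implicit_noexcept_handle() {}
};

// Same idea, but the destructor explicitly opts out of noexcept and throws.
struct throwing_handle {
    ~throwing_handle() noexcept(false) {
        throw std::runtime_error("send failed");
    }
};

// An exception escaping an implicitly noexcept destructor would call
// std::terminate instead of reaching any catch block.
static_assert(std::is_nothrow_destructible<implicit_noexcept_handle>::value,
              "destructors are implicitly noexcept");
static_assert(!std::is_nothrow_destructible<throwing_handle>::value,
              "noexcept(false) opts out");

// Returns true if the exception thrown by the destructor reached the handler.
bool destructor_exception_is_catchable() {
    try {
        throwing_handle h;  // destroyed at end of scope; the destructor throws
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Calling destructor_exception_is_catchable() returns true only because 
throwing_handle declares noexcept(false); an application cannot add that 
declaration to a library's destructor after the fact.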

We are currently using UHD 3.12.0.0.

Thanks
Will
___
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com


Re: [USRP-users] x300 unrecoverable timeouts

2018-07-17 Thread Brophy, William via USRP-users
Hi Dario,

Are you saying you patched UHD to wait for ACKs? Would you be able to provide a 
patch for this?

Thanks
Will

From: Dario Pennisi 
Sent: Friday, July 13, 2018 1:49 AM
To: Brophy, William ; Keith k 
Cc: usrp-users@lists.ettus.com
Subject: Re: [USRP-users] x300 unrecoverable timeouts

Hi,
We recently investigated a similar issue and have a clear understanding of 
where it comes from.
Commands sent by the PC to the USRP device are answered with an acknowledgement. 
Each command has a sequence number and is sent asynchronously. On the receiving 
side the acknowledgement sequence number is checked, and if one is lost the 
system basically gives up. Packets can get lost simply because the 
communication uses UDP, which gives no guarantee of packet delivery, and Linux 
may drop packets, incoming or outgoing, at any time; it is even worse if you 
are passing through a switch instead of having a 1:1 link.
We fixed this by patching the code so that after sending a command we 
immediately wait for its acknowledgement, and if it doesn't come back in time 
we retry. This of course does not allow pipelined command transfers, but it 
provides a reliable solution. Caching commands and resending them when an ack 
arrives out of sequence won't work, since resending commands at that point 
would change the order in which commands are executed, which could be very wrong.
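
The send-then-wait-then-retry scheme described above can be sketched roughly as 
follows. This is not the actual patch: ctrl_link, its two callbacks, 
send_and_wait, and the default retry count and timeout are all hypothetical 
names and values chosen for illustration, and a real version would wrap the UDP 
control socket.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>

// Hypothetical transport hooks standing in for the UDP control socket.
struct ctrl_link {
    // Sends one command tagged with a sequence number.
    std::function<void(uint32_t seq, uint32_t cmd)> send;
    // Waits up to timeout_ms for an ack. Returns true and sets ack_seq if
    // one arrived, false on timeout.
    std::function<bool(uint32_t& ack_seq, int timeout_ms)> wait_ack;
};

// Sends cmd and blocks until the matching ack arrives, resending on timeout.
// Returns the number of attempts used; throws after max_attempts timeouts.
int send_and_wait(ctrl_link& link, uint32_t seq, uint32_t cmd,
                  int max_attempts = 3, int timeout_ms = 100) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        link.send(seq, cmd);
        uint32_t ack_seq = 0;
        // Keep reading acks until one matches or the wait times out;
        // acks with a different sequence number are ignored as stale.
        while (link.wait_ack(ack_seq, timeout_ms)) {
            if (ack_seq == seq) return attempt;
        }
    }
    throw std::runtime_error("no ack after retries");
}
```

Because only one command is ever in flight, ordering is preserved. One caveat 
of any such scheme: if the command arrived and only the ack was lost, the retry 
delivers the command a second time, so the device side must tolerate or filter 
repeated sequence numbers.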
It would be great if someone from USRP could discuss this a bit further and come 
up with a better solution...
Best regards,
Dario Pennisi


On Fri, Jul 13, 2018 at 2:29 AM +0200, "Keith k via USRP-users" 
<usrp-users@lists.ettus.com> wrote:
Hello Will
This sounds eerily similar to issues I've had using N200s. I basically found 
that working at high rates, using either STREAM_MODE_NUM_SAMPS_AND_DONE or 
using starts and stops was completely unusable. The system would go into an 
unrecoverable set of timeouts or overflows. I had to switch to using non 
interrupted continuous streaming and I had to make sure that the UHD threads 
were isolated to their own cpu cores in order to eliminate being preempted. 
This was the only way I could get stable runtime of the rx side during a long 
running application.
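
As a rough illustration of the core pinning mentioned above, here is a minimal 
Linux-only sketch using sched_setaffinity from glibc. The helper names are 
hypothetical. Pinning by itself only restricts where the calling thread may 
run; keeping other tasks off that core is a separate system configuration step 
(for example the isolcpus kernel boot parameter), and how this maps onto UHD's 
internal threads is not covered here.

```cpp
#include <cassert>
#include <sched.h>  // sched_setaffinity, sched_getaffinity, CPU_* (Linux/glibc)

// Restricts the calling thread to a single CPU core. Returns true on success.
bool pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Returns the lowest-numbered CPU the calling thread may run on, or -1.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}
```

On Linux, threads created after the call inherit the creating thread's affinity 
mask, so pinning before spawning worker threads keeps them on the same core 
unless they change their own mask.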

On Thu, Jul 12, 2018 at 1:22 PM, Brophy, William via USRP-users 
<usrp-users@lists.ettus.com> wrote:
While working to get coherent streams working, I ran into an issue using an 
x310 with two TwinRX daughterboards.
The issue starts with a series of "ERROR_CODE_OVERFLOW (Out of sequence error)" 
errors. In an attempt to recover from that, the rx streamer is thrown out and 
recreated. The next stream errors change to "ERROR_CODE_TIMEOUT". Once in this 
state, all future streams end with this error.
The x310 is connected over 10G ethernet.

I managed to reproduce this error with an example program based off of 
“rx_multi_samples.cpp”. I had to make the following changes:

  1.  STREAM_MODE_START_CONTINUOUS is now used, ending the stream with 
STREAM_MODE_STOP_CONTINUOUS
  2.  The stream delay was set to .01 (mostly to speed up the rate the error 
would occur)
  3.  Multi stream commands (and stop commands) are issued in repetition 
(start, stop, start, stop, etc.) rather than just one long stream
  4.  Each stream uses a different sampling rate (alternates between 25Msps and 
50Msps)
  5.  A small loop was added to the collect loop to slow down the thread enough 
to get overflow errors (but only sometimes, nothing crazy)
  6.  Once the out of sequence error is encountered 10 times in a row, the rx 
streamer is destroyed and re-created
  7.  Every stream command after step 6 ends in a timeout error

It is also worth pointing out that this does not happen if the sample rate does 
not change. The out of sequence errors will still happen until the rx streamer 
is re-created, but the timeout errors do not occur after that…

I attached the entire example program (with modifications) to this email.
I started it with the args:
rx_multi_samples --args addr=192.168.30.2 --subdev "A:0 A:1 B:0 B:1" --channels "0,1,2,3" --dilv --rate 5000 --nsamps 800

Is there something wrong with how we are using the interface? Are there steps we 
can take to either avoid or recover from this state?

I appreciate any help we can get…

Will

--
-Keith Kotyk


[USRP-users] x300 unrecoverable timeouts

2018-07-12 Thread Brophy, William via USRP-users
While working to get coherent streams working, I ran into an issue using an 
x310 with two TwinRX daughterboards.
The issue starts with a series of "ERROR_CODE_OVERFLOW (Out of sequence error)" 
errors. In an attempt to recover from that, the rx streamer is thrown out and 
recreated. The next stream errors change to "ERROR_CODE_TIMEOUT". Once in this 
state, all future streams end with this error.
The x310 is connected over 10G ethernet.

I managed to reproduce this error with an example program based off of 
"rx_multi_samples.cpp". I had to make the following changes:

  1.  STREAM_MODE_START_CONTINUOUS is now used, ending the stream with 
STREAM_MODE_STOP_CONTINUOUS
  2.  The stream delay was set to .01 (mostly to speed up the rate the error 
would occur)
  3.  Multi stream commands (and stop commands) are issued in repetition 
(start, stop, start, stop, etc.) rather than just one long stream
  4.  Each stream uses a different sampling rate (alternates between 25Msps and 
50Msps)
  5.  A small loop was added to the collect loop to slow down the thread enough 
to get overflow errors (but only sometimes, nothing crazy)
  6.  Once the out of sequence error is encountered 10 times in a row, the rx 
streamer is destroyed and re-created
  7.  Every stream command after step 6 ends in a timeout error

It is also worth pointing out that this does not happen if the sample rate does 
not change. The out of sequence errors will still happen until the rx streamer 
is re-created, but the timeout errors do not occur after that...
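
The recovery rule in step 6 (re-create the rx streamer after 10 out-of-sequence 
errors in a row) can be sketched as a small counter. This is an illustrative 
helper with a hypothetical name, not code taken from the attached program; the 
caller would feed it the result of each recv() call and rebuild the streamer 
when it returns true.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts consecutive out-of-sequence errors and reports
// when the caller should tear down and re-create its rx streamer.
class consecutive_error_tracker {
public:
    explicit consecutive_error_tracker(std::size_t limit) : limit_(limit) {}

    // Records the outcome of one receive call. Returns true when the number
    // of errors in a row has reached the limit; the count then starts over.
    bool record(bool out_of_sequence) {
        if (!out_of_sequence) {
            count_ = 0;  // any good result breaks the run of errors
            return false;
        }
        if (++count_ < limit_) return false;
        count_ = 0;
        return true;
    }

private:
    std::size_t limit_;
    std::size_t count_ = 0;
};
```

With a limit of 10 this matches the threshold described in the list above.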

I attached the entire example program (with modifications) to this email.
I started it with the args:
rx_multi_samples --args addr=192.168.30.2 --subdev "A:0 A:1 B:0 B:1" --channels "0,1,2,3" --dilv --rate 5000 --nsamps 800

Is there something wrong with how we are using the interface? Are there steps we 
can take to either avoid or recover from this state?

I appreciate any help we can get...

Will
//
// Copyright 2011 Ettus Research LLC
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program.  If not, see <http://www.gnu.org/licenses/>.
//

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

namespace po = boost::program_options;

const size_t MAX_OVERFLOW_ERRORS = 5; // the maximum number of overflows at zero samples received before the capture errors out


int UHD_SAFE_MAIN(int argc, char *argv[]){
    uhd::set_thread_priority_safe();

    //variables to be set by po
    std::string args, sync, subdev, channel_list;
    double seconds_in_future;
    size_t total_num_samps;
    double rate;

    //setup the program options
    po::options_description desc("Allowed options");
    desc.add_options()
        ("help", "help message")
        ("args", po::value(&args)->default_value(""), "single uhd device address args")
        ("secs", po::value(&seconds_in_future)->default_value(1.5), "number of seconds in the future to receive")
        ("nsamps", po::value(&total_num_samps)->default_value(1), "total number of samples to receive")
        ("rate", po::value(&rate)->default_value(100e6/16), "rate of incoming samples")
        ("sync", po::value(&sync)->default_value("now"), "synchronization method: now, pps, mimo")
        ("subdev", po::value(&subdev), "subdev spec (homogeneous across motherboards)")
        ("dilv", "specify to disable inner-loop verbose")
        ("channels", po::value(&channel_list)->default_value("0"), "which channel(s) to use (specify \"0\", \"1\", \"0,1\", etc)")
    ;
    po::variables_map vm;
    po::store(po::parse_command_line(argc, argv, desc), vm);
    po::notify(vm);

    //print the help message
    if (vm.count("help")){
        std::cout << boost::format("UHD RX Multi Samples %s") % desc << std::endl;
        std::cout <<
        "This is a demonstration of how to receive aligned data from multiple channels.\n"
        "This example can receive from multiple DSPs, multiple motherboards, or both.\n"
        "The MIMO cable or PPS can be used to synchronize the configuration. See --sync\n"
        "\n"
"Specify --subdev to s