Dear Andreas,

Single thread or multiple workers? Debug image? Does vm->heap_aligned_base match 
reality?

Does (virtual address of allocated frame - vm->heap_aligned_base) / 
CLIB_CACHE_LINE_BYTES fit in 32 bits?
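A quick way to answer the 32-bit question is to do the arithmetic on the raw addresses, for example from gdb or a debug printf. The standalone sketch below assumes 64-byte cache lines (what CLIB_CACHE_LINE_BYTES typically is on x86_64) and cache-line-aligned frames, and uses a hypothetical helper name, frame_offset_fits_u32; it is not a VPP API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as CLIB_CACHE_LINE_BYTES typically is on x86_64. */
#define SKETCH_CACHE_LINE_BYTES 64u

/* Hypothetical helper, not a VPP API: returns 1 if the frame sits at or
   above the base and its cache-line offset fits in a u32, else 0. */
static int
frame_offset_fits_u32 (uintptr_t frame_addr, uintptr_t heap_aligned_base)
{
  /* This sketch assumes frames are cache-line aligned. */
  assert ((frame_addr & (SKETCH_CACHE_LINE_BYTES - 1)) == 0);

  if (frame_addr < heap_aligned_base)
    return 0;                   /* a negative offset cannot be encoded */

  uint64_t lines = (uint64_t) (frame_addr - heap_aligned_base)
                   / SKETCH_CACHE_LINE_BYTES;
  return lines <= UINT32_MAX;
}
```

For scale: 2^32 lines x 64 bytes = 256 GiB, so a frame that really came from a 3G heap starting at vm->heap_aligned_base should always pass. A failure suggests the frame lives somewhere else, or that the base is stale.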

In vlib/main.c:vlib_frame_alloc_to_node(...) try replacing 
vlib_frame_index_no_check(vm, f) with vlib_frame_index(vm, f) in a debug image.
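The idea behind the swap is that the unchecked helper only does arithmetic, while the checked one also verifies its result in a debug image. A simplified standalone model of that difference, assuming the index is a cache-line offset truncated to 32 bits (the sketch_* names are hypothetical and this is not the VPP source):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FRAME_ALIGN 64u  /* assumption: cache-line-sized frame alignment */

/* Model of an unchecked index computation: the cache-line offset from the
   base is silently truncated to 32 bits. */
static uint32_t
sketch_frame_index_no_check (uintptr_t base, uintptr_t frame)
{
  return (uint32_t) ((frame - base) / SKETCH_FRAME_ALIGN);
}

/* Inverse mapping: index back to an address. */
static uintptr_t
sketch_frame_from_index (uintptr_t base, uint32_t index)
{
  return base + (uintptr_t) index * SKETCH_FRAME_ALIGN;
}

/* Checked model: stores the index and returns 1 only if the round trip
   reproduces the original address; otherwise returns 0 and leaves *index
   untouched. A debug image would assert at this point instead. */
static int
sketch_frame_index_checked (uintptr_t base, uintptr_t frame, uint32_t *index)
{
  uint32_t i = sketch_frame_index_no_check (base, frame);
  if (sketch_frame_from_index (base, i) != frame)
    return 0;
  *index = i;
  return 1;
}
```

With a frame 256 GiB + 64 bytes above the base, the unchecked index silently wraps to 1 and maps back to the wrong address; the checked variant rejects it.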

Again, best I can do to help w/ next-to-no information.

D.

From: [email protected] <[email protected]> On Behalf Of Andreas Schultz
Sent: Wednesday, July 3, 2019 4:47 AM
To: Dave Barach (dbarach) <[email protected]>
Cc: Hugo Garza <[email protected]>; [email protected]
Subject: Re: [vpp-dev] SIGSEGV after calling vlib_get_frame_to_node

Hi,

I've run into the same issue with different, but also external code.

The calling sequence in my case looks very similar to Hugo's. I'm also getting 
an invalid pointer from vlib_get_frame_to_node.
It is crashing here: 
https://github.com/travelping/vpp/blob/feature/master/upf%2Btdf/src/plugins/upf/upf_pfcp_server.c#L121

@Hugo: have you found the root cause for your problem?

Regards
Andreas

On Wed, 28 Nov 2018 at 12:53, Dave Barach via Lists.Fd.Io 
<[email protected]> wrote:
None of the routine names in the backtrace exist in master/latest – it’s your 
code - so it will be challenging for the community to help you.

See if you can repro the problem with a TAG=vpp_debug image (aka “make build”, 
not “make build-release”). If you’re lucky, one of the numerous ASSERTs will 
catch the problem early.

vlib_get_frame_to_node(...) is not new code, it’s used all over the place, and 
it needs “help” to fail as shown below.

D.

From: [email protected] <[email protected]> On Behalf Of Hugo Garza
Sent: Tuesday, November 27, 2018 7:39 PM
To: [email protected]
Subject: [vpp-dev] SIGSEGV after calling vlib_get_frame_to_node

Hi vpp-dev,

I'm seeing a crash when I run our application with multiple workers.
Nov 26 14:29:32  vnet[64035]: received signal SIGSEGV, PC 0x7f6979a12ce8, faulting address 0x7fa6cd0bd444
Nov 26 14:29:32  vnet[64035]: #0  0x00007f6a812743d8 0x7f6a812743d8
Nov 26 14:29:32  vnet[64035]: #1  0x00007f6a80bc56d0 0x7f6a80bc56d0
Nov 26 14:29:32  vnet[64035]: #2  0x00007f6979a12ce8 vlib_frame_vector_args + 0x10
Nov 26 14:29:32  vnet[64035]: #3  0x00007f6979a16a2c tcpo_enqueue_to_output_i + 0xf4
Nov 26 14:29:32  vnet[64035]: #4  0x00007f6979a16b23 tcpo_enqueue_to_output + 0x25
Nov 26 14:29:32  vnet[64035]: #5  0x00007f6979a33fba send_packets + 0x7f2
Nov 26 14:29:32  vnet[64035]: #6  0x00007f6979a346f8 connection_tx + 0x17e
Nov 26 14:29:32  vnet[64035]: #7  0x00007f6979a34f08 tcpo_dispatch_node_fn + 0x7fa
Nov 26 14:29:32  vnet[64035]: #8  0x00007f6a81248cb6 vlib_worker_loop + 0x6a6
Nov 26 14:29:32  vnet[64035]: #9  0x00007f6a8094f694 0x7f6a8094f694

Running on CentOS 7.4 with kernel 3.10.0-693.el7.x86_64
VPP
Version:                  v18.10-13~g00adcce~b60
Compiled by:              root
Compile host:             b0f32e97e93a
Compile date:             Mon Nov 26 09:09:42 UTC 2018
Compile location:         /w/workspace/vpp-merge-1810-centos7
Compiler:                 GCC 7.3.1 20180303 (Red Hat 7.3.1-5)
Current PID:              9612

On a Cisco server with two-socket Intel Xeon E5-2697A v4 @ 2.60GHz and two Intel 
X520 NICs. A T-Rex traffic generator is hooked up on the other end to provide 
data at about 5 Gbps per NIC.
./t-rex-64 --astf -f astf/nginx_wget.py -c 14 -m 40000 -d 3000

startup.conf
unix {
  nodaemon
  interactive
  log /opt/tcpo/logs/vpp.log
  full-coredump
  cli-no-banner
  #startup-config /opt/tcpo/conf/local.conf
  cli-listen /run/vpp/cli.sock
}
api-trace {
  on
}
heapsize 3G
cpu {
  main-core 1
  corelist-workers 2-5
}
tcpo {
  runtime-config /opt/tcpo/conf/runtime.conf
  session-pool-size 1024000
}
dpdk {
  dev 0000:86:00.0 {
    num-rx-queues 1
  }
  dev 0000:86:00.1 {
    num-rx-queues 1
  }
  dev 0000:84:00.0 {
    num-rx-queues 1
  }
  dev 0000:84:00.1 {
    num-rx-queues 1
  }
  num-mbufs 1024000
  socket-mem 4096,4096
}
plugin_path /usr/lib/vpp_plugins
api-segment {
  gid vpp
}

Here's the function where the SIGSEGV is happening:

static void enqueue_to_output_i(tcpo_worker_ctx_t * wrk, u32 bi, u8 flush) {
    u32 *to_next, next_index;
    vlib_frame_t *f;

    TRACE_FUNC_VAR(bi);

    next_index = tcpo_output_node.index;

    /* Get frame to output node */
    f = wrk->tx_frame;
    if (!f) {
        f = vlib_get_frame_to_node(wrk->vm, next_index);
        ASSERT (clib_mem_is_heap_object (f));
        wrk->tx_frame = f;
    }
    ASSERT (clib_mem_is_heap_object (f));

    to_next = vlib_frame_vector_args(f);
    to_next[f->n_vectors] = bi;
    f->n_vectors += 1;

    if (flush || f->n_vectors == VLIB_FRAME_SIZE) {
        TRACE_FUNC_VAR2(flush, f->n_vectors);
        vlib_put_frame_to_node(wrk->vm, next_index, f);
        wrk->tx_frame = 0;
    }
}
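Since the crash only shows up with multiple workers, another experiment is to check on every call, not just after allocation, that the cached wrk->tx_frame belongs to the calling thread. The model below is standalone and hypothetical: sketch_frame_t stands in for vlib_frame_t, its owner_thread field does not exist in VPP, a frame size of 256 is assumed to mirror VLIB_FRAME_SIZE, and allocation and hand-off are stubbed out. In the real plugin the analogous debug check would be something like ASSERT (wrk->vm == vlib_get_main ()), assuming each worker context holds its own thread's vlib_main_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FRAME_SIZE 256u  /* assumption: mirrors VLIB_FRAME_SIZE */

typedef struct
{
  uint32_t owner_thread;        /* hypothetical: thread that allocated the frame */
  uint32_t n_vectors;
  uint32_t buffers[SKETCH_FRAME_SIZE];
} sketch_frame_t;

typedef struct
{
  uint32_t thread_index;        /* thread this worker context belongs to */
  sketch_frame_t *tx_frame;     /* cached, partially filled frame (or NULL) */
  sketch_frame_t storage;       /* stand-in for vlib_get_frame_to_node() */
  uint32_t n_flushed;           /* frames handed off so far */
} sketch_worker_t;

/* Same fill/flush pattern as enqueue_to_output_i(), plus ownership
   checks on the worker context and on the cached frame. */
static void
sketch_enqueue (sketch_worker_t *wrk, uint32_t current_thread,
                uint32_t bi, int flush)
{
  sketch_frame_t *f = wrk->tx_frame;

  /* A worker context must only be touched by its own thread. */
  assert (wrk->thread_index == current_thread);

  if (!f)
    {
      f = &wrk->storage;
      f->owner_thread = current_thread;
      f->n_vectors = 0;
      wrk->tx_frame = f;
    }

  /* Catch a frame cached by one thread and reused by another. */
  assert (f->owner_thread == current_thread);

  f->buffers[f->n_vectors++] = bi;

  if (flush || f->n_vectors == SKETCH_FRAME_SIZE)
    {
      wrk->n_flushed++;         /* stand-in for vlib_put_frame_to_node() */
      wrk->tx_frame = NULL;
    }
}
```

The same 256-enqueue/flush behavior as the original function falls out of this model, so it can be unit-tested away from VPP before moving the ownership check into the plugin.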


I've observed that after a few Gbps of traffic have gone through, the pointer f 
returned by vlib_get_frame_to_node points to invalid memory, as confirmed by 
the ASSERT I added right below the call.

I'm not sure how to make further progress tracking down this issue; any help or 
advice would be much appreciated.

Thanks,
Hugo
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11444): https://lists.fd.io/g/vpp-dev/message/11444
Mute This Topic: https://lists.fd.io/mt/28408842/675601
Group Owner: [email protected]<mailto:vpp-dev%[email protected]>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[[email protected]<mailto:[email protected]>]
-=-=-=-=-=-=-=-=-=-=-=-


--
Andreas Schultz
--
Principal Engineer

t: +49 391 819099-224

------------------------------- enabling your networks -----------------------------

Travelping GmbH
Roentgenstraße 13
39108 Magdeburg
Germany

t: +49 391 819099-0
f: +49 391 819099-299
e: [email protected]
w: https://www.travelping.com/

Company registration: Amtsgericht Stendal
Reg. No.: HRB 10578
Managing Director: Holger Winkelmann
VAT ID: DE236673780

