Hi Ping, 

The current implementation only locks the half-open pool if the pool is about
to expand. This is done to increase speed by avoiding unnecessary locking,
i.e., if the pool is not about to expand, it should be safe to get a new
element from it without affecting readers. The thing to figure out now is why
this is happening: does the slowdown introduced by the “big lock” avoid some
race, or is there more to it?
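
For reference, the allocation path I am describing is roughly the following (a
simplified sketch of tls_ctx_half_open_alloc () in src/vnet/tls/tls.c, pieced
together from your diff, so details may differ slightly from master):

u32
tls_ctx_half_open_alloc (void)
{
  tls_main_t *tm = &tls_main;
  u8 will_expand = 0;
  tls_ctx_t *ctx;
  u32 ctx_index;

  /* Check whether the next pool_get would reallocate the pool vector. */
  pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
  if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
    {
      /* Expansion moves the pool memory, so take the writer lock to keep
       * readers from dereferencing a stale pool pointer. */
      clib_rwlock_writer_lock (&tm->half_open_rwlock);
      pool_get (tm->half_open_ctx_pool, ctx);
      memset (ctx, 0, sizeof (*ctx));
      ctx_index = ctx - tm->half_open_ctx_pool;
      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
    }
  else
    {
      /* No expansion expected: grab a free element without taking the lock. */
      pool_get (tm->half_open_ctx_pool, ctx);
      memset (ctx, 0, sizeof (*ctx));
      ctx_index = ctx - tm->half_open_ctx_pool;
    }
  return ctx_index;
}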

First of all, how many workers do you have configured and how many sessions are 
you allocating/connecting? Do you see failed connects? 

Your tls_ctx_half_open_get line numbers don’t match my code. Did you by chance 
modify something else?
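
And just to make sure we are looking at the same code: the lookup that asserts
in your frame #5 is essentially a reader-locked pool_elt_at_index (), roughly
(a sketch from memory, not the exact code):

tls_ctx_t *
tls_ctx_half_open_get (u32 ctx_index)
{
  tls_main_t *tm = &tls_main;
  /* The reader lock only guards against the pool being reallocated while we
   * look up the element; it does not stop another thread from freeing the
   * element itself. In a debug image, pool_elt_at_index () asserts if
   * ctx_index points at a freed pool slot, which would fit your backtrace.
   * NB: this returns with the reader lock held; the caller releases it once
   * it is done with the ctx. */
  clib_rwlock_reader_lock (&tm->half_open_rwlock);
  return pool_elt_at_index (tm->half_open_ctx_pool, ctx_index);
}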

Thanks,
Florin

> On Aug 27, 2018, at 9:22 AM, Yu, Ping <ping...@intel.com> wrote:
> 
> Hello, all
>  
> Recently I found that the TLS half-open lock is not well implemented; if
> multiple threads are enabled, there is a chance of getting the following core
> dump in debug mode.
>  
> (gdb) where
> #0  0x00007f7a0848e428 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007f7a0849002a in __GI_abort () at abort.c:89
> #2  0x0000000000407f0b in os_panic () at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vpp/vnet/main.c:331
> #3  0x00007f7a08867bd0 in debugger () at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:84
> #4  0x00007f7a08868008 in _clib_error (how_to_die=2, function_name=0x0, 
> line_number=0,
>     fmt=0x7f7a0a0add78 "%s:%d (%s) assertion `%s' fails") at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:143
> #5  0x00007f7a09e10be0 in tls_ctx_half_open_get (ctx_index=48) at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:126
> #6  0x00007f7a09e11889 in tls_session_connected_callback (tls_app_index=0, 
> ho_ctx_index=48, tls_session=0x7f79c9b6d1c0,
>     is_fail=0 '\000') at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:404
> #7  0x00007f7a09d5ea6e in session_stream_connect_notify (tc=0x7f79c9b655fc, 
> is_fail=0 '\000')
>     at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/session/session.c:648
> #8  0x00007f7a099cb969 in tcp46_syn_sent_inline (vm=0x7f79c8a25100, 
> node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0, is_ip4=1)
>     at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2306
> #9  0x00007f7a099cbe00 in tcp4_syn_sent (vm=0x7f79c8a25100, 
> node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0)
>     at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2387
> #10 0x00007f7a08fefa35 in dispatch_node (vm=0x7f79c8a25100, 
> node=0x7f79c9a60500, type=VLIB_NODE_TYPE_INTERNAL,
>     dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f79c8b2a9c0, 
> last_time_stamp=902372436923868)
>     at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:988
> #11 0x00007f7a08feffee in dispatch_pending_node (vm=0x7f79c8a25100, 
> pending_frame_index=7, last_time_stamp=902372436923868)
>     at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1138
> #12 0x00007f7a08ff1bed in vlib_main_or_worker_loop (vm=0x7f79c8a25100, 
> is_main=0)
>     at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1554
> #13 0x00007f7a08ff240c in vlib_worker_loop (vm=0x7f79c8a25100) at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1634
> #14 0x00007f7a09035541 in vlib_worker_thread_fn (arg=0x7f79ca4a41c0) at 
> /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:1760
> #15 0x00007f7a0888aa38 in clib_calljmp ()
>    from 
> /home/pyu4/git-home/vpp_clean/vpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so
> #16 0x00007f7761198d70 in ?? ()
> #17 0x00007f7a090300be in vlib_worker_thread_bootstrap_fn (arg=0x7f79ca4a41c0)
> at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:684
>  
> Seemingly the current code design has a race condition that can return the
> same index to two threads, and after one thread frees that index, the other
> one hits this core dump.
> I did a simple fix that adds a big lock, as shown below, and this kind of
> core dump never happens again.
> Do you guys have a better solution for the lock?
>  
> 
> +      clib_rwlock_writer_lock (&tm->half_open_rwlock);
>    pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
>    if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
>      {
> -      clib_rwlock_writer_lock (&tm->half_open_rwlock);
>        pool_get (tm->half_open_ctx_pool, ctx);
>        memset (ctx, 0, sizeof (*ctx));
>        ctx_index = ctx - tm->half_open_ctx_pool;
> -      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
>      }
>    else
>      {
> @@ -104,6 +103,8 @@ tls_ctx_half_open_alloc (void)
>        memset (ctx, 0, sizeof (*ctx));
>        ctx_index = ctx - tm->half_open_ctx_pool;
>      }
> +      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
>    return ctx_index;
> }
