[lttng-dev] [RFC PATCH 2/2] Fix: order termination of ust_thread_manage_notify

2017-08-24 Thread Jonathan Rajotte
The following scenario lead to a deadlock:
 - An application is being registered sessiond side 
(thread_dispatch_ust_registration).
 - ust_thread_manage_notify is already terminated via quit pipe, hence
   not listening on the notify socket.
 - The application is trying to register an event 
(lttng_event_create->ustcomm_register_event) via the notify socket.
 - The notify socket is never closed.

The application hang on the communication via the notify socket since
no one is responding, sessiond is waiting for a reply on the
ustctl_register_done call.

Forcing an order of termination on ust_thread_manage_notify and
thread_dispatch_ust_registration guarantees that the socket will be
closed and allow for regular execution. This guarantee is provided by
the fact that no further registration can happen in the absence of the
registration dispatch thread and all current applications will be present
when the ust_thread_manage_notify perform its cleanup.

This can be reproduced using the sessiond_teardown_active_session
scenario provided by [1].

[1] https://github.com/PSRCode/lttng-stress

Signed-off-by: Jonathan Rajotte 
---
 src/bin/lttng-sessiond/lttng-sessiond.h |  7 +++
 src/bin/lttng-sessiond/main.c   | 24 +++-
 src/bin/lttng-sessiond/ust-thread.c | 16 
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/bin/lttng-sessiond/lttng-sessiond.h 
b/src/bin/lttng-sessiond/lttng-sessiond.h
index 74552db6..a805704e 100644
--- a/src/bin/lttng-sessiond/lttng-sessiond.h
+++ b/src/bin/lttng-sessiond/lttng-sessiond.h
@@ -95,6 +95,13 @@ struct ust_reg_wait_node {
 extern int apps_cmd_notify_pipe[2];
 
 /*
+ * This pipe is used to inform the ust_thread_manage_notify thread that the
+ * thread_dispatch_ust_registration thread is terminated. Allowing the
+ * ust_thread_manage_notify to perform its termination.
+ */
+extern int thread_dispatch_ust_registration_teardown_complete_pipe[2];
+
+/*
  * Used to notify that a hash table needs to be destroyed by dedicated
  * thread. Required by design because we don't want to move destroy
  * paths outside of large RCU read-side lock paths, and destroy cannot
diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c
index 03f695ec..5298b1a1 100644
--- a/src/bin/lttng-sessiond/main.c
+++ b/src/bin/lttng-sessiond/main.c
@@ -206,6 +206,13 @@ static int kernel_poll_pipe[2] = { -1, -1 };
 static int thread_quit_pipe[2] = { -1, -1 };
 
 /*
+ * This pipe is used to inform the ust_thread_manage_notify thread that the
+ * thread_dispatch_ust_registration thread is terminated. Allowing the
+ * ust_thread_manage_notify to perform its termination.
+ */
+int thread_dispatch_ust_registration_teardown_complete_pipe[2] = { -1, -1 };
+
+/*
  * This pipe is used to inform the thread managing application communication
  * that a command is queued and ready to be processed.
  */
@@ -477,6 +484,11 @@ static int init_thread_quit_pipe(void)
return __init_thread_quit_pipe(thread_quit_pipe);
 }
 
+static int init_thread_dispatch_ust_registration_teardown_complete_pipe(void)
+{
+   return 
__init_thread_quit_pipe(thread_dispatch_ust_registration_teardown_complete_pipe);
+}
+
 /*
  * Stop all threads by closing the thread quit pipe.
  */
@@ -623,6 +635,7 @@ static void sessiond_cleanup(void)
 * since we are now called.
 */
utils_close_pipe(thread_quit_pipe);
+   
utils_close_pipe(thread_dispatch_ust_registration_teardown_complete_pipe);
 
/*
 * If opt_pidfile is undefined, the default file will be wiped when
@@ -2128,6 +2141,11 @@ static void *thread_dispatch_ust_registration(void *data)
err = 0;
 
 error:
+   ret = 
notify_thread_pipe(thread_dispatch_ust_registration_teardown_complete_pipe[1]);
+   if (ret < 0) {
+   ERR("write error on thread 
thread_dispatch_ust_registration_teardown_complete_pipe");
+   }
+
/* Clean up wait queue. */
cds_list_for_each_entry_safe(wait_node, tmp_wait_node,
_queue.head, head) {
@@ -5711,12 +5729,16 @@ int main(int argc, char **argv)
goto exit_ht_cleanup;
}
 
-   /* Create thread quit pipe */
if (init_thread_quit_pipe()) {
retval = -1;
goto exit_init_data;
}
 
+   if (init_thread_dispatch_ust_registration_teardown_complete_pipe()) {
+   retval = -1;
+   goto exit_init_data;
+   }
+
/* Check if daemon is UID = 0 */
is_root = !getuid();
 
diff --git a/src/bin/lttng-sessiond/ust-thread.c 
b/src/bin/lttng-sessiond/ust-thread.c
index e605703a..c8d5ffdf 100644
--- a/src/bin/lttng-sessiond/ust-thread.c
+++ b/src/bin/lttng-sessiond/ust-thread.c
@@ -27,6 +27,11 @@
 #include "health-sessiond.h"
 #include "testpoint.h"
 
+static
+int thread_should_quit(int pollfd, uint32_t revents)
+{
+   return (pollfd == 

[lttng-dev] [RFC PATCH 1/2] Fix: unregister app notify socket on sessiond tear down

2017-08-24 Thread Jonathan Rajotte
A race between the sessiond tear down and applications initialization
can lead to a deadlock.

Applications try to communicate via the notify sockets while sessiond
does not listen anymore on these sockets since the thread responsible
for reception/response is terminated (ust_thread_manage_notify). These
sockets are never closed hence an application could hang on
communication.

Sessiond hang happen during call to cmd_destroy_session during
sessiond_cleanup. Sessiond is trying to communicate with the app while
the app is waiting for a response on the app notification socket.

To prevent this situation a call to ust_app_notify_sock_unregister is
performed on all entry of the ust_app_ht_by_notify_sock hash table at
the time of termination. This ensure that any pending communication
initiated by the application will be terminated since all sockets will
be closed at the end of the grace period via call_rcu inside
ust_app_notify_sock_unregister. The use of ust_app_ht_by_notify_sock
instead of the ust_app_ht prevent a double call_rcu since entries are
removed from ust_app_ht_by_notify_sock during ust_app_notify_sock_unregister.

RFC: ust_app_notify_sock_unregister might never call
call_rcu(close_notify_sock_rcu) when no memory is available. Should we
just issue a direct call to close() on app->notify_scoket  when
ust_thread_manage_notify quit?

This can be reproduced using the sessiond_teardown_active_session
scenario provided by [1].

[1] https://github.com/PSRCode/lttng-stress

Signed-off-by: Jonathan Rajotte 
---
 src/bin/lttng-sessiond/ust-thread.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/bin/lttng-sessiond/ust-thread.c 
b/src/bin/lttng-sessiond/ust-thread.c
index 7fb18a78..e605703a 100644
--- a/src/bin/lttng-sessiond/ust-thread.c
+++ b/src/bin/lttng-sessiond/ust-thread.c
@@ -27,6 +27,19 @@
 #include "health-sessiond.h"
 #include "testpoint.h"
 
+
+static
+void notify_sock_unregister_all()
+{
+   struct lttng_ht_iter iter;
+   struct ust_app *app;
+   rcu_read_lock();
+   cds_lfht_for_each_entry(ust_app_ht_by_notify_sock->ht, , app, 
notify_sock_n.node) {
+   ust_app_notify_sock_unregister(app->notify_sock);
+   }
+   rcu_read_unlock();
+}
+
 /*
  * This thread manage application notify communication.
  */
@@ -179,6 +192,7 @@ restart:
 
 exit:
 error:
+   notify_sock_unregister_all();
lttng_poll_clean();
 error_poll_create:
 error_testpoint:
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH lttng-tools v2] Fix: wrong use of the relay_streams_sent in snapshot

2017-08-24 Thread Julien Desfossez
> If you remove this, I think the streams will never get published
> within the relayd. (publish_connection_local_streams()). Is this
> an expected side-effect ? It should be documented in the changelog.
> My guess is that we indeed don't want to publish the snapshot
> streams to the viewers.
Indeed, a snapshot session cannot be a live session, so we don't
want/need to publish those streams. Also, sending this message every
time we send a stream is a wrong usage of the command.

> The reason for doing this change should also be documented. What
> behavior is unwanted here from a relayd perspective ?
We are sending this message to the relay (and waiting for the
confirmation) before taking the snapshot of each stream. So, in addition
to being wrong and useless, it adds a considerable delay before taking
the snapshot of each stream.

Do you agree ?

Thanks,

Julien
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH lttng-tools v2] Fix: wrong use of the relay_streams_sent in snapshot

2017-08-24 Thread Mathieu Desnoyers
If you remove this, I think the streams will never get published
within the relayd. (publish_connection_local_streams()). Is this
an expected side-effect ? It should be documented in the changelog.
My guess is that we indeed don't want to publish the snapshot
streams to the viewers.

The reason for doing this change should also be documented. What
behavior is unwanted here from a relayd perspective ?

Thanks,

Mathieu

- On Aug 23, 2017, at 1:48 PM, Julien Desfossez jdesfos...@efficios.com 
wrote:

> The relay_streams_sent message is only useful in live sessions and
> should only be sent after all the streams of a channel have been sent.
> 
> Here we were sending this message every time we sent a stream to the
> relay during a snapshot which makes no sense.
> 
> Signed-off-by: Julien Desfossez 
> ---
> src/common/kernel-consumer/kernel-consumer.c | 8 
> src/common/ust-consumer/ust-consumer.c   | 6 --
> 2 files changed, 14 deletions(-)
> 
> diff --git a/src/common/kernel-consumer/kernel-consumer.c
> b/src/common/kernel-consumer/kernel-consumer.c
> index a5dcc66..1c2751b 100644
> --- a/src/common/kernel-consumer/kernel-consumer.c
> +++ b/src/common/kernel-consumer/kernel-consumer.c
> @@ -187,14 +187,6 @@ int lttng_kconsumer_snapshot_channel(uint64_t key, char
> *path,
>   DBG("Kernel consumer snapshot stream %s/%s (%" PRIu64 
> ")",
>   path, stream->name, stream->key);
>   }
> - if (relayd_id != -1ULL) {
> - ret = consumer_send_relayd_streams_sent(relayd_id);
> - if (ret < 0) {
> - ERR("sending streams sent to relayd");
> - goto end_unlock;
> - }
> - channel->streams_sent_to_relayd = true;
> - }
> 
>   ret = kernctl_buffer_flush_empty(stream->wait_fd);
>   if (ret < 0) {
> diff --git a/src/common/ust-consumer/ust-consumer.c
> b/src/common/ust-consumer/ust-consumer.c
> index 366f855..bce7db8 100644
> --- a/src/common/ust-consumer/ust-consumer.c
> +++ b/src/common/ust-consumer/ust-consumer.c
> @@ -1101,12 +1101,6 @@ static int snapshot_channel(uint64_t key, char *path,
> uint64_t relayd_id,
>   DBG("UST consumer snapshot stream %s/%s (%" PRIu64 ")", 
> path,
>   stream->name, stream->key);
>   }
> - if (relayd_id != -1ULL) {
> - ret = consumer_send_relayd_streams_sent(relayd_id);
> - if (ret < 0) {
> - goto error_unlock;
> - }
> - }
> 
>   /*
>* If tracing is active, we want to perform a "full" buffer 
> flush.
> --
> 2.7.4
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev