@nadav, asking for advice/guidance.

One thing Open MPI does is sched_{get,set}affinity over all threads of current process. This is difficult in OSv as:
 - there is only one process, and "all threads" is to wide then
- I even can't get list of all threads belonging to Open MPI (or, after noticing application::unsafe_stop_and_aband() https://github.com/cloudius-systems/osv/blob/master/core/app.cc#L106, I can, but it is a bit difficult) - sched_getaffinity accepts only pid==0. I will send little patch to accept also pid==current_tid. But later, it should also accept other tids - will be that a problem?

To me, it seems that OSv equivalent of linux process is osv::application (ELF namespace can contain more than one app; and app_runtime has 1:1 relationship with application I think). I would like to add (did) add list to track threads belonging to each app. For thread, there are sched::thread, pthread and pthread_t (and pthread pthread_t have 1:1: relationship too - to_libc(), from_libc()). osv::application already stores its primary thread in _thread, type pthread_t. So I choose to save list of pthread_t. But there are also sched::thread without corresponding pthread_t. Not a problem for me, but worth mentioning.

Maybe list of pthread_t, stored in each app, is not needed and application::unsafe_stop_and_aband() approach should be used.

Open MPI gets list of process threads from /proc/PID/task/TID. Should be that implemented? Or should be added an additional C function to just store TIDs in an array?

I attached my current patch just to illustrate what I added so far.
Only later I realized that sched::thread has ptr to app_runtime, so there is no need to store list of all applications. I used app_list only to figure out application owning the currently destroyed thread.

BR Justin

--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
>From 9115916ab8d4b4f44370da623976f4768f3f650d Mon Sep 17 00:00:00 2001
From: Justin Cinkelj <[email protected]>
Date: Wed, 29 Jun 2016 18:10:11 +0200
Subject: [PATCH 3/3] app_list, app-my_threads list, TID alloc/free debug
 printf

Signed-off-by: Justin Cinkelj <[email protected]>
---
 core/app.cc   | 14 ++++++++++++++
 core/sched.cc |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/core/app.cc b/core/app.cc
index 0806f44..57dc52d 100644
--- a/core/app.cc
+++ b/core/app.cc
@@ -151,6 +151,7 @@ application::application(const std::string& command,
     , _joiner(nullptr)
     , _terminated(false)
 {
+    fprintf(stderr, "APP CTOR: app=%p cmd=%s new_program=%d\n", this, command.c_str(), new_program);
     try {
         elf::program *current_program;
 
@@ -183,6 +184,7 @@ application::application(const std::string& command,
     }
     WITH_LOCK(app_list_lock) {
         app_list.push_back(this);
+        fprintf(stderr, "APP app_list add app %p %s, new app_list.size=%d\n", this, this->_command.c_str(), app_list.size());
     }
 }
 
@@ -209,6 +211,7 @@ application::~application()
     assert(_runtime.use_count() == 1 || !_runtime);
     WITH_LOCK(app_list_lock) {
         auto it = std::find(app_list.begin(), app_list.end(), this);
+        fprintf(stderr, "APP app_list removing app %p %s, old app_list.size=%d\n", this, this->_command.c_str(), app_list.size());
         assert(it != app_list.end());
         std::swap(*it, app_list.back());
         app_list.pop_back();
@@ -486,6 +489,11 @@ void application::add_thread(pthread_t thread)
         if (app) {
             app->_my_threads.push_back(thread);
         }
+        if (1) {
+            sched::thread &t = pthread_to_thread(thread);
+            const char* app_cmd = app? app->_command.c_str(): NULL;
+            fprintf(stderr, "APP add_thread thread=%d, TID=%d, app=%p %s, app_over=%p app_cur=%p app_list.size = %d\n", thread, t.id(), app, app_cmd, override_current_app, get_current().get(), app_list.size());
+        }
     }
 }
 
@@ -493,16 +501,22 @@ void application::remove_thread(pthread_t thread)
 {
     WITH_LOCK(app_list_lock) {
         std::vector<application*>::iterator it;
+        fprintf(stderr, "APP rem_thread thread=%d app_list.size=%d\n", thread, app_list.size());
         for(it = app_list.begin(); it < app_list.end(); it++) {
             application* app = *it;
             assert(app);
+            //const char* app_cmd = app->_command.c_str();
+            //fprintf(stderr, "APP     rem_thread app==%p %s\n", app, app_cmd);
             auto element = std::find(app->_my_threads.begin(), app->_my_threads.end(), thread);
             if (element != app->_my_threads.end()) {
+                //sched::thread &t = pthread_to_thread(thread);
+                //fprintf(stderr, "APP     rem_thread thread=%d, TID=%d, app=%p %s, app_over=%p app_cur=%p\n", thread, t.id(), app, app_cmd, override_current_app, get_current().get());
                 app->_my_threads.erase(element);
                 return;
             }
         }
     }
+    fprintf(stderr, "APP     rem_thread thread=%d, TID=NOT-FOUND at_all app_list.size=%d\n", thread, app_list.size());
 }
 
 namespace this_application {
diff --git a/core/sched.cc b/core/sched.cc
index 23d5d92..0f40326 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -801,6 +801,7 @@ thread::thread(std::function<void ()> func, attr attr, bool main, bool app)
                     tid = 1;
                 }
                 if (!find_by_id(tid)) {
+                    fprintf(stderr, "THR alloc TID %d\n", tid);
                     _s_idgen = _id = tid;
                     thread_map.insert(std::make_pair(_id, this));
                     break;
@@ -887,6 +888,7 @@ thread::~thread()
     }
     WITH_LOCK(thread_map_mutex) {
         thread_map.erase(_id);
+        fprintf(stderr, "THR freed TID %d\n", _id);
         total_app_time_exited += _total_cpu_time;
     }
     if (_attr._stack.deleter) {
-- 
2.5.0

>From b31b77c84caa0c9440d0e9bb91f16871beb93b5e Mon Sep 17 00:00:00 2001
From: Justin Cinkelj <[email protected]>
Date: Wed, 29 Jun 2016 18:30:35 +0200
Subject: [PATCH 2/3] app: maintain list of pthread_t belonging to app

We can track: pthread, pthread_t or sched::thread.
pthread_t was picked mainly because osv::application already contains
pthread_t _thread (the main app thread). Plus, pthread_t can be converted
to pthread, and pthread contains ptr to sched::thread. I'm not sure how to
convert sched::thread to pthread(_t).

Could be problem: app might create threads which are not pthread_t - those
are not tracked.

Signed-off-by: Justin Cinkelj <[email protected]>
---
 core/app.cc        | 38 ++++++++++++++++++++++++++++++++++++++
 include/osv/app.hh |  6 ++++++
 libc/pthread.cc    | 14 ++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/core/app.cc b/core/app.cc
index 2c25240..0806f44 100644
--- a/core/app.cc
+++ b/core/app.cc
@@ -36,6 +36,9 @@ extern "C" void __libc_start_main(int (*main)(int, char**), int, char**,
     pthread_exit(nullptr);
 }
 
+// Util function from pthread.cc.
+sched::thread& pthread_to_thread(pthread_t p);
+
 namespace osv {
 
 __thread application* override_current_app;
@@ -467,6 +470,41 @@ void application::merge_in_environ(bool new_program,
     }
 }
 
+void application::add_thread(pthread_t thread)
+{
+    /*
+    The first thread in application executes main, but osv::application::get_current()
+    at that time points to its "parent" (say to httpserver.so when executing /some_user_app.so via cli)
+    At that time, override_current_app points to the "right" app.
+    Otherwise, override_current_app is NULL, and get_current() is valid.
+
+    Also, some OS threads (like some very first pthread-s, or which are not started by loading some elf file) don't have app.
+    So both override_current_app and get_current().get() can be NULL.
+    */
+    application* app =  override_current_app ? override_current_app: get_current().get();
+    WITH_LOCK(app_list_lock) {
+        if (app) {
+            app->_my_threads.push_back(thread);
+        }
+    }
+}
+
+void application::remove_thread(pthread_t thread)
+{
+    WITH_LOCK(app_list_lock) {
+        std::vector<application*>::iterator it;
+        for(it = app_list.begin(); it < app_list.end(); it++) {
+            application* app = *it;
+            assert(app);
+            auto element = std::find(app->_my_threads.begin(), app->_my_threads.end(), thread);
+            if (element != app->_my_threads.end()) {
+                app->_my_threads.erase(element);
+                return;
+            }
+        }
+    }
+}
+
 namespace this_application {
 
 void on_termination_request(std::function<void()> callback)
diff --git a/include/osv/app.hh b/include/osv/app.hh
index 0cf6e85..eee1450 100644
--- a/include/osv/app.hh
+++ b/include/osv/app.hh
@@ -157,6 +157,9 @@ public:
     std::shared_ptr<elf::object> lib() const { return _lib; }
 
     elf::program *program();
+
+    static void add_thread(pthread_t thread);
+    static void remove_thread(pthread_t thread);
 private:
     void new_program();
     void clone_osv_environ();
@@ -178,6 +181,9 @@ private:
     using main_func_t = int(int, char**);
 
     pthread_t _thread;
+    // List of all pthread_t created by this app.
+    // Some thread created by app might not be pthread_t.
+    std::vector<pthread_t> _my_threads;
     std::unique_ptr<elf::program> _program; // namespace program
     std::vector<std::string> _args;
     std::string _command;
diff --git a/libc/pthread.cc b/libc/pthread.cc
index ad094fc..79709c6 100644
--- a/libc/pthread.cc
+++ b/libc/pthread.cc
@@ -32,6 +32,7 @@
 #include <osv/rwlock.h>
 
 #include "pthread.hh"
+#include "osv/app.hh"
 
 namespace pthread_private {
 
@@ -107,6 +108,9 @@ namespace pthread_private {
     void pthread::start()
     {
         _thread.start();
+
+        // Add this thread to list of app threads.
+        osv::application::add_thread(to_libc());
     }
 
     sched::thread::attr pthread::attributes(thread_attr attr)
@@ -140,6 +144,8 @@ namespace pthread_private {
 
     int pthread::join(void** retval)
     {
+        // There is no ~pthread destructor. Is this second-best place to remove thread from list?
+        osv::application::remove_thread(to_libc());
         _thread.join();
         if (retval) {
             *retval = _retval;
@@ -173,6 +179,14 @@ namespace pthread_private {
 
 using namespace pthread_private;
 
+/*
+Util function to get underlying sched::thread.
+Used in app.cc.
+*/
+sched::thread& pthread_to_thread(pthread_t p) {
+    return pthread::from_libc(p)->_thread;
+}
+
 int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
         void *(*start_routine) (void *), void *arg)
 {
-- 
2.5.0

>From bb3176d68eeb22e27b9ef3028c7528370cf50454 Mon Sep 17 00:00:00 2001
From: Justin Cinkelj <[email protected]>
Date: Wed, 29 Jun 2016 18:34:58 +0200
Subject: [PATCH 1/3] app: maintain list of all applications

Signed-off-by: Justin Cinkelj <[email protected]>
---
 core/app.cc        | 11 +++++++++++
 include/osv/app.hh |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/core/app.cc b/core/app.cc
index 7fa76a7..2c25240 100644
--- a/core/app.cc
+++ b/core/app.cc
@@ -86,6 +86,8 @@ application_runtime::~application_runtime()
 }
 
 app_registry application::apps;
+std::vector<application*> application::app_list;
+mutex application::app_list_lock;
 
 shared_app_t application::get_current()
 {
@@ -176,6 +178,9 @@ application::application(const std::string& command,
     if (!_entry_point && !_main) {
         throw launch_error("Failed looking up main");
     }
+    WITH_LOCK(app_list_lock) {
+        app_list.push_back(this);
+    }
 }
 
 void application::start()
@@ -199,6 +204,12 @@ TRACEPOINT(trace_app_destroy, "app=%p", application*);
 application::~application()
 {
     assert(_runtime.use_count() == 1 || !_runtime);
+    WITH_LOCK(app_list_lock) {
+        auto it = std::find(app_list.begin(), app_list.end(), this);
+        assert(it != app_list.end());
+        std::swap(*it, app_list.back());
+        app_list.pop_back();
+    }
     trace_app_destroy(this);
 }
 
diff --git a/include/osv/app.hh b/include/osv/app.hh
index 43612e6..0cf6e85 100644
--- a/include/osv/app.hh
+++ b/include/osv/app.hh
@@ -189,6 +189,8 @@ private:
     main_func_t* _main;
     void (*_entry_point)();
     static app_registry apps;
+    static std::vector<application*> app_list;
+    static mutex app_list_lock;
 
     // Must be destroyed before _lib, because contained function objects may
     // have destructors which are part of the application's code.
-- 
2.5.0

Reply via email to