@nadav, asking for advice/guidance.
One thing Open MPI does is sched_{get,set}affinity over all threads of
current process. This is difficult in OSv as:
- there is only one process, and "all threads" is to wide then
- I even can't get list of all threads belonging to Open MPI (or,
after noticing application::unsafe_stop_and_aband()
https://github.com/cloudius-systems/osv/blob/master/core/app.cc#L106, I
can, but it is a bit difficult)
- sched_getaffinity accepts only pid==0. I will send little patch to
accept also pid==current_tid. But later, it should also accept other
tids - will be that a problem?
To me, it seems that OSv equivalent of linux process is osv::application
(ELF namespace can contain more than one app; and app_runtime has 1:1
relationship with application I think). I would like to add (did) add
list to track threads belonging to each app. For thread, there are
sched::thread, pthread and pthread_t (and pthread pthread_t have 1:1:
relationship too - to_libc(), from_libc()). osv::application already
stores its primary thread in _thread, type pthread_t. So I choose to
save list of pthread_t. But there are also sched::thread without
corresponding pthread_t. Not a problem for me, but worth mentioning.
Maybe list of pthread_t, stored in each app, is not needed and
application::unsafe_stop_and_aband() approach should be used.
Open MPI gets list of process threads from /proc/PID/task/TID. Should be
that implemented? Or should be added an additional C function to just
store TIDs in an array?
I attached my current patch just to illustrate what I added so far.
Only later I realized that sched::thread has ptr to app_runtime, so
there is no need to store list of all applications. I used app_list only
to figure out application owning the currently destroyed thread.
BR Justin
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.
>From 9115916ab8d4b4f44370da623976f4768f3f650d Mon Sep 17 00:00:00 2001
From: Justin Cinkelj <[email protected]>
Date: Wed, 29 Jun 2016 18:10:11 +0200
Subject: [PATCH 3/3] app_list, app-my_threads list, TID alloc/free debug
printf
Signed-off-by: Justin Cinkelj <[email protected]>
---
core/app.cc | 14 ++++++++++++++
core/sched.cc | 2 ++
2 files changed, 16 insertions(+)
diff --git a/core/app.cc b/core/app.cc
index 0806f44..57dc52d 100644
--- a/core/app.cc
+++ b/core/app.cc
@@ -151,6 +151,7 @@ application::application(const std::string& command,
, _joiner(nullptr)
, _terminated(false)
{
+ fprintf(stderr, "APP CTOR: app=%p cmd=%s new_program=%d\n", this, command.c_str(), new_program);
try {
elf::program *current_program;
@@ -183,6 +184,7 @@ application::application(const std::string& command,
}
WITH_LOCK(app_list_lock) {
app_list.push_back(this);
+ fprintf(stderr, "APP app_list add app %p %s, new app_list.size=%d\n", this, this->_command.c_str(), app_list.size());
}
}
@@ -209,6 +211,7 @@ application::~application()
assert(_runtime.use_count() == 1 || !_runtime);
WITH_LOCK(app_list_lock) {
auto it = std::find(app_list.begin(), app_list.end(), this);
+ fprintf(stderr, "APP app_list removing app %p %s, old app_list.size=%d\n", this, this->_command.c_str(), app_list.size());
assert(it != app_list.end());
std::swap(*it, app_list.back());
app_list.pop_back();
@@ -486,6 +489,11 @@ void application::add_thread(pthread_t thread)
if (app) {
app->_my_threads.push_back(thread);
}
+ if (1) {
+ sched::thread &t = pthread_to_thread(thread);
+ const char* app_cmd = app? app->_command.c_str(): NULL;
+ fprintf(stderr, "APP add_thread thread=%d, TID=%d, app=%p %s, app_over=%p app_cur=%p app_list.size = %d\n", thread, t.id(), app, app_cmd, override_current_app, get_current().get(), app_list.size());
+ }
}
}
@@ -493,16 +501,22 @@ void application::remove_thread(pthread_t thread)
{
WITH_LOCK(app_list_lock) {
std::vector<application*>::iterator it;
+ fprintf(stderr, "APP rem_thread thread=%d app_list.size=%d\n", thread, app_list.size());
for(it = app_list.begin(); it < app_list.end(); it++) {
application* app = *it;
assert(app);
+ //const char* app_cmd = app->_command.c_str();
+ //fprintf(stderr, "APP rem_thread app==%p %s\n", app, app_cmd);
auto element = std::find(app->_my_threads.begin(), app->_my_threads.end(), thread);
if (element != app->_my_threads.end()) {
+ //sched::thread &t = pthread_to_thread(thread);
+ //fprintf(stderr, "APP rem_thread thread=%d, TID=%d, app=%p %s, app_over=%p app_cur=%p\n", thread, t.id(), app, app_cmd, override_current_app, get_current().get());
app->_my_threads.erase(element);
return;
}
}
}
+ fprintf(stderr, "APP rem_thread thread=%d, TID=NOT-FOUND at_all app_list.size=%d\n", thread, app_list.size());
}
namespace this_application {
diff --git a/core/sched.cc b/core/sched.cc
index 23d5d92..0f40326 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -801,6 +801,7 @@ thread::thread(std::function<void ()> func, attr attr, bool main, bool app)
tid = 1;
}
if (!find_by_id(tid)) {
+ fprintf(stderr, "THR alloc TID %d\n", tid);
_s_idgen = _id = tid;
thread_map.insert(std::make_pair(_id, this));
break;
@@ -887,6 +888,7 @@ thread::~thread()
}
WITH_LOCK(thread_map_mutex) {
thread_map.erase(_id);
+ fprintf(stderr, "THR freed TID %d\n", _id);
total_app_time_exited += _total_cpu_time;
}
if (_attr._stack.deleter) {
--
2.5.0
>From b31b77c84caa0c9440d0e9bb91f16871beb93b5e Mon Sep 17 00:00:00 2001
From: Justin Cinkelj <[email protected]>
Date: Wed, 29 Jun 2016 18:30:35 +0200
Subject: [PATCH 2/3] app: maintain list of pthread_t belonging to app
We can track: pthread, pthread_t or sched::thread.
pthread_t was picked mainly because osv::application already contains
pthread_t _thread (the main app thread). Plus, pthread_t can be converted
to pthread, and pthread contains ptr to sched::thread. I'm not sure how to
convert sched::thread to pthread(_t).
Could be problem: app might create threads which are not pthread_t - those
are not tracked.
Signed-off-by: Justin Cinkelj <[email protected]>
---
core/app.cc | 38 ++++++++++++++++++++++++++++++++++++++
include/osv/app.hh | 6 ++++++
libc/pthread.cc | 14 ++++++++++++++
3 files changed, 58 insertions(+)
diff --git a/core/app.cc b/core/app.cc
index 2c25240..0806f44 100644
--- a/core/app.cc
+++ b/core/app.cc
@@ -36,6 +36,9 @@ extern "C" void __libc_start_main(int (*main)(int, char**), int, char**,
pthread_exit(nullptr);
}
+// Util function from pthread.cc.
+sched::thread& pthread_to_thread(pthread_t p);
+
namespace osv {
__thread application* override_current_app;
@@ -467,6 +470,41 @@ void application::merge_in_environ(bool new_program,
}
}
+void application::add_thread(pthread_t thread)
+{
+ /*
+ The first thread in application executes main, but osv::application::get_current()
+ at that time points to its "parent" (say to httpserver.so when executing /some_user_app.so via cli)
+ At that time, override_current_app points to the "right" app.
+ Otherwise, override_current_app is NULL, and get_current() is valid.
+
+ Also, some OS threads (like some very first pthread-s, or which are not started by loading some elf file) don't have app.
+ So both override_current_app and get_current().get() can be NULL.
+ */
+ application* app = override_current_app ? override_current_app: get_current().get();
+ WITH_LOCK(app_list_lock) {
+ if (app) {
+ app->_my_threads.push_back(thread);
+ }
+ }
+}
+
+void application::remove_thread(pthread_t thread)
+{
+ WITH_LOCK(app_list_lock) {
+ std::vector<application*>::iterator it;
+ for(it = app_list.begin(); it < app_list.end(); it++) {
+ application* app = *it;
+ assert(app);
+ auto element = std::find(app->_my_threads.begin(), app->_my_threads.end(), thread);
+ if (element != app->_my_threads.end()) {
+ app->_my_threads.erase(element);
+ return;
+ }
+ }
+ }
+}
+
namespace this_application {
void on_termination_request(std::function<void()> callback)
diff --git a/include/osv/app.hh b/include/osv/app.hh
index 0cf6e85..eee1450 100644
--- a/include/osv/app.hh
+++ b/include/osv/app.hh
@@ -157,6 +157,9 @@ public:
std::shared_ptr<elf::object> lib() const { return _lib; }
elf::program *program();
+
+ static void add_thread(pthread_t thread);
+ static void remove_thread(pthread_t thread);
private:
void new_program();
void clone_osv_environ();
@@ -178,6 +181,9 @@ private:
using main_func_t = int(int, char**);
pthread_t _thread;
+ // List of all pthread_t created by this app.
+ // Some thread created by app might not be pthread_t.
+ std::vector<pthread_t> _my_threads;
std::unique_ptr<elf::program> _program; // namespace program
std::vector<std::string> _args;
std::string _command;
diff --git a/libc/pthread.cc b/libc/pthread.cc
index ad094fc..79709c6 100644
--- a/libc/pthread.cc
+++ b/libc/pthread.cc
@@ -32,6 +32,7 @@
#include <osv/rwlock.h>
#include "pthread.hh"
+#include "osv/app.hh"
namespace pthread_private {
@@ -107,6 +108,9 @@ namespace pthread_private {
void pthread::start()
{
_thread.start();
+
+ // Add this thread to list of app threads.
+ osv::application::add_thread(to_libc());
}
sched::thread::attr pthread::attributes(thread_attr attr)
@@ -140,6 +144,8 @@ namespace pthread_private {
int pthread::join(void** retval)
{
+ // There is no ~pthread destructor. Is this second-best place to remove thread from list?
+ osv::application::remove_thread(to_libc());
_thread.join();
if (retval) {
*retval = _retval;
@@ -173,6 +179,14 @@ namespace pthread_private {
using namespace pthread_private;
+/*
+Util function to get underlying sched::thread.
+Used in app.cc.
+*/
+sched::thread& pthread_to_thread(pthread_t p) {
+ return pthread::from_libc(p)->_thread;
+}
+
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg)
{
--
2.5.0
>From bb3176d68eeb22e27b9ef3028c7528370cf50454 Mon Sep 17 00:00:00 2001
From: Justin Cinkelj <[email protected]>
Date: Wed, 29 Jun 2016 18:34:58 +0200
Subject: [PATCH 1/3] app: maintain list of all applications
Signed-off-by: Justin Cinkelj <[email protected]>
---
core/app.cc | 11 +++++++++++
include/osv/app.hh | 2 ++
2 files changed, 13 insertions(+)
diff --git a/core/app.cc b/core/app.cc
index 7fa76a7..2c25240 100644
--- a/core/app.cc
+++ b/core/app.cc
@@ -86,6 +86,8 @@ application_runtime::~application_runtime()
}
app_registry application::apps;
+std::vector<application*> application::app_list;
+mutex application::app_list_lock;
shared_app_t application::get_current()
{
@@ -176,6 +178,9 @@ application::application(const std::string& command,
if (!_entry_point && !_main) {
throw launch_error("Failed looking up main");
}
+ WITH_LOCK(app_list_lock) {
+ app_list.push_back(this);
+ }
}
void application::start()
@@ -199,6 +204,12 @@ TRACEPOINT(trace_app_destroy, "app=%p", application*);
application::~application()
{
assert(_runtime.use_count() == 1 || !_runtime);
+ WITH_LOCK(app_list_lock) {
+ auto it = std::find(app_list.begin(), app_list.end(), this);
+ assert(it != app_list.end());
+ std::swap(*it, app_list.back());
+ app_list.pop_back();
+ }
trace_app_destroy(this);
}
diff --git a/include/osv/app.hh b/include/osv/app.hh
index 43612e6..0cf6e85 100644
--- a/include/osv/app.hh
+++ b/include/osv/app.hh
@@ -189,6 +189,8 @@ private:
main_func_t* _main;
void (*_entry_point)();
static app_registry apps;
+ static std::vector<application*> app_list;
+ static mutex app_list_lock;
// Must be destroyed before _lib, because contained function objects may
// have destructors which are part of the application's code.
--
2.5.0