On 07/03/2016 08:52 AM, Nadav Har'El wrote:

On Thu, Jun 30, 2016 at 9:18 PM, Justin Cinkelj <[email protected] <mailto:[email protected]>> wrote:

    I tried to do that, and sent patch. I added comment that it might
    be possible that thread returned by find_by_id() get deleted while
    we are using it. It unlikely/rare etc, but just to mention.


This is true... There's a way to fix that, by holding thread_map_mutex while using the thread. We could replace find_by_id() by a method which takes an std::function and runs this function with the mutex held.

HOWEVER, I'm not sure this is a real problem; Consider that we have dozens of functions which take a sched::thread pointer (not an integer id), and most of them will crash if the thread dies in the middle of their work. So I'm not sure it's very important to fix the specific problem of this getaffinity function.


    Another little detail - std::unordered_map<unsigned long, thread
    *> thread_map used ULONG, but thread._id (and pid_t) are UINT - 8
    vs 4 B. Maybe should that be unified? Or is there a reason for
    ULONG usage?


I think this was for historic reasons - historically OSv had an "unsigned long" for thread ids, because we anticipated someone might want to go through more than 4 billlion short-lived threads during the lifetime of the VM.

However, at some point we wanted to be more Linux compatible, with Linux having 4-byte process ids, so we switched to having an "unsigned int" _id for threads. The constructor thread::thread() now needs to wrap around after finishing 4 billion ids, and needs to check the next id if not already in use (which we didn't need to do when having 8-byte ids and never wrapping around).

So you're right, thread_map's key should be unsigned int, NOT unsigned long. Thanks for noticing, I'll make a patch for that.
Thank you for explaining. I wasn't sure if long vs int has some purpose or is just an 'coincidence'.



    However, if you also need sched_setaffinity(), this is more
    difficult. Pinning the current thread, and pinning some other
    random thread, have conceptual differences (e.g., consider
    pinning a thread which is already running right now on the wrong
    CPU), and the latter hasn't been done yet. If you need it, please
    checkout https://github.com/cloudius-systems/osv/issues/520 and
    perhaps explain there why you need it.
    Yes, that's on whishlist too. I will read links, and hopefully
    understand some bits.


Doing sched_setaffinity() for other threads will be considerably more tricky than the other small changes you did here, so I wonder if it's really necessary: Can't each thread pin itself? That we already support.
It's a bit complicated I guess. The simplest Open MPI app doesn't start threads on its own, and for those I can add code at start of each thread routine. The problem is, at that time I don't know to which cpu thread should be pinned (we start communication thread, and info about required pinning comes later over TCP communication channel).

More complex Open MPI apps start additional threads, often even before their Open MPI part gets initialized. For that case, all old threads are first pinned. The one created later are probably pinned as whole process is pinned (I guess its so). The 'pin all old threads' seems to be very 'intentional' - code loops over all thread a few times to recheck/re-pin them, to check if new threads as created just at the current moment etc; I guess they had quite some problems with some threads randomly slipping unnoticed from pinning part.

The OpenFOAM so far goes under the simple app case;

--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to