Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
On Fri, Sep 18, 2009 at 03:22:22PM -0700, Ira Weiny wrote:
> > main()
> > {
> > foo = libibnetdisc_setup();
> > libibnetdisc_discover_all(foo,res);
> > // Do interesting things with res.
> > }
>
> That is the current use case. However I can see use cases where discover is
> called periodically to get a new snapshot of the fabric. Also since the
> discover can scan parts of the fabric ("libibnetdisc_discover_part") and
> return a fabric which represents pieces of the whole I could see "fabric"
> operations such as merge, update, and replace.
Sure, the way I've approached this in the past is that the fabric
description is stored as a directed graph, and the usual set of graph
manipulation primitives (BFS, difference, join, splice, etc) are
available to work on it. This makes a lot of the stuff people want to
do expressible via quite natural graph concepts.
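The graph approach can be sketched in a few lines of C. This is a minimal, hypothetical example (fab_node, fab_bfs_depth, and the fixed-size arrays are all illustrative, not any real ib library API): a fabric held as an adjacency structure, with one of the primitives (BFS) implemented over it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: a fabric stored as a directed graph, over which
 * BFS and set-style operations (difference, join, splice) could run. */
#define MAX_NODES 64

struct fab_node {
    int nports;
    int peer[8];    /* adjacency: node reached via each port, -1 if none */
};

/* Return BFS depth of 'dst' from 'src', or -1 if unreachable. */
static int fab_bfs_depth(const struct fab_node *g, int nnodes, int src, int dst)
{
    int queue[MAX_NODES], depth[MAX_NODES], head = 0, tail = 0;
    char seen[MAX_NODES];

    memset(seen, 0, sizeof(seen));
    queue[tail] = src; depth[tail++] = 0; seen[src] = 1;

    while (head < tail) {
        int cur = queue[head++];
        if (cur == dst)
            return depth[head - 1];         /* depth of 'cur' */
        for (int p = 0; p < g[cur].nports; p++) {
            int nxt = g[cur].peer[p];
            if (nxt >= 0 && nxt < nnodes && !seen[nxt]) {
                seen[nxt] = 1;
                depth[tail] = depth[head - 1] + 1;
                queue[tail++] = nxt;
            }
        }
    }
    return -1;
}

/* Demo: a 3-node chain 0 <-> 1 <-> 2. */
static int fab_demo(void)
{
    struct fab_node g[3] = {
        { 1, { 1 } },       /* node 0 connects to node 1 */
        { 2, { 0, 2 } },    /* node 1 connects to nodes 0 and 2 */
        { 1, { 1 } },       /* node 2 connects to node 1 */
    };
    return fab_bfs_depth(g, 3, 0, 2);
}
```

A discovery snapshot held this way makes "merge", "update", and "replace" expressible as plain graph operations over the node set.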
> > Sure, the entire library is not thread safe around the ibmad_port
> > context. But who cares? If the caller to libibnetdisc wants to thread
> > that way they need to open another context.
>
> Yes, they can but how do they know they need to do this?
> Furthermore how many contexts are required?
Well, that is a doc question right? In C - no mention of threading in
docs == not thread safe.
> The bottom line is I wanted multiple outstanding queries. I am not
> going to open a context for each query. The amount of code required
> to process and sort Transaction ID's should be provided by libibmad
> or a layer at that level. It should not be required for every user
> process or user lib. Furthermore my prototype code does not support
> redirect. Therefore it makes the code even more difficult. Why
> make every user suffer this problem?
The transaction ID to FD sorting code is provided in the kernel. If
someone wants threads they really want TID-to-thread mapping so that a
synchronous control flow is possible:

madSet(foo,value); // Sends a MAD, then blocks on a receive for a
                   // TID-matched reply
This is why it is unsuitable for libumad to do any kind of threading:
how would it handle multiplexing access to the FD from multiple threads
without a huge mess internally?
> I am a bit confused. Do you mean to open multiple umad fds such that the
> kernel will do the TID based dispatch for you? Or are you suggesting a
> different kernel umad implementation?
Yes, that is what I am suggesting.
Every thread you create gets a private FD and a private mad
context. The mad layer is not threaded (beyond being re-entrant). Each
thread sends and blocks on the thread specific FD and the kernel
multiplexes transmits and sorts replies to direct them to the proper
waiting thread.
Anything else is a big can of worms.
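A toy illustration of this thread-private-FD model, using a pipe as a stand-in for a per-thread umad fd (mad_ctx, run_query, and run_parallel_queries are made-up names, not libibumad API):

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical sketch: every thread gets a private fd and a private mad
 * context, so each send/recv pair stays on one thread and no user-space
 * TID demultiplexer is needed. */
struct mad_ctx {
    int fd[2];    /* pipe: stand-in for a thread-private umad fd */
};

static void *run_query(void *arg)
{
    struct mad_ctx *ctx = arg;
    char req = 'Q', rep;

    /* "Send" a MAD, then block on this thread's own fd for the reply;
     * the pipe loopback stands in for the kernel routing the reply back. */
    if (write(ctx->fd[1], &req, 1) != 1)
        return NULL;
    if (read(ctx->fd[0], &rep, 1) != 1)
        return NULL;
    return (void *)(long)(rep == 'Q');
}

static int run_parallel_queries(int nthreads)
{
    pthread_t tid[8];
    struct mad_ctx ctx[8];
    int ok = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pipe(ctx[i].fd) < 0)          /* private fd per thread */
            return -1;
        pthread_create(&tid[i], NULL, run_query, &ctx[i]);
    }
    for (int i = 0; i < nthreads; i++) {
        void *ret;
        pthread_join(tid[i], &ret);
        ok += (ret != NULL);
        close(ctx[i].fd[0]);
        close(ctx[i].fd[1]);
    }
    return ok;    /* number of queries that completed */
}
```

Because each thread only ever touches its own fd, the mad layer itself needs to be re-entrant but not locked.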
That said, a single-threaded, event-FSM-based design is almost always
dramatically simpler and faster. Even so, it would work the same way with
threads: each thread gets a thread-specific FSM context.
> > Well, the very best way to do this is to have an FSM engine API at the
> > core of the MAD library:
> > mad_ctx->callback = done_this;
> > mad_post(mad,mad_ctx)
> >
> > done_this(reply):
> > ...
>
> Which way do you propose to do this, have a thread calling "done_this" or
> having the user call an event loop?
Look at something like glib to see how this generally works.. but yes,
this is done with a top level poll event loop waiting on the umad
fd. A reasonable goal would be to have an FSM interface API that could
be plugged into glib easily.
This gives you mad level parallelism without threads.
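The callback-plus-poll style can be illustrated without any IB plumbing. In this hypothetical sketch a pipe stands in for the umad fd, and mad_post/event_loop are invented names; replies are correlated to callbacks by transaction ID, giving MAD-level parallelism from a single thread:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_PENDING 16

typedef void (*mad_cb)(int tid, void *arg);

struct pending {
    int tid;
    mad_cb cb;
    void *arg;
};

static struct pending pending[MAX_PENDING];
static int npending;

/* Post a request and register its completion callback. */
static void mad_post(int fd, int tid, mad_cb cb, void *arg)
{
    unsigned char b = (unsigned char)tid;

    if (write(fd, &b, 1) != 1)    /* "send" the MAD; the loopback pipe
                                     makes the reply carry the same tid */
        return;
    pending[npending].tid = tid;
    pending[npending].cb = cb;
    pending[npending].arg = arg;
    npending++;
}

/* Top-level poll loop: dispatch each reply to the matching callback. */
static void event_loop(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    while (npending > 0 && poll(&pfd, 1, 1000) > 0) {
        unsigned char b;
        if (read(fd, &b, 1) != 1)
            break;
        for (int i = 0; i < npending; i++) {
            if (pending[i].tid == b) {            /* correlate by TID */
                pending[i].cb(b, pending[i].arg);
                pending[i] = pending[--npending]; /* drop completed entry */
                break;
            }
        }
    }
}

static void count_reply(int tid, void *arg) { (void)tid; (*(int *)arg)++; }

static int demo_fsm(void)
{
    int fds[2], done = 0;

    if (pipe(fds) < 0)    /* loopback: replies return on the same fd */
        return -1;
    mad_post(fds[1], 1, count_reply, &done);
    mad_post(fds[1], 2, count_reply, &done);
    event_loop(fds[0]);
    close(fds[0]); close(fds[1]);
    return done;
}
```

An interface shaped like this could be plugged into a glib main loop by handing glib the fd and calling the dispatch step from its watch callback.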
> I see some things of value in libibmad. However, I have been reluctant to use
> it in the past and I agree it needs fixing. I don't want to reinvent the
> wheel but perhaps that is what needs to be done...
I'm just about to the point where I need something a lot better for the
little app I'm working on - I want to set up multipath IB connections,
which means sophisticated PR queries..
So, I'd like to see this fixed up too, and I can probably work on a
few things. We can donate our structure parsing codegen framework
which is dramatically better than what libibmad uses today.
I'm specifically interested in GMPs for PR queries, but much the
same infrastructure covers both SMPs and GMPs.
What I'd like is a nice uniform language across libibcm and this
MAD library so I can just do a PR, get the result, pass it to the CM
and up into the kernel without huge app-specific code to do all
that. Nothing like that exists right now and it sucks. It is very hard
to write IB apps.
Jason
___
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
On Thu, 27 Aug 2009 12:20:56 -0600
Jason Gunthorpe wrote:
> On Thu, Aug 27, 2009 at 09:48:10AM -0700, Ira Weiny wrote:
>
> > > FSM multiplexing the recv path usually gives much better performance,
> > > something like net discovery is quite easy..
> >
> > Using the original algorithm and data structures lent itself to
> > threading. Now that I am neck deep in all this I have thought that
> > rewriting it all might be easier.
>
> Yah. mayhaps..
>
> > > main loop:
> > > fill tx queue from next list
> > > receive replies and correlate with next list
>
> > This would still need additional code (or additional synchronization in the
> > API to libibnetdisc) if you wanted a user app to be multi-threaded. Someone
> > has to be in charge of receiving all replies on that ibmad_port object and
> > handing them to the proper owner. Of course one could open multiple
> > ibmad_port objects but how is the app writer to know to do that? Digging
> > through the code to find out that libibnetdisc is consuming all the replies?
>
> What is the use case here? I thought the app would be something like:
>
> main()
> {
> foo = libibnetdisc_setup();
> libibnetdisc_discover_all(foo,res);
> // Do interesting things with res.
> }
That is the current use case. However I can see use cases where discover is
called periodically to get a new snapshot of the fabric. Also since the
discover can scan parts of the fabric ("libibnetdisc_discover_part") and
return a fabric which represents pieces of the whole I could see "fabric"
operations such as merge, update, and replace.
>
> Where the goal is to have libibnetdisc_discover_all complete
> expediently.
>
> As long as the context 'foo' is re-entrant in all ways with all other
> libraries and contexts I think useful threaded apps can be created.
Yes absolutely. However, my current issue is with making ibmad_port thread
safe so that libibnetdisc_discover can be multithreaded. I have been able to
do so but the amount of code it took seems unreasonable to force upon any
users of libibmad.
>
> > This is what got me on this in the first place. smp_query_via
> > (_do_madrpc) is not thread safe.
>
> Sure, the entire library is not thread safe around the ibmad_port
> context. But who cares? If the caller to libibnetdisc wants to thread
> that way they need to open another context.
Yes, they can but how do they know they need to do this? Furthermore how many
contexts are required? The bottom line is I wanted multiple outstanding
queries. I am not going to open a context for each query. The amount of code
required to process and sort Transaction ID's should be provided by libibmad
or a layer at that level. It should not be required for every user process or
user lib. Furthermore my prototype code does not support redirect. Therefore
it makes the code even more difficult. Why make every user suffer this
problem?
>
> > Also, I feel that someone down the road might fall into the same
> > trap that I did thinking that smp_query_via is thread safe and I
> > would like to fix that.
>
> Well.. How can it be threaded? umad_send/umad_recv are inherently
> single threaded APIs. You have to layer a TID based threading dispatch
> mechanism on top of it. Much better to let the kernel do that and open
> multiple umad fds.
I am a bit confused. Do you mean to open multiple umad fds such that the
kernel will do the TID based dispatch for you? Or are you suggesting a
different kernel umad implementation?
>
> > > each entry:
> > > add to next list additional ports
> > >
> > > Repeat until dead.
> > >
> > > Where a 'next list' would be a set of actions along the lines of
> > > 'query node' or 'query port' the action on a 'query node' completion
> > > is to generate 'query port' next list items for all the ports, and on
> > > 'query port' completion is to generate 'query node' items for all
> > > enabled ports..
> > >
> > > libumad is nonblocking, parallel, etc...
> >
> > Yes, and libibmad layers on top of it an easier interface to issue common
> > queries. Why should we ask the user to re-implement that code?
>
> Well, the very best way to do this is to have an FSM engine API at the
> core of the MAD library:
> mad_ctx->callback = done_this;
> mad_post(mad,mad_ctx)
>
> done_this(reply):
> ...
Which way do you propose to do this, have a thread calling "done_this" or
having the user call an event loop?
>
> > For example, mad_rpc now handles redirection. My implementation
> > does not yet. So now I have to handle that on my own as well...
> > :-(
>
> To be honest, I don't like the libibmad/libibumad APIs one bit - I'm
> not surprised they don't work for you..
>
> Frankly, we really need a usable MAD library with sane APIs, and very
> high level APIs on top of that. You cannot make an IB application
> without doing SA queries at a minimum and the current process is
> HORRID.
>
> I see nothing of value in libibmad and libibumad to support that :|
I see some things of value in libibmad. However, I have been reluctant to use
it in the past and I agree it needs fixing. I don't want to reinvent the
wheel but perhaps that is what needs to be done...
Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
On Thu, Aug 27, 2009 at 09:48:10AM -0700, Ira Weiny wrote:
> > FSM multiplexing the recv path usually gives much better performance,
> > something like net discovery is quite easy..
>
> Using the original algorithm and data structures lent itself to
> threading. Now that I am neck deep in all this I have thought that
> rewriting it all might be easier.
Yah. mayhaps..
> > main loop:
> > fill tx queue from next list
> > receive replies and correlate with next list
> This would still need additional code (or additional synchronization in the
> API to libibnetdisc) if you wanted a user app to be multi-threaded. Someone
> has to be in charge of receiving all replies on that ibmad_port object and
> handing them to the proper owner. Of course one could open multiple
> ibmad_port objects but how is the app writer to know to do that? Digging
> through the code to find out that libibnetdisc is consuming all the replies?
What is the use case here? I thought the app would be something like:
main()
{
foo = libibnetdisc_setup();
libibnetdisc_discover_all(foo,res);
// Do interesting things with res.
}
Where the goal is to have libibnetdisc_discover_all complete
expediently.
As long as the context 'foo' is re-entrant in all ways with all other
libraries and contexts I think useful threaded apps can be created.
> This is what got me on this in the first place. smp_query_via
> (_do_madrpc) is not thread safe.
Sure, the entire library is not thread safe around the ibmad_port
context. But who cares? If the caller to libibnetdisc wants to thread
that way they need to open another context.
> Also, I feel that someone down the road might fall into the same
> trap that I did thinking that smp_query_via is thread safe and I
> would like to fix that.
Well.. How can it be threaded? umad_send/umad_recv are inherently
single threaded APIs. You have to layer a TID based threading dispatch
mechanism on top of it. Much better to let the kernel do that and open
multiple umad fds.
> > each entry:
> > add to next list additional ports
> >
> > Repeat until dead.
> >
> > Where a 'next list' would be a set of actions along the lines of
> > 'query node' or 'query port' the action on a 'query node' completion
> > is to generate 'query port' next list items for all the ports, and on
> > 'query port' completion is to generate 'query node' items for all
> > enabled ports..
> >
> > libumad is nonblocking, parallel, etc...
>
> Yes, and libibmad layers on top of it an easier interface to issue common
> queries. Why should we ask the user to re-implement that code?
Well, the very best way to do this is to have an FSM engine API at the
core of the MAD library:
mad_ctx->callback = done_this;
mad_post(mad,mad_ctx)
done_this(reply):
...
> For example, mad_rpc now handles redirection. My implementation
> does not yet. So now I have to handle that on my own as well...
> :-(
To be honest, I don't like the libibmad/libibumad APIs one bit - I'm
not surprised they don't work for you..
Frankly, we really need a usable MAD library with sane APIs, and very
high level APIs on top of that. You cannot make an IB application
without doing SA queries at a minimum and the current process is
HORRID.
I see nothing of value in libibmad and libibumad to support that :|
Jason
Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
On Wed, 26 Aug 2009 18:24:20 -0600 Jason Gunthorpe wrote:
> On Wed, Aug 26, 2009 at 04:40:26PM -0700, Ira Weiny wrote:
>
> > Of course! :-) But first I would like to mention some numbers from the
> > prototype code I have. When running on a small fabric the additional
> > overhead of thread creation actually slows down the scan. :-(
>
> It seems strange to me to thread something like this (and a lot of hard
> work)..
>
> FSM multiplexing the recv path usually gives much better performance,
> something like net discovery is quite easy..

Using the original algorithm and data structures lent itself to
threading. Now that I am neck deep in all this I have thought that
rewriting it all might be easier.

> main loop:
>    fill tx queue from next list
>    receive replies and correlate with next list

This would still need additional code (or additional synchronization in the
API to libibnetdisc) if you wanted a user app to be multi-threaded. Someone
has to be in charge of receiving all replies on that ibmad_port object and
handing them to the proper owner. Of course one could open multiple
ibmad_port objects but how is the app writer to know to do that? Digging
through the code to find out that libibnetdisc is consuming all the replies?

This is what got me on this in the first place. smp_query_via (_do_madrpc)
is not thread safe. Threading was the easy way to deal with multiple
blocking queries on the fabric. Changing _do_madrpc to be thread safe
allowed a very quick multithreaded implementation on top of the current
algorithm which blocked on multiple queries. I did not have to form the
queries myself, it was easy... (I had that working months ago.) Given that
we don't want to change libibmad things got more complicated and your
algorithm seems much better... (except [see below])

Also, I feel that someone down the road might fall into the same trap that
I did thinking that smp_query_via is thread safe and I would like to fix
that.
> each entry:
>    add to next list additional ports
>
> Repeat until dead.
>
> Where a 'next list' would be a set of actions along the lines of
> 'query node' or 'query port'; the action on a 'query node' completion
> is to generate 'query port' next list items for all the ports, and on
> 'query port' completion is to generate 'query node' items for all
> enabled ports..
>
> libumad is nonblocking, parallel, etc...

Yes, and libibmad layers on top of it an easier interface to issue common
queries. Why should we ask the user to re-implement that code?

For example, mad_rpc now handles redirection. My implementation does not
yet. So now I have to handle that on my own as well... :-(

Ira

> Jason

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
On Wed, Aug 26, 2009 at 04:40:26PM -0700, Ira Weiny wrote:
> Of course! :-) But first I would like to mention some numbers from the
> prototype code I have. When running on a small fabric the additional
> overhead of thread creation actually slows down the scan. :-(

It seems strange to me to thread something like this (and a lot of hard
work)..

FSM multiplexing the recv path usually gives much better performance,
something like net discovery is quite easy..

main loop:
   fill tx queue from next list
   receive replies and correlate with next list

each entry:
   add to next list additional ports

Repeat until dead.

Where a 'next list' would be a set of actions along the lines of
'query node' or 'query port'; the action on a 'query node' completion
is to generate 'query port' next list items for all the ports, and on
'query port' completion is to generate 'query node' items for all
enabled ports..

libumad is nonblocking, parallel, etc...

Jason
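The quoted main loop lends itself to a small work-queue sketch. Everything here is illustrative (the topo array plays the fabric, and discover/demo_discover are not real libibnetdisc calls): a 'next list' of query-node/query-port actions is drained, and each completion appends further actions until the list is empty.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the 'next list' discovery loop. */
#define MAX_WORK 64

enum action { QUERY_NODE, QUERY_PORT };

struct work { enum action act; int node; int port; };

/* topo[n][p] is the node reached from node n, port p, or -1 if the
 * port is unconnected.  Two ports per node keeps the demo small. */
static int discover(const int topo[][2], int nnodes)
{
    struct work next[MAX_WORK];
    char seen[MAX_WORK];
    int head = 0, tail = 0, queried = 0;

    memset(seen, 0, sizeof(seen));
    next[tail++] = (struct work){ QUERY_NODE, 0, 0 };
    seen[0] = 1;

    while (head < tail) {                 /* repeat until dead */
        struct work w = next[head++];
        queried++;
        if (w.act == QUERY_NODE) {
            /* node reply: queue a port query for each of its ports */
            for (int p = 0; p < 2; p++)
                next[tail++] = (struct work){ QUERY_PORT, w.node, p };
        } else {
            /* port reply: queue a node query for the unseen peer */
            int peer = topo[w.node][w.port];
            if (peer >= 0 && peer < nnodes && !seen[peer]) {
                seen[peer] = 1;
                next[tail++] = (struct work){ QUERY_NODE, peer, 0 };
            }
        }
    }
    return queried;    /* total queries issued for the scan */
}

/* Demo: 3-node chain 0 <-> 1 <-> 2; 3 node + 6 port queries. */
static int demo_discover(void)
{
    const int topo[3][2] = { { 1, -1 }, { 0, 2 }, { 1, -1 } };
    return discover(topo, 3);
}
```

With a nonblocking transport underneath, the head of the list is the tx queue and completions feed the tail, so many queries stay in flight from one thread.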
[ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
On Sun, 23 Aug 2009 15:06:09 +0300 Sasha Khapyorsky wrote:
> Hi Ira,
>
> On 08:30 Mon 17 Aug , Ira Weiny wrote:
> >
> > The immediate benefit is coming with the multi-threaded implementation where
> > I plan on adding the following function.[*]
>
> Ok, but could we discuss first how will multithreading architecture be

Of course! :-) But first I would like to mention some numbers from the
prototype code I have. When running on a small fabric the additional
overhead of thread creation actually slows down the scan. :-(

      Current master:    Threaded version:
real  0m0.101s           0m0.266s
user  0m0.000s           0m0.000s
sys   0m0.011s           0m0.014s

But, as expected, on a large system (1152 nodes) there is a decent speed up.

      Current master:    Threaded version:
real  0m3.046s           0m1.748s
user  0m0.073s           0m0.331s
sys   0m0.158s           0m0.822s

However, the biggest speed up comes when there are errors on the fabric.
This is the same 1152 node cluster with just 14 "bad" ports on the fabric.
This is of course because the scan continues "around" the bad ports.

      Current master:    Threaded version:
real  0m33.051s          0m5.609s
user  0m0.071s           0m0.353s
sys   0m0.156s           0m1.113s

Since you are usually running these tools when things are bad I think there
is a big gain here. Even running with a faster timeout of 200ms results in
a big difference.

      Current master:    Threaded version:
real  0m9.149s           0m2.223s
user  0m0.016s           0m0.374s
sys   0m0.372s           0m1.056s

With that in mind...

> implemented with libibnetdisc: goals (in particular is it support for
> multithreaded apps or just multithreaded discovery function), interaction
> with caller application, etc.?

My initial goal was to make libibnetdisc safe for multithreaded apps and
make a multithreaded discovery function. However, since libibmad itself is
not thread safe, and you expressed a desire to keep it that way[*], I
reduced that goal to just making the discovery function multithreaded
(using mad_[send|receive]_via).
Although I don't like this restriction I can see it as a valid design
decision as long as it is documented that the discover function is not
thread safe in regards to the ibmad_port object. This is because
ibnd_discover_fabric uses libibmad calls and would require a complicated
API to allow the user app to synchronize with those calls.

In order to make things thread safe for the user apps as well as the
library I can see 3 options.

1) Make libibmad thread safe (which you were hesitant to do).

2) Add a thread safe interface to libibmad. User apps will need to know
   to use this interface while using libibnetdisc, and libibnetdisc will
   use this interface.

3) Create a wrapper lib which is thread safe. In this case the apps and
   libibnetdisc would call into this wrapper lib and we would have to
   change the API to libibnetdisc.

Right now I have the multithreaded discover code separated out somewhat.
I think it would not be hard to extract the multithreaded parts and
either create the wrapper lib or extend libibmad with thread safe calls.

That said, I personally do not like option 2. I think it further
complicates an already overly complex API in libibmad. As far as option
1 vs 3 I can see arguments for and against each. 1 makes things very
nice because it would be taken care of for all apps currently using
libibmad. On the down side it would add some overhead for single
threaded apps, although I do not believe too much.[$] The downside of 3
is that to be done correctly it would change the libibnetdisc API and
apps which use it.

> One of the desired features of this I could think would be to keep the
> API simple for single threaded stuff.

Agreed. I don't think the API is going to get too complicated. A big
reason for adding the context is to allow the API to be flexible without
breaking things.

Ira

[*] http://lists.openfabrics.org/pipermail/general/2009-July/060677.html
    "madrpc() is too primitive interface for such applications.
    There would be better to use umad_send/recv() directly or may be
    mad_send_via(). Example is mcast_storm.c distributed with ibsim."

[$] It is my opinion that mad_rpc is _not_ primitive. In my mind it _is_
    the wrapper around the primitive umad_send/recv calls. If you are
    interested perhaps I can try to explain what I wanted to do in the
    library to make it thread safe more clearly. The point I might not
    have made clear was that I don't think the library will have to do
    any threading on its own, just some locks and storing of responses.
    Of course the down side to this is the libibmad code would be
    slightly slower. But I don't think by very much.

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
