Hi Hal, 

Sorry for the ambiguous answers below. I have since figured out what the 
problem was (while away from the code, thinking it over offline). The main 
mistake was in the umad_send call inside the while(1) loop: I had specified a 
timeout value greater than '0', which means the MADs are treated as 
solicited. The SubnAdmResponse should not be sent as solicited, and that was 
the root of the problem. With both the timeout value and the retry count set 
to '0', no data is available for subsequent reads and the 'read' blocks as 
expected. 
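
A minimal sketch of that change, for reference. It assumes umad_send() has the 
shape (portid, agentid, umad, length, timeout_ms, retries) as declared in 
libibumad's umad.h. The names mad_send_fn, send_unsolicited() and fake_send() 
are made up for illustration, and fake_send() only stands in for the real call 
so that the snippet builds without libibumad installed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the shape of libibumad's umad_send(); check umad.h. */
typedef int (*mad_send_fn)(int portid, int agentid, void *umad,
                           int length, int timeout_ms, int retries);

/* Hypothetical stand-in for umad_send(): it records the timeout and
 * retry arguments instead of touching any hardware. */
static int last_timeout_ms = -1;
static int last_retries = -1;

static int fake_send(int portid, int agentid, void *umad,
                     int length, int timeout_ms, int retries)
{
    (void)portid; (void)agentid; (void)umad; (void)length;
    last_timeout_ms = timeout_ms;
    last_retries = retries;
    return 0;
}

/* Send a MAD for which no reply is expected: timeout 0 and retries 0,
 * which is the fix described above. */
static int send_unsolicited(mad_send_fn send, int portid, int agentid,
                            void *umad, int length)
{
    assert(send != NULL);
    return send(portid, agentid, umad, length, 0, 0);
}
```

In the real program, umad_send itself would be passed in place of fake_send, 
assuming its declaration matches the typedef.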

Thanks for the help. Clarifications for some of the previous questions are 
below; please see inline.

On Aug 11, 2006 09:22 AM, Hal Rosenstock <[EMAIL PROTECTED]> wrote:

> Hi Abhijit,
> 
> On Thu, 2006-08-10 at 10:55, Abhijit Gadgil wrote:
> > Hi Hal,
> 
> > I tried using the umad code as per the latest repository. 
> > (The latest fix is on libibumad/umad.c Line # 806 right?) 
> 
> Yes.
> 
> > I manually applied that patch.
> 
> OK but not sure why you did this "manually".
> 

Sorry about this. The machine where I am testing this code cannot fetch code 
from the svn repository directly, so I just edited the file by hand. 

> >  It doesn't seem to work yet. 
> 
> What do you mean ? Do you mean that change makes no difference for this
> and you still have the same problem ?
> 
> > In fact, what I figured out was that the 'poll' on the umad->fd isn't 
> > blocking either. 
> 
> What do you mean by either ? 
> 

Well, both 'read' and 'poll' were returning immediately because of the 
'timeout' parameter specified in umad_send. So even when I specified a 
negative timeout (in umad_poll), there was always data available. :-( 
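
One way to check whether data really is pending on a descriptor, sketched with 
plain poll(2): a zero timeout only peeks, whereas a negative timeout blocks 
until something arrives. The helper name fd_has_pending_data() is made up for 
illustration. With libibumad the descriptor would presumably come from 
umad_get_fd(portid); that part is an assumption and is not exercised here.

```c
#include <assert.h>
#include <poll.h>

/* Returns 1 if fd has data ready to read right now, 0 if it does not,
 * and -1 if poll() itself fails. Never blocks. */
static int fd_has_pending_data(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, 0);   /* timeout 0: check and return immediately */
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

If this returns 1 before every 'read', the read is not misbehaving: something 
keeps queueing data on the descriptor.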

> A poll with a negative timeout should be infinite which means blocking
> so something is happening on the fd but perhaps is not reported
> correctly. This particular usage has not been tried to my knowledge
> although it is used in a similar manner for some other things (by
> OpenSM).
> 
> What kernel version are you using ? Are you using OpenIB from svn or
> OFED or something else ? What version is this up to ?
> 

I am using the latest kernel version, 2.6.17, and OpenIB from svn as well 
(same revision, i.e. 8781).

> > The read returns the correct 'mad_agent' ie. 0 in this case and some length 
> > which is usually 24 for the specific code.
> 
> That shows the breakage. Not sure why.
> 
> > I am attaching the local copy of infiniband/include/mad.h and src/fields.c, 
> > so that you may be able to try this code.  (There may be stray printf's in 
> > those files!). Also, I was not quite clear about whether the 
> > subscriptions should include the RID information (as per section 15.2.5), 
> > so I tried including it first, which the SA doesn't seem to like, but the 
> > subscriptions work after I get rid of the RID header. This particular 
> > aspect is not quite clear to me yet. 
> > 
> > Please let me know what you find.
> 
> I'll try to look at this more tomorrow. I have some other nits on the
> test code you sent. I'll comment on these later as well although I don't
> think they are the crux of the issue.

Please let me know any additional comments that you have. 

Further, it is not clear from the specification whether one should include 
the RIDs in the InformInfo records during subscription. What is the intended 
behavior?

Regards

-abhijit

> 
> -- Hal
> 
> > Regards.
> > 
> > -abhijit
> > 
> > 
> > On Aug 10, 2006 08:02 PM, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi again Abhijit,
> > > 
> > > On Thu, 2006-08-10 at 09:46, Abhijit Gadgil wrote:
> > > > Hi Hal, 
> > > > 
> > > > Please see below.
> > > > 
> > > > On Aug 10, 2006 07:01 PM, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hi Abhijit,
> > > > > 
> > > > > On Thu, 2006-08-10 at 07:21, Abhijit Gadgil wrote:
> > > > > > Hi All, 
> > > > > > 
> > > > > > I am trying to write a simple program using libibumad to 
> > > > > > 'subscribe' for traps and then receive traps from the SA. Most of 
> > > > > > the things seem to work fine, however I am facing a small problem 
> > > > > > where, after first read for the trap, all subsequent reads are not 
> > > > > > blocking (and return some incorrect length). 
> > > > > 
> > > > > What do those calls return ? What version of management are you
> > > > > using ?
> > > > > 
> > > > 
> > > > I am running the management code from the SVN (svn release 8781, it may 
> > > > be slightly outdated!) 
> > > 
> > > A fix just went in to libibumad:umad_recv which may impact your results.
> > > Can you update this and retry ?
> > > 
> > > What do the reads return other than incorrect length ? 
> > > 
> > > -- Hal
> > > 
> > > > > > Attached is the simple code, can someone tell, what exactly is 
> > > > > > wrong out here? 
> > > > > 
> > > > > I didn't build and run this so my comments are based on just
> > > > > looking at the code. I don't think it would build as there are
> > > > > other changes needed to support this (e.g. IB_SA_INFINFO_XXX in
> > > > > libibmad at a minimum).
> > > > > 
> > > > 
> > > > Oh, I am sorry, I didn't mention this before: I modified the libibmad 
> > > > sources (specifically src/fields.c and include/infiniband/mad.h) to 
> > > > accomplish this. Once I get it right, I will submit a patch. (It's 
> > > > too hacky right now)
> > > > 
> > > > > Is the main loop based on some operational program ? If so, which
> > > > > one ?
> > > > > 
> > > > > A couple of specific comments:
> > > > > 
> > > > > init_sa_headers: InformInfo does not actually use RMPP so the
> > > > > initialization here needs to change. Not sure what doing this would
> > > > > cause without actually building and running this.
> > > > > 
> > > > 
> > > > This was my first attempt at using umad, so for simplicity I copied 
> > > > from some reference code that had RMPP enabled. I think I should get 
> > > > rid of this as well. 
> > > > 
> > > > 
> > > > > Based on this, what is the result of the subscription ? Does it really
> > > > > succeed ?
> > > > 
> > > > Well, the subscriptions did indeed succeed and I was able to receive 
> > > > IPoIB broadcast multicast group creation/deletion traps as well, but 
> > > > the problem mentioned below (ie. non-blocking reads) started appearing. 
> > > > 
> > > > > main: Rather than hard coding SM LID to 0x12, there are ways to get
> > > > > this dynamically. There are examples of how to do this.
> > > > 
> > > > Sorry about this again. I realized later that it is stupid to hard 
> > > > code it (e.g. I could have got it from the ca[].port->sm_lid); I will 
> > > > fix that eventually. 
> > > > 
> > > > Thanks.
> > > > 
> > > > -abhijit
> > > > 
> > > > > -- Hal
> > > > > 
> > > > > > Thanks
> > > > > > 
> > > > > > -abhijit
> > 
> > 
> > 
> 



