Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-12 Thread Sean Hefty
Hal Rosenstock wrote:
On Wed, 2004-11-10 at 11:59, Roland Dreier wrote:
   Sean What exactly does it mean then when process_mad returns
   Sean success?  Do any of the return bits from process_mad
   Sean indicate that the MAD was for the HCA driver?
SUCCESS means that process_mad didn't encounter any errors.  If REPLY
or CONSUMED is set then process_mad actually handled the packet.

I would assume that REPLY and CONSUMED are also mutually exclusive.
I believe that's the case, but maybe it would make more sense if they 
weren't, and let CONSUMED indicate that MAD was for the HCA driver.

From an API perspective, I think we only need to know if the HCA driver 
intercepted the MAD, and if so, was a reply generated.

- Sean
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Hal Rosenstock
On Wed, 2004-11-10 at 00:55, Roland Dreier wrote:
 It seems that MAD handling is still not quite right.  It seems in my
 set up that IPoIB is not seeing the response to its MCMember
 set... (it does look like the query is reaching the SM)

This is a separate issue from the ports not becoming active (DR handling
issue). I broke this part yesterday (not a good day at all :-( at either
r1184 and/or r1181 when I added what I thought was correct based on
Sean's emails (not dispatching additional error cases in
ib_mad_recv_done_handler (and then improperly thought I verified the
changes that things were still working)).

I can see now that this is wrong and have a fix for what stops IPoIB
from working. The problem was that the response was received by the MAD
layer but not dispatched due to the change(s) noted above.

So I am patching at least enough to get things operational for now.
Please confirm that it works for you. I will not touch things until I
hear that it does.

Also, it seems to me that no response needs to be handed to process_mad.
Does this optimization make sense ?

Sorry for the temporary inconvenience. I will try not to do this again.
It is no fun for anyone.

-- Hal

mad: In ib_mad_recv_done_handler, if process_mad returns SUCCESS but not
REPLY, received packet still needs to be dispatched

Index: mad.c
===
--- mad.c   (revision 1187)
+++ mad.c   (working copy)
@@ -1151,8 +1151,8 @@
port_priv-device,
port_priv-port_num))
response = NULL;
+   goto out;
}
-   goto out;
} 
}
 



___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Roland Dreier
 Hal == Hal Rosenstock [EMAIL PROTECTED] writes:

Hal I can see now that this is wrong and have a fix for what
Hal stops IPoIB from working. The problem was that the response
Hal was received by the MAD layer but not dispatched due to the
Hal change(s) noted above.

Hal So I am patching at least enough to get things operational
Hal for now.  Please confirm that it works for you. I will not
Hal touch things until I hear that it does.

Yes, IPoIB works for me again.

Hal Also, it seems to me that no response needs to be handed to
Hal process_mad.  Does this optimization make sense ?

I'm not sure I understand the question.  process_mad definitely needs
a buffer to return a response in.  Are you suggesting that process_mad
overwrite the input buffer when it generates a response?  That's
probably OK although I'm not sure if it's much of an improvement
(process_mad will probably have to allocate a response buffer
internally and copy the response when returning).

 - R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Hal Rosenstock
On Wed, 2004-11-10 at 10:36, Roland Dreier wrote:
 Yes, IPoIB works for me again.

Thanks for validating.

 Hal Also, it seems to me that no response needs to be handed to
 Hal process_mad.  Does this optimization make sense ?
 
 I'm not sure I understand the question.  process_mad definitely needs
 a buffer to return a response in.  Are you suggesting that process_mad
 overwrite the input buffer when it generates a response?  That's
 probably OK although I'm not sure if it's much of an improvement
 (process_mad will probably have to allocate a response buffer
 internally and copy the response when returning).

I'm asking about also checking the method prior to calling process_mad. 
If the method is a response method (e.g. GetResp for one), we could
bypass calling process_mad. Or is this not worth the extra checks in the
MAD layer as it is low enough overhead and adds additional protocol
knowledge into the MAD layer ?

-- Hal

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Nitin Hande
Hal Rosenstock wrote:
 On Wed, 2004-11-10 at 00:55, Roland Dreier wrote:
 
It seems that MAD handling is still not quite right.  It seems in my
set up that IPoIB is not seeing the response to its MCMember
set... (it does look like the query is reaching the SM)
 
 
 This is a separate issue from the ports not becoming active (DR handling
 issue). I broke this part yesterday (not a good day at all :-( at either
 r1184 and/or r1181 when I added what I thought was correct based on
 Sean's emails (not dispatching additional error cases in
 ib_mad_recv_done_handler (and then improperly thought I verified the
 changes that things were still working)).
 
 I can see now that this is wrong and have a fix for what stops IPoIB
 from working. The problem was that the response was received by the MAD
 layer but not dispatched due to the change(s) noted above.
 
 So I am patching at least enough to get things operational for now.
 Please confirm that it works for you. I will not touch things until I
 hear that it does.
IPoIB seems to be working for me. I am on x86_64 platform.

Thanks
Nitin



 
 Also, it seems to me that no response needs to be handed to process_mad.
 Does this optimization make sense ?
 
 Sorry for the temporary inconvenience. I will try not to do this again.
 It is no fun for anyone.
 
 -- Hal
 
 mad: In ib_mad_recv_done_handler, if process_mad returns SUCCESS but not
 REPLY, received packet still needs to be dispatched
 
 Index: mad.c
 ===
 --- mad.c (revision 1187)
 +++ mad.c (working copy)
 @@ -1151,8 +1151,8 @@
   port_priv-device,
   port_priv-port_num))
   response = NULL;
 + goto out;
   }
 - goto out;
   } 
   }
  
 
 
 
 ___
 openib-general mailing list
 [EMAIL PROTECTED]
 http://openib.org/mailman/listinfo/openib-general
 
 To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Sean Hefty
Hal Rosenstock wrote:
This is a separate issue from the ports not becoming active (DR handling
issue). I broke this part yesterday (not a good day at all :-( at either
r1184 and/or r1181 when I added what I thought was correct based on
Sean's emails (not dispatching additional error cases in
ib_mad_recv_done_handler (and then improperly thought I verified the
changes that things were still working)).
What exactly does it mean then when process_mad returns success?  Do any of the 
return bits from process_mad indicate that the MAD was for the HCA driver?

- Sean
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Roland Dreier
Sean What exactly does it mean then when process_mad returns
Sean success?  Do any of the return bits from process_mad
Sean indicate that the MAD was for the HCA driver?

SUCCESS means that process_mad didn't encounter any errors.  If REPLY
or CONSUMED is set then process_mad actually handled the packet.

 - R.

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Roland Dreier
By the way, if I am reading the code correctly, it looks like the MAD
layer only checks for IB_MAD_RESULT_REPLY and not
IB_MAD_RESULT_CONSUMED.  If IB_MAD_RESULT_CONSUMED is set then the
packet is something like a trap repress handled by the SMA or a
locally generated trap that the driver forwarded to the SM, so the
packet should not go through agent dispatch.

 - R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Hal Rosenstock
On Wed, 2004-11-10 at 11:59, Roland Dreier wrote:
 Sean What exactly does it mean then when process_mad returns
 Sean success?  Do any of the return bits from process_mad
 Sean indicate that the MAD was for the HCA driver?
 
 SUCCESS means that process_mad didn't encounter any errors.  If REPLY
 or CONSUMED is set then process_mad actually handled the packet.

I would assume that REPLY and CONSUMED are also mutually exclusive.

-- Hal

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Roland Dreier
Roland I think keeping the MAD code simpler is probably best right now.

Hal Hope that is for technical reasons and not for the recent missteps.

Yes, it's just that the MAD code is quite complicated already with
multiple tests for DR SMPs etc; mad.c alone is over 2000 lines now.  I
don't think you could even find a microbenchmark that could measure
the improvement in testing the response bit in the MAD code rather
than calling into process_mad for every packet, so I don't think we
need to add more code to the MAD layer to do it.

 - R.

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Hal Rosenstock
I haven't cleared the other issues before getting back to this but
wanted to respond to some of the points below:

On Tue, 2004-11-09 at 23:55, Roland Dreier wrote:
 Roland OK, I think I understand the problem, but I'm not sure
 Roland what the correct solution is.  When a DR SMP arrives at a
 Roland CA from the SM, hop_cnt == hop_ptr == number of hops in
 Roland the directed route,
 
 Hal What was the number ?
 
 For one port it was 4 and for another it was 6.  It could really be
 anything (it's just how many hops away the SM is).

I think I understand how DR is supposed to work :-) I was just looking
for the actual values in the failed case to try to understand what is
code was doing as I don't have a configuration to recreate this (at
least yet).

From what you indicated, it looks like it would be the following case
so no response would be sent:

/* C14-13:2 */
if (2 = hop_ptr  hop_ptr = hop_cnt) {
if (node_type != IB_NODE_SWITCH
return 0;

but I'm not sure whether those were the values on entry to the
smi_dr_handle_smp_recv routine that was excised from the code.

 Hal I integrated this patch and checked it back in. I don't think
 Hal this is the solution for all cases (and something else is
 Hal broken).
 
 Could be.  I had a hard time checking the code in smi.c (which is
 split between smi_handle_dr_smp_recv() and smi_handle_dr_smp_send() as
 well as smi_check_forward_dr_smp(), but which has outgoing and
 returning DR handling mixed together) against the IB spec (which
 splits outgoing and returning DR handling).

I had to squint hard the first time I went through this too (and
probably will again). I will explain how this works in sufficient detail
if this is of interest.

 Hal The second call to smi_handle_dr_smp_recv was to validate the
 Hal DR in the response packet before sending it. The response
 Hal would be a returning DR packet (D bit 1). If hop_cnt ==
 Hal hop_ptr,
 
 I guess the problem with calling smi_handle_dr_smp_recv() twice on the
 same packet is that the function may alter the packet.

No, the second call to smi_handle_dr_smp_recv() was on the outgoing
response and not the incoming request. The thought was that a packet
coming from process_mad is much like an incoming received packet and
hence the call to smi_handle_dr_smp_recv. The routine validates the
packet but also can do some fixups depending on which case it falls
into. Guess it's only dangerous to validate this and wrong to fix it up.

The key to me is the following:
The split of responsibility on the DR header formation is a little
unclear to me. In the case of the SM, are the DR headers fully formed
before handing it to the MAD layer or is some DR fixup needed ?

-- Hal

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Roland Dreier
Roland I guess the problem with calling smi_handle_dr_smp_recv()
Roland twice on the same packet is that the function may alter
Roland the packet.

Hal No, the second call to smi_handle_dr_smp_recv() was on the
Hal outgoing response and not the incoming request. The thought
Hal was that a packet coming from process_mad is much like an
Hal incoming received packet and hence the call to
Hal smi_handle_dr_smp_recv. The routine validates the packet but
Hal also can do some fixups depending on which case it falls
Hal into. Guess it's only dangerous to validate this and wrong to
Hal fix it up.

Maybe I'm misreading the code, but my patch deleted the call to
smi_handle_dr_smp_recv() before the call to agent_send.  agent_send()
eventually ends up in ib_post_send_mad(), which calls
handle_outgoing_smp() for directed route MADs, which ends up calling
smi_handle_dr_smp_recv() again.  Since smi_handle_dr_smp_recv() can
change the packet, calling it twice on the same packet seems to break things.

However I don't think it's a good idea to think of responses generated
by process_mad as an incoming received packet.  I think they should be
thought of as returning DR SMPs being passed to the SMI for sending
(as in section 14.2.2 of the IB spec).

Hal The key to me is the following: The split of responsibility
Hal on the DR header formation is a little unclear to me. In the
Hal case of the SM, are the DR headers fully formed before
Hal handing it to the MAD layer or is some DR fixup needed ?

My suggestion would be to follow the IB spec, and assume that the SM
follows the SMP initialization in 14.2.2.1 and have the MAD layer just
implement the SMI processing in 14.2.2.2.  (And I believe things
should work similarly for responses generated by the SMA -- the MAD
layer should just do SMI processing).

 - R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Hal Rosenstock
On Wed, 2004-11-10 at 16:29, Roland Dreier wrote:
 Roland I guess the problem with calling smi_handle_dr_smp_recv()
 Roland twice on the same packet is that the function may alter
 Roland the packet.
 
 Hal No, the second call to smi_handle_dr_smp_recv() was on the
 Hal outgoing response and not the incoming request. The thought
 Hal was that a packet coming from process_mad is much like an
 Hal incoming received packet and hence the call to
 Hal smi_handle_dr_smp_recv. The routine validates the packet but
 Hal also can do some fixups depending on which case it falls
 Hal into. Guess it's only dangerous to validate this and wrong to
 Hal fix it up.
 
 Maybe I'm misreading the code, but my patch deleted the call to
 smi_handle_dr_smp_recv() before the call to agent_send.  

You're not. I was...

 agent_send() eventually ends up in ib_post_send_mad(), which calls
 handle_outgoing_smp() for directed route MADs, which ends up calling
 smi_handle_dr_smp_recv() again.  Since smi_handle_dr_smp_recv() can
 change the packet, calling it twice on the same packet seems to break things.

I'm with you now.

 However I don't think it's a good idea to think of responses generated
 by process_mad as an incoming received packet.  I think they should be
 thought of as returning DR SMPs being passed to the SMI for sending
 (as in section 14.2.2 of the IB spec).

Yup, there is a difference between a returning SMP being sent and an
incoming SMP being received in terms of SMI. I was being imprecise
again.

 Hal The key to me is the following: The split of responsibility
 Hal on the DR header formation is a little unclear to me. In the
 Hal case of the SM, are the DR headers fully formed before
 Hal handing it to the MAD layer or is some DR fixup needed ?
 
 My suggestion would be to follow the IB spec, and assume that the SM
 follows the SMP initialization in 14.2.2.1 and have the MAD layer just
 implement the SMI processing in 14.2.2.2.  (And I believe things
 should work similarly for responses generated by the SMA -- the MAD
 layer should just do SMI processing).

That was the intention. 

I will figure out what is broke but not just yet... I may want something
tried by either you or Sean prior to my checking it in to be sure. I'll
let you know.

Thanks.

-- Hal

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-10 Thread Hal Rosenstock
On Wed, 2004-11-10 at 12:02, Roland Dreier wrote:
 By the way, if I am reading the code correctly, it looks like the MAD
 layer only checks for IB_MAD_RESULT_REPLY and not
 IB_MAD_RESULT_CONSUMED.  

You are reading the code correctly.

 If IB_MAD_RESULT_CONSUMED is set then the
 packet is something like a trap repress handled by the SMA or a
 locally generated trap that the driver forwarded to the SM, so the
 packet should not go through agent dispatch.

This is a patch which should occur shortly.

-- Hal

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-09 Thread Roland Dreier
OK, this works on my i386 system but I'm still getting 

ib_mad: Invalid directed route

on ppc64.  I'll try to debug what exactly is happening (ie put some
prints in to see why smi_handle_dr_smp_send() is rejecting it).

 - R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-09 Thread Roland Dreier
Roland OK, this works on my i386 system but I'm still getting

Roland ib_mad: Invalid directed route

Roland on ppc64.  I'll try to debug what exactly is happening (ie
Roland put some prints in to see why smi_handle_dr_smp_send() is
Roland rejecting it).

By the way, the i386 system is connected directly to the switch
running the SM, while the ppc64 system is a few hops away.  So it's
just as likely to be a DR SMI handling problem as a ppc64 architecture
issue.

 - R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-09 Thread Hal Rosenstock
On Tue, 2004-11-09 at 18:54, Roland Dreier wrote:
 OK, I think I understand the problem, but I'm not sure what the
 correct solution is.  When a DR SMP arrives at a CA from the SM,
 hop_cnt == hop_ptr == number of hops in the directed route,

What was the number ?

 and somehow they are not updated correctly by the time the response
 reaches handle_outgoing_smp().
 
 I can't follow the code well enough to understand why all DR SMPs have
 to go through both smi_handle_dr_smp_recv() and
 smi_handle_dr_smp_send() but the patch below seems to correct things
 for me (ports go to ACTIVE on all my systems).  (handle_outgoing_smp()
 already calls smi_handle_dr_smp_recv() so it seems the response was
 getting passed to smi_handle_dr_smp_recv() twice).

I integrated this patch and checked it back in. I don't think this is
the solution for all cases (and something else is broken).

The second call to smi_handle_dr_smp_recv was to validate the DR in the
response packet before sending it. The response would be a returning DR
packet (D bit 1). If hop_cnt == hop_ptr,   

I suspect this has been broken since r1163 (not including the other
things I broke in it today).

I will do some more work in understanding this tomorrow.

-- Hal

___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

2004-11-09 Thread Roland Dreier
Roland OK, I think I understand the problem, but I'm not sure
Roland what the correct solution is.  When a DR SMP arrives at a
Roland CA from the SM, hop_cnt == hop_ptr == number of hops in
Roland the directed route,

Hal What was the number ?

For one port it was 4 and for another it was 6.  It could really be
anything (it's just how many hops away the SM is).

Hal I integrated this patch and checked it back in. I don't think
Hal this is the solution for all cases (and something else is
Hal broken).

Could be.  I had a hard time checking the code in smi.c (which is
split between smi_handle_dr_smp_recv() and smi_handle_dr_smp_send() as
well as smi_check_forward_dr_smp(), but which has outgoing and
returning DR handling mixed together) against the IB spec (which
splits outgoing and returning DR handling).

Hal The second call to smi_handle_dr_smp_recv was to validate the
Hal DR in the response packet before sending it. The response
Hal would be a returning DR packet (D bit 1). If hop_cnt ==
Hal hop_ptr,

I guess the problem with calling smi_handle_dr_smp_recv() twice on the
same packet is that the function may alter the packet.

 - R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general