    Shirley> It's necessary to modify ib_mad, ib_sa and ib_cm to
    Shirley> act like ib_ipoib and ib_cache and continue initializing
    Shirley> when one port encounters errors, instead of releasing all
    Shirley> resources. If you agree, I am creating this as the first
    Shirley> patch for review. How to handle the errors would be the
    Shirley> second patch.

I don't agree that we want to handle "half-usable" devices where some
ports don't work.  The only use for this seems to be working around
some problems with the current Galaxy HCA implementation, and there
must be a better way to handle this.

You're welcome to prove me wrong, but I think that handling ports that
are not usable and then become usable later is just going to be
horrible.  And if we do that, then I think it would make sense to
handle ports starting out usable and then becoming unusable later --
and I think that's going to be even worse still.

I do agree that we want to handle errors in initialization better.
The ib_mad and ib_cm code actually looks OK to me (with a small bug in
ib_mad for which I'll post a patch shortly).  I think something like
the patch below is all that's needed to fix ib_sa:

--- infiniband/core/sa_query.c  (revision 3664)
+++ infiniband/core/sa_query.c  (working copy)
@@ -583,10 +583,16 @@ int ib_sa_path_rec_get(struct ib_device 
 {
        struct ib_sa_path_query *query;
        struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-       struct ib_sa_port   *port   = &sa_dev->port[port_num - sa_dev->start_port];
-       struct ib_mad_agent *agent  = port->agent;
+       struct ib_sa_port   *port;
+       struct ib_mad_agent *agent;
        int ret;
 
+       if (!sa_dev)
+               return -ENODEV;
+
+       port  = &sa_dev->port[port_num - sa_dev->start_port];
+       agent = port->agent;
+
        query = kmalloc(sizeof *query, gfp_mask);
        if (!query)
                return -ENOMEM;
@@ -685,10 +691,16 @@ int ib_sa_service_rec_query(struct ib_de
 {
        struct ib_sa_service_query *query;
        struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-       struct ib_sa_port   *port   = &sa_dev->port[port_num - sa_dev->start_port];
-       struct ib_mad_agent *agent  = port->agent;
+       struct ib_sa_port   *port;
+       struct ib_mad_agent *agent;
        int ret;
 
+       if (!sa_dev)
+               return -ENODEV;
+
+       port  = &sa_dev->port[port_num - sa_dev->start_port];
+       agent = port->agent;
+
        if (method != IB_MGMT_METHOD_GET &&
            method != IB_MGMT_METHOD_SET &&
            method != IB_SA_METHOD_DELETE)
@@ -768,10 +780,16 @@ int ib_sa_mcmember_rec_query(struct ib_d
 {
        struct ib_sa_mcmember_query *query;
        struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-       struct ib_sa_port   *port   = &sa_dev->port[port_num - sa_dev->start_port];
-       struct ib_mad_agent *agent  = port->agent;
+       struct ib_sa_port   *port;
+       struct ib_mad_agent *agent;
        int ret;
 
+       if (!sa_dev)
+               return -ENODEV;
+
+       port  = &sa_dev->port[port_num - sa_dev->start_port];
+       agent = port->agent;
+
        query = kmalloc(sizeof *query, gfp_mask);
        if (!query)
                return -ENOMEM;
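
All three hunks make the same change: don't touch sa_dev until we know the client data lookup gave us something. A minimal userspace sketch of that pattern follows, assuming the lookup yields NULL when the SA client never finished setting up the device. The demo_* names are made up for illustration and are not kernel API; the real code gets sa_dev from ib_get_client_data().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct demo_port {
	int agent_id;
};

struct demo_sa_device {
	int start_port;
	struct demo_port port[2];
};

/*
 * Mirrors the shape of the patched entry points: sa_dev plays the role of
 * the per-device client data, which is assumed to be NULL when device
 * setup for this client did not complete.
 */
static int demo_query(struct demo_sa_device *sa_dev, int port_num, int *agent_id)
{
	struct demo_port *port;

	/* Bail out before any dereference of sa_dev. */
	if (!sa_dev)
		return -ENODEV;

	/* Only now is it safe to index into the per-port array. */
	port = &sa_dev->port[port_num - sa_dev->start_port];
	*agent_id = port->agent_id;

	return 0;
}
```

Calling demo_query(NULL, 1, &id) returns -ENODEV and leaves id alone, where the old initializer-style code would have dereferenced a NULL pointer. Like the patch, the sketch does not range-check port_num.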