bmi-socket-close.patch:
-----------------------
This fixes a bug in the new BMI_set_info(... BMI_TCP_CLOSE_SOCKET ...) mechanism, which is used to reconnect the socket to the initial configuration server if new socket buffer sizes are specified in the config file. I didn't follow the code path to find the exact problem, but at a high level it wasn't being thorough enough in cleaning out the old socket. This showed up when using epoll and specifying socket buffer sizes in the server configuration — in this case the client will often fail to mount with a cryptic "not a directory" error and leave some epoll() errors in the pvfs2-client.log file. I think a stale (or possibly reused) file descriptor was being left in the epoll fd set. At any rate, the fix is to use a different set of functions for tearing down the entire address etc. so that it is reconnected on the next BMI addr lookup. This path is already used by the server to discard old BMI addresses after critical errors on addresses that cannot be reconnected. It is triggered from bmi.c without entering the bmi_tcp module, so this patch also adds a check to make sure we don't bother for non-tcp methods.

bmi-test-overflow.patch:
------------------------
One of the bmi bandwidth test programs was using types that might overflow if testing large enough transfers. The fix is to convert to doubles and drop in several type casts to be cautious when performing the computation that was causing trouble.

cancel-bugs.patch:
-------------------------
The biggest fix here is a change to the job timer code. It was performing some pointer operations in the wrong order, which could lead to job timers failing to trigger in some cases. This would prevent some operations from ever timing out. A secondary fix is a minor cleanup in BMI to catch potential race conditions in cancellation where a lock wasn't being held while checking to see if the target operation is complete.

flow-post-error.patch:
-------------------------
This patch adds checks in the client side I/O state machine to test for failure at post time for flow operations. This type of error is uncommon unless the flow parameters are faulty, but it should have been checked anyway to be safe.

ndfile-config-check.patch:
--------------------------
This is a safety test. The problem here is that there was no bounds checking for the DefaultNumDFiles option in the config file. This made it possible to select -1 (which in PVFS1 meant "use the default number"). In PVFS2 this number gets passed verbatim to the client and would cause malloc failures and various other odd results when used. The patch just checks at parse time to make sure the value isn't negative.

-Phil
Index: pvfs2_src/src/io/bmi/bmi.c
===================================================================
--- pvfs2_src/src/io/bmi/bmi.c	(revision 2347)
+++ pvfs2_src/src/io/bmi/bmi.c	(revision 2348)
@@ -1179,15 +1179,22 @@
 	return(0);
     }
 
-    /*
-     * Pass the TCP address structure as the parameter for this operation,
-     * holding the lock.
-     */
-    if (option == BMI_TCP_CLOSE_SOCKET) {
-        inout_parameter = tmp_ref->method_addr;
-        ret = tmp_ref->interface->BMI_meth_set_info(option, inout_parameter);
+    /* if the caller requests a TCP specific close socket action */
+    if (option == BMI_TCP_CLOSE_SOCKET) 
+    {
+        /* check to see if the address is in fact a tcp address */
+        if(strcmp(tmp_ref->interface->method_name, "bmi_tcp") == 0)
+        {
+            /* take the same action as in the BMI_DEC_ADDR_REF case to clean
+             * out the entire address structure and anything linked to it so 
+             * that the next addr_lookup starts from scratch
+             */
+	    gossip_debug(GOSSIP_BMI_DEBUG_CONTROL, "Closing bmi_tcp connection at caller's request.\n"); 
+            ref_list_rem(cur_ref_list, addr);
+            dealloc_ref_st(tmp_ref);
+        }
         gen_mutex_unlock(&ref_mutex);
-        return ret;
+        return 0;
     }
 
     gen_mutex_unlock(&ref_mutex);
Index: pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c
===================================================================
--- pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 2347)
+++ pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 2348)
@@ -677,19 +677,10 @@
             tcp_method_params.listen_addr->method_data)->socket);
 #endif
        break;
-    /*
-     * Used after changing buffer sizes to force the connection to
-     * the server used for the config to close so it can be reopened
-     * with the new buffer sizes as the rest of the future server
-     * connections will be.
-     */
-    case BMI_TCP_CLOSE_SOCKET: {
-        struct method_addr *map = inout_parameter;
-        if (tcp_socket_collection_p)
-            BMI_socket_collection_remove(tcp_socket_collection_p, map);
-        ret = tcp_shutdown_addr(map);
+    case BMI_TCP_CLOSE_SOCKET: 
+        /* this should no longer make it to the bmi_tcp method; see bmi.c */
+        ret = 0;
         break;
-    }
     case BMI_FORCEFUL_CANCEL_MODE:
 	forceful_cancel_mode = 1;
 	ret = 0;
Index: pvfs2_src/test/io/bmi/driver-bw-multi.c
===================================================================
--- pvfs2_src/test/io/bmi/driver-bw-multi.c	(revision 2290)
+++ pvfs2_src/test/io/bmi/driver-bw-multi.c	(revision 2291)
@@ -75,7 +75,7 @@
     double stddev_bmi_time, stddev_mpi_time;
     double agg_bmi_bw, agg_mpi_bw;
     double ave_bmi_time, ave_mpi_time;
-    int total_data_xfer = 0;
+    double total_data_xfer = 0;
     bmi_context_id context = -1;
 
     /* start up benchmark environment */
@@ -315,8 +315,8 @@
      */
     if (world_rank == 0)
     {
-	total_data_xfer = opts.num_servers * num_clients * num_messages *
-	    opts.message_len;
+	total_data_xfer = (double)opts.num_servers * (double)num_clients *
+            (double)num_messages * (double)opts.message_len;
 	if (opts.num_servers > 1)
 	{
 	    var_bmi_time = sumsq_bmi_time -
@@ -352,8 +352,8 @@
 
     if (world_rank == opts.num_servers)
     {
-	total_data_xfer = opts.num_servers * num_clients * num_messages *
-	    opts.message_len;
+	total_data_xfer = (double)opts.num_servers * (double)num_clients *
+            (double)num_messages * (double)opts.message_len;
 
 	if (num_clients > 1)
 	{
Index: pvfs2_src/src/io/bmi/bmi.c
===================================================================
--- pvfs2_src/src/io/bmi/bmi.c	(revision 2089)
+++ pvfs2_src/src/io/bmi/bmi.c	(revision 2090)
@@ -1616,6 +1616,14 @@
                  "%s: cancel id %llu\n", __func__, llu(id));
 
     target_op = id_gen_safe_lookup(id);
+    if(target_op == NULL)
+    {
+        /* if we can't find the operation, then assume it has already
+         * completed naturally.
+         */
+        return(0);
+    }
+
     assert(target_op->op_id == id);
 
     if(active_method_table[target_op->addr->method_type]->BMI_meth_cancel)
Index: pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c
===================================================================
--- pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 2089)
+++ pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 2090)
@@ -1451,12 +1451,20 @@
  */
 int BMI_tcp_cancel(bmi_op_id_t id, bmi_context_id context_id)
 {
-    method_op_p query_op = (method_op_p)id_gen_safe_lookup(id);
+    method_op_p query_op = NULL;
+    
+    gen_mutex_lock(&interface_mutex);
 
-    assert(query_op);
+    query_op = (method_op_p)id_gen_safe_lookup(id);
+    if(!query_op)
+    {
+        /* if we can't find the operattion, then assume that it has already
+         * completed naturally
+         */
+        gen_mutex_unlock(&interface_mutex);
+        return(0);
+    }
 
-    gen_mutex_lock(&interface_mutex);
-
     /* easy case: is the operation already completed? */
     if(((struct tcp_op*)(query_op->method_data))->tcp_op_state ==
 	BMI_TCP_COMPLETE)
Index: pvfs2_src/src/io/job/job-time-mgr.c
===================================================================
--- pvfs2_src/src/io/job/job-time-mgr.c	(revision 2089)
+++ pvfs2_src/src/io/job/job-time-mgr.c	(revision 2090)
@@ -118,8 +118,8 @@
             break;
         }
 
+        prev_bucket = tmp_bucket;
         tmp_bucket = NULL;
-        prev_bucket = tmp_bucket;
     }
 
     if(!tmp_bucket || tmp_bucket->expire_time_sec != expire_time_sec)
@@ -257,6 +257,7 @@
 	    case JOB_BMI:
 		gossip_err("Job time out: cancelling bmi operation, job_id: %llu.\n", llu(jd->job_id));
 		ret = job_bmi_cancel(jd->job_id, jd->context_id);
+	        jd->time_bucket = NULL;
 		break;
 	    case JOB_FLOW:
 		/* have we made any progress since last time we checked? */
@@ -274,22 +275,23 @@
 		    /* otherwise kill the flow */
 		    gossip_err("Job time out: cancelling flow operation, job_id: %llu.\n", llu(jd->job_id));
 		    ret = job_flow_cancel(jd->job_id, jd->context_id);
+	            jd->time_bucket = NULL;
 		}
 		break;
 	    case JOB_TROVE:
 		gossip_err("Job time out: cancelling trove operation, job_id: %llu.\n", llu(jd->job_id));
 		ret = job_trove_dspace_cancel(
                     jd->u.trove.fsid, jd->job_id, jd->context_id);
+	        jd->time_bucket = NULL;
                 break;
 	    default:
 		ret = 0;
+	        jd->time_bucket = NULL;
 		break;
 	    }
 
 	    /* FIXME: error handling */
 	    assert(ret == 0);
-
-	    jd->time_bucket = NULL;
 	}
 	qlist_del(&tmp_bucket->bucket_link);
         INIT_QLIST_HEAD(&tmp_bucket->jd_queue);
Index: pvfs2_src/src/client/sysint/sys-io.sm
===================================================================
--- pvfs2_src/src/client/sysint/sys-io.sm	(revision 2067)
+++ pvfs2_src/src/client/sysint/sys-io.sm	(revision 2068)
@@ -1048,12 +1048,30 @@
         sm_p->u.io.flow_completion_count--;
         assert(sm_p->u.io.flow_completion_count > -1);
 
-        /* if error, restart; but if this is a write, let write ack catch */
+        /* look for flow error when no write ack is in progress (usually a
+         * read case) 
+         */
         if (js_p->error_code < 0 && !cur_ctx->write_ack_in_progress) {
-            gossip_debug(GOSSIP_IO_DEBUG,
-              "%s: flow failed, retrying from msgpair\n", __func__);
-            cur_ctx->msg_send_has_been_posted = 0;
-            cur_ctx->msg_recv_has_been_posted = 0;
+            if ((PVFS_ERROR_CLASS(-js_p->error_code) == PVFS_ERROR_BMI) ||
+                 (PVFS_ERROR_CLASS(-js_p->error_code) == PVFS_ERROR_FLOW) ||
+                 (js_p->error_code == -ECONNRESET) || 
+                 (js_p->error_code == -PVFS_EPROTO))
+            {
+                /* if this is a an error that we can retry */
+                gossip_debug(GOSSIP_IO_DEBUG,
+                   "%s: flow failed, retrying from msgpair\n", __func__);
+                cur_ctx->msg_send_has_been_posted = 0;
+                cur_ctx->msg_recv_has_been_posted = 0;
+            }
+            else
+            {
+                /* do not retry on remaining error codes */
+                gossip_debug(GOSSIP_IO_DEBUG,
+                   "%s: flow failed, not retrying\n", __func__);
+
+                /* forcing the count high insures that the sm won't restart */
+                sm_p->u.io.retry_count = sm_p->msgarray_params.retry_limit;
+            }
         }
     }
     else if (STATUS_USER_TAG_TYPE(status_user_tag, IO_SM_PHASE_FINAL_ACK))
@@ -1741,27 +1759,47 @@
         server_config->client_job_flow_timeout);
     PINT_put_server_config_struct(server_config);
 
-    if (ret < 0)
+    /* if the flow fails immediately, then we have to do some special
+     * handling.  This function is not equiped to handle the failure
+     * directly, so we instead post a null job that will propigate the error
+     * to the normal state where we interpret flow errors
+     */
+    if((ret < 0) || (ret == 1 && cur_ctx->flow_status.error_code != 0))
     {
-        gossip_err("job_flow failed\n");
-        return ret;
+        /* make sure the error code is stored in the flow descriptor */
+        if(ret == 1)
+        {
+            cur_ctx->flow_desc.error_code = cur_ctx->flow_status.error_code;
+        }
+        else
+        {
+            cur_ctx->flow_desc.error_code = ret;
+        }
+
+        gossip_debug(GOSSIP_IO_DEBUG, "  immediate flow failure for "
+                     "context %p, error code: %d\n", cur_ctx,
+                     cur_ctx->flow_desc.error_code);
+        gossip_debug(GOSSIP_IO_DEBUG, "  posting job_null() to propigate.\n");
+
+        /* post a fake job to propigate the flow failure to a later state */
+        ret = job_null(cur_ctx->flow_desc.error_code, sm_p, 
+            status_user_tag, &cur_ctx->flow_status,
+            &cur_ctx->flow_job_id, pint_client_sm_context);
+        if(ret !=0)
+        {
+            return(ret);
+        }
     }
-    else if (ret == 1)
-    {
-        gossip_debug(GOSSIP_IO_DEBUG, "  flow for context %p "
-                     "completed immediately\n", cur_ctx);
-        assert(cur_ctx->flow_status.error_code == 0);
-    }
     else
     {
         gossip_debug(GOSSIP_IO_DEBUG, "  posted flow for "
                      "context %p\n", cur_ctx);
-
-        cur_ctx->flow_has_been_posted = 1;
-        cur_ctx->flow_in_progress = 1;
-        sm_p->u.io.flow_completion_count++;
     }
 
+    cur_ctx->flow_has_been_posted = 1;
+    cur_ctx->flow_in_progress = 1;
+    sm_p->u.io.flow_completion_count++;
+
     return 0;
 }
 
Index: pvfs2_src/src/common/misc/server-config.c
===================================================================
--- pvfs2_src/src/common/misc/server-config.c	(revision 2273)
+++ pvfs2_src/src/common/misc/server-config.c	(revision 2274)
@@ -1871,6 +1871,10 @@
         PINT_llist_head(config_s->file_systems);
 
     fs_conf->default_num_dfiles = (int)cmd->data.value;
+    if(fs_conf->default_num_dfiles < 0)
+    {
+        return("Error DefaultNumDFiles must be positive.\n");
+    }
     return NULL;
 }
 
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

Reply via email to