Re: [PATCH-v2-resend] iscsistart -b memory corruption
shyam_i...@dell.com wrote: Mike Christie wrote: Mike Christie wrote: We should not have to call stop_event_loop if the event_loop is already stopped. What I am wondering is how the event loop is stopped by anyone other than the call to stop_event_loop? It looks like it could happen if there was a signal or if there was a poll error. Did you see either of those in your test? Even if we did stop_event_loop when event_loop is not running, I am not sure I how this causes a double free or corruption. If event_loop is stopped then will stop_event_loop return an error and that will cause us to send a sigterm to the parent but the parent could have completed? Would that return the same error you are seeing? I think I made a mistake here. In the patch I sent, leave_event_loop could be 0 and I would still return error to the parent. The parent would send a SIGTERM to the child and thus prevent the child from freeing up the conn-context_pool[i] structures through free_initiator(). So, looks like my patch was not the right fix and we are back to square one now. +static mgmt_ipc_err_e +mgmt_ipc_check_eventloop(queue_task_t *qtask) { + if(!leave_event_loop) + return MGMT_IPC_ERR_INTERNAL; + return MGMT_IPC_OK; +} Somehow the conn-context_pool[i] is being freed up before the child calls free_initiator(). This causes the double free. I tried to replicate this here but did not see the problem. Could you maybe run iscsistart in valgrind and send that output? Maybe it is related to this https://bugzilla.redhat.com/show_bug.cgi?id=491363 That ppc code is run first as a sort of probe to see if ppc iscsi boot is being used. If not then we switch and check ibft. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
RE: [PATCH-v2-resend] iscsistart -b memory corruption
Resending because earlier patch had a small typo. Ulrich wrote: On 17 Mar 2009 at 22:33, shyam_i...@dell.com wrote: The issue is caused because child is trying to free up the session and connections that the parent had setup. Hi! Not knowing the source, I wonder: Parent and child (both processes?) have their own copy of memory, so each one can free within its own copy of memory. If they are calling some external process to free something, it's most likely a race condition, because the server should only allow to free anything exactly one. Yes. You are right. There was actually a race condition because of the event loop being stopped twice. Looks like I got a patch with the real scenario affecting the bug. Please note I am marking the attempt by the parent to stop the event loop when it has already stopped as an internal error. I am not sure if we are supposed to log the error but yet this serializing the stopping of event loop will fix the issue of freeing twice. I think we don't have to call stop_event_loop if event_loop is already stopped. Signed-off-by: Shyam Iyer shyam_i...@dell.com Tested-by: Paniraja KM paniraja...@dell.com diff -Naru open-iscsi-2.0-870.3/usr/iscsistart.c open-iscsi-2.0-870.3-fix-race/usr/iscsistart.c --- open-iscsi-2.0-870.3/usr/iscsistart.c 2009-02-23 03:02:46.0 +0530 +++ open-iscsi-2.0-870.3-fix-race/usr/iscsistart.c 2009-03-18 17:40:24.0 +0530 @@ -111,6 +111,15 @@ int rc; memset(req, 0, sizeof(req)); + req.command = MGMT_IPC_CHECK_EVENTLOOP; + rc = do_iscsid(req, rsp); + if (rc) { + iscsid_handle_error(rc); + log_error(event_loop already stopped\n); + return rc; + } + + memset(req, 0, sizeof(req)); req.command = MGMT_IPC_IMMEDIATE_STOP; rc = do_iscsid(req, rsp); if (rc) { diff -Naru open-iscsi-2.0-870.3/usr/mgmt_ipc.c open-iscsi-2.0-870.3-fix-race/usr/mgmt_ipc.c --- open-iscsi-2.0-870.3/usr/mgmt_ipc.c 2009-02-23 03:02:46.0 +0530 +++ open-iscsi-2.0-870.3-fix-race/usr/mgmt_ipc.c2009-03-18 17:35:02.0 +0530 @@ -339,6 +339,13 @@ { return mgmt_ipc_notify_common(qtask, iscsi_discovery_del_portal); } +static mgmt_ipc_err_e +mgmt_ipc_check_eventloop(queue_task_t *qtask) +{ + if(!leave_event_loop) + return MGMT_IPC_ERR_INTERNAL; + return MGMT_IPC_OK; +} static int mgmt_peeruser(int sock, char *user) @@ -526,6 +533,7 @@ [MGMT_IPC_NOTIFY_DEL_NODE] = mgmt_ipc_notify_del_node, [MGMT_IPC_NOTIFY_ADD_PORTAL] = mgmt_ipc_notify_add_portal, [MGMT_IPC_NOTIFY_DEL_PORTAL] = mgmt_ipc_notify_del_portal, +[MGMT_IPC_CHECK_EVENTLOOP] = mgmt_ipc_check_eventloop, }; static void diff -Naru open-iscsi-2.0-870.3/usr/mgmt_ipc.h open-iscsi-2.0-870.3-fix-race/usr/mgmt_ipc.h --- open-iscsi-2.0-870.3/usr/mgmt_ipc.h 2009-02-23 03:02:46.0 +0530 +++ open-iscsi-2.0-870.3-fix-race/usr/mgmt_ipc.h2009-03-18 17:35:07.0 +0530 @@ -70,6 +70,7 @@ MGMT_IPC_NOTIFY_DEL_NODE= 18, MGMT_IPC_NOTIFY_ADD_PORTAL = 19, MGMT_IPC_NOTIFY_DEL_PORTAL = 20, + MGMT_IPC_CHECK_EVENTLOOP= 21, __MGMT_IPC_MAX_COMMAND } iscsiadm_cmd_e; --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~--- fix_event_loop_race Description: fix_event_loop_race
Re: [PATCH-v2-resend] iscsistart -b memory corruption
Thanks for the patch. shyam_i...@dell.com wrote: Resending because earlier patch had a small typo. Ulrich wrote: On 17 Mar 2009 at 22:33, shyam_i...@dell.com wrote: The issue is caused because child is trying to free up the session and connections that the parent had setup. Hi! Not knowing the source, I wonder: Parent and child (both processes?) have their own copy of memory, so each one can free within its own copy of memory. If they are calling some external process to free something, it's most likely a race condition, because the server should only allow to free anything exactly one. Yes. You are right. There was actually a race condition because of the event loop being stopped twice. Looks like I got a patch with the real scenario affecting the bug. Please note I am marking the attempt by the parent to stop the event loop when it has already stopped as an internal error. I am not sure if we are supposed to log the error but yet this serializing the stopping of event loop will fix the issue of freeing twice. I think we don't have to call stop_event_loop if event_loop is already stopped. We should not have to call stop_event_loop if the event_loop is already stopped. What I am wondering is how the event loop is stopped by anyone other than the call to stop_event_loop? It looks like it could happen if there was a signal or if there was a poll error. Did you see either of those in your test? Even if we did stop_event_loop when event_loop is not running, I am not sure I how this causes a double free or corruption. If event_loop is stopped then will stop_event_loop return an error and that will cause us to send a sigterm to the parent but the parent could have completed? Would that return the same error you are seeing? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
Re: [PATCH-v2-resend] iscsistart -b memory corruption
Mike Christie wrote: Thanks for the patch. shyam_i...@dell.com wrote: Resending because earlier patch had a small typo. Ulrich wrote: On 17 Mar 2009 at 22:33, shyam_i...@dell.com wrote: The issue is caused because child is trying to free up the session and connections that the parent had setup. Hi! Not knowing the source, I wonder: Parent and child (both processes?) have their own copy of memory, so each one can free within its own copy of memory. If they are calling some external process to free something, it's most likely a race condition, because the server should only allow to free anything exactly one. Yes. You are right. There was actually a race condition because of the event loop being stopped twice. Looks like I got a patch with the real scenario affecting the bug. Please note I am marking the attempt by the parent to stop the event loop when it has already stopped as an internal error. I am not sure if we are supposed to log the error but yet this serializing the stopping of event loop will fix the issue of freeing twice. I think we don't have to call stop_event_loop if event_loop is already stopped. We should not have to call stop_event_loop if the event_loop is already stopped. What I am wondering is how the event loop is stopped by anyone other than the call to stop_event_loop? It looks like it could happen if there was a signal or if there was a poll error. Did you see either of those in your test? Even if we did stop_event_loop when event_loop is not running, I am not sure I how this causes a double free or corruption. If event_loop is stopped then will stop_event_loop return an error and that will cause us to send a sigterm to the parent but the parent could have completed? Would that return the same error you are seeing? I tried to replicate this here but did not see the problem. Could you maybe run iscsistart in valgrind and send that output? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---