Whoops. Thanks for your patience, Bart. Can you try one more time with this additional patch applied? If that fails, I'll set something up here to try to reproduce it firsthand.

thanks,
-Phil

On 05/13/2010 04:32 PM, Bart Taylor wrote:
Correction, I did get a core file this time. I just overlooked it. Backtrace below.

Bart.


(gdb) bt
#0  0x008247a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00865825 in raise () from /lib/tls/libc.so.6
#2  0x00867289 in abort () from /lib/tls/libc.so.6
#3  0x00899d2a in __libc_message () from /lib/tls/libc.so.6
#4  0x008a072f in _int_free () from /lib/tls/libc.so.6
#5  0x008a0baa in free () from /lib/tls/libc.so.6
#6  0x0808b73d in precreate_pool_get_thread_mgr_callback_unlocked (data=0x9f31188,
    error_code=0) at ../pvfs2_src/src/io/job/job.c:4460
#7  0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9f68ca0)
    at ../pvfs2_src/src/io/job/job.c:5935
#8  0x0808d623 in job_precreate_pool_get_handles (fsid=1141984428, count=2, servers=0x0, handle_array=0x9f35a20, flags=0, user_ptr=0x9f512f0, status_user_tag=0,
    out_status_p=0x9f0c348, id=0xbfebcc00, context_id=0, hints=0x9f4df58)
    at ../pvfs2_src/src/io/job/job.c:5723
#9  0x080c2ad8 in get_handles (smcb=0x9f512f0, js_p=0x9f0c348)
    at ../pvfs2_src/src/server/unstuff.sm:267
#10 0x0807630a in PINT_state_machine_invoke (smcb=0x9f512f0, r=0x9f0c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
#11 0x080766c8 in PINT_state_machine_next (smcb=0x9f512f0, r=0x9f0c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
#12 0x08076704 in PINT_state_machine_continue (smcb=0x9f512f0, r=0x9f0c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
#13 0x0805667c in main (argc=6, argv=0xbfebcd84) at ../pvfs2_src/src/server/pvfs2-server.c:413




On Thu, May 13, 2010 at 2:18 PM, Bart Taylor <[email protected]> wrote:

    Hey Phil,

    Unfortunately, I didn't have any luck with the patch. I didn't get
    a core file this time, but one of the daemons quit responding. I
    was able to run a ping and statfs again, but as soon as I tried to
    write that file, the server stalled. What other information can I
    get you?

    Bart.




    On Thu, May 13, 2010 at 12:14 PM, Phil Carns <[email protected]> wrote:

        Hey Bart,

        I haven't really tested this change yet, but can you try the
        attached patch and see if that solves the problem?  I think
        this is a follow-on to the same bug you guys reported earlier.
        I just missed another race issue caused by the last patch.

        -Phil


        On 05/12/2010 05:18 PM, Bart Taylor wrote:
        Hey guys,

        I have a 3 node local disk file system that had a core dump
        during some testing. It is an upgraded fs from 2.6 to 2.8.2.
        After the upgrade, I ran a couple of utilities like
        pvfs2-ping and pvfs2-statfs. After those succeeded, I
        attempted to create a new file of around 800K, and the first
        server died. There wasn't anything useful in the logs or
        dmesg. Below is a backtrace from the core file. I can supply
        the entire file, but I can't email it at 43M.

        This may be related to the precreate-pool-race patch from a
        few days ago since the backtrace indicates it was in the
        vicinity of those code changes.

        Let me know what else I can supply that will help.

        Bart.




        (gdb) bt
        #0  0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
        #1  0x00a24825 in raise () from /lib/tls/libc.so.6
        #2  0x00a26289 in abort () from /lib/tls/libc.so.6
        #3  0x00a58d2a in __libc_message () from /lib/tls/libc.so.6
        #4  0x00a5f72f in _int_free () from /lib/tls/libc.so.6
        #5  0x00a5fbaa in free () from /lib/tls/libc.so.6
        #6  0x0807d6e5 in precreate_pool_get_thread_mgr_callback_unlocked (data=0xb55d30f0, error_code=0)
            at ../pvfs2_src/src/io/job/job.c:4456
        #7  0x0807fd3d in precreate_pool_get_handles_try_post (jd=0xb55d4110)
            at ../pvfs2_src/src/io/job/job.c:5930
        #8  0x0807f5b9 in job_precreate_pool_get_handles (fsid=140299291, count=2, servers=0x0, handle_array=0xb55d41f0, flags=0, user_ptr=0xb5507c98,
            status_user_tag=0, out_status_p=0x9c23348, id=0xbffc11b0, context_id=0, hints=0xb5506a88)
            at ../pvfs2_src/src/io/job/job.c:5718
        #9  0x0806c3cc in get_handles (smcb=0xb5507c98, js_p=0x9c23348)
            at ../pvfs2_src/src/server/unstuff.sm:267
        #10 0x08095e06 in PINT_state_machine_invoke (smcb=0xb5507c98, r=0x9c23348)
            at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
        #11 0x080961c4 in PINT_state_machine_next (smcb=0xb5507c98, r=0x9c23348)
            at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
        #12 0x08096200 in PINT_state_machine_continue (smcb=0xb5507c98, r=0x9c23348)
            at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
        #13 0x0805667c in main (argc=6, argv=0xbffc1334)
            at ../pvfs2_src/src/server/pvfs2-server.c:413


        _______________________________________________
        Pvfs2-developers mailing list
        [email protected]
        http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers







? log.txt
? pvfs2
Index: src/io/job/job.c
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/src/io/job/job.c,v
retrieving revision 1.192
diff -a -u -p -r1.192 job.c
--- src/io/job/job.c	13 May 2010 18:10:06 -0000	1.192
+++ src/io/job/job.c	13 May 2010 21:11:10 -0000
@@ -5865,6 +5865,9 @@ static void precreate_pool_get_handles_t
             = &tmp_trove_array[i];
     }
 
+    /* pre-increment pending count before posting any trove operations */
+    jd->u.precreate_pool.trove_pending = jd->u.precreate_pool.precreate_handle_count;
+
     /* post all trove operations at once */
     for(i=0; i<jd->u.precreate_pool.precreate_handle_count; i++)
     { 
@@ -5905,9 +5908,6 @@ static void precreate_pool_get_handles_t
             }
         }
 
-        /* pre-increment pending count before posting trove operation */
-        trove_pending_count++;
-        jd->u.precreate_pool.trove_pending++;
 
         /* post trove operation to pull out a handle */
         ret = trove_keyval_iterate_keys(
@@ -5937,6 +5937,7 @@ static void precreate_pool_get_handles_t
         }
         else
         {
+            trove_pending_count++;
             /* callback will be triggered later */
         }
     }
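
For readers following the thread, here is a minimal sketch of the counting pattern the diff above appears to move toward. It is an illustration under stated assumptions, not the PVFS2 code: `struct job`, `op_done`, `post_op`, `run_incremental`, and `run_preset` are all hypothetical names, and the sketch assumes every posted operation completes immediately and runs its callback inline (the backtraces show the callback being called directly from the posting function). If the pending count is raised one operation at a time, each inline callback sees the count drop to zero and treats the job as finished, so cleanup can run more than once. Setting the count to the full number of operations before posting any of them leaves only the last callback to finish the job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a job descriptor; not the PVFS2 structure. */
struct job {
    int pending;      /* operations posted but not yet completed */
    int completions;  /* how many times the job was treated as finished */
};

/* Completion callback: whichever operation brings the count to zero
 * finishes the job. In real code this is where buffers would be freed. */
static void op_done(struct job *j)
{
    assert(j->pending > 0);
    if (--j->pending == 0) {
        j->completions++;
    }
}

/* Hypothetical post routine that always completes immediately,
 * so the callback runs inline before post_op returns. */
static void post_op(struct job *j)
{
    op_done(j);
}

/* Count raised per operation: every inline callback sees 1 -> 0,
 * so the job is "finished" once per operation. */
int run_incremental(int n)
{
    struct job j = {0, 0};
    for (int i = 0; i < n; i++) {
        j.pending++;
        post_op(&j);
    }
    return j.completions;
}

/* Count set up front: only the final callback reaches zero. */
int run_preset(int n)
{
    struct job j = {0, 0};
    j.pending = n;
    for (int i = 0; i < n; i++) {
        post_op(&j);
    }
    return j.completions;
}
```

With three operations, `run_incremental(3)` reports three completions while `run_preset(3)` reports one. Whether repeated completion is what produced the abort in `free()` in the backtraces is an inference from the diff and is not confirmed anywhere in this thread.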
