Thanks for trying it again, Bart. I'll get a test setup going before I mess with it any further so I can be sure of what's going on. It sounds like the problem should be easy to reproduce.

-Phil

On 05/14/2010 10:10 AM, Bart Taylor wrote:
I got the same result with this patch. The backtrace is still basically the same, but since it is just a little different, I included it.

Bart.


(gdb) bt
#0  0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00a24825 in raise () from /lib/tls/libc.so.6
#2  0x00a26289 in abort () from /lib/tls/libc.so.6
#3  0x00a1dda1 in __assert_fail () from /lib/tls/libc.so.6
#4  0x0808b680 in precreate_pool_get_thread_mgr_callback_unlocked (data=0x97719a8,
    error_code=0) at ../pvfs2_src/src/io/job/job.c:4429
#5  0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9771138)
    at ../pvfs2_src/src/io/job/job.c:5934
#6  0x0808d623 in job_precreate_pool_get_handles (fsid=1664005450, count=2, servers=0x0,
    handle_array=0x9751658, flags=0, user_ptr=0x9755d98, status_user_tag=0,
    out_status_p=0x972c348, id=0xbfeb71d0, context_id=0, hints=0x9759ed8)
    at ../pvfs2_src/src/io/job/job.c:5723
#7  0x080c2ae0 in get_handles (smcb=0x9755d98, js_p=0x972c348)
    at ../pvfs2_src/src/server/unstuff.sm:267
#8  0x0807630a in PINT_state_machine_invoke (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
#9  0x080766c8 in PINT_state_machine_next (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
#10 0x08076704 in PINT_state_machine_continue (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
#11 0x0805667c in main (argc=6, argv=0xbfeb7354) at ../pvfs2_src/src/server/pvfs2-server.c:413




On Thu, May 13, 2010 at 3:13 PM, Phil Carns <[email protected]> wrote:

    Whoops.  Thanks for your patience, Bart.  Can you try one more time
    with this additional patch applied?  If that fails, I'll set up
    something here to try to reproduce it firsthand.

    thanks,
    -Phil


    On 05/13/2010 04:32 PM, Bart Taylor wrote:
    Correction, I did get a core file this time. I just overlooked
    it. Backtrace below.

    Bart.


    (gdb) bt
    #0  0x008247a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
    #1  0x00865825 in raise () from /lib/tls/libc.so.6
    #2  0x00867289 in abort () from /lib/tls/libc.so.6
    #3  0x00899d2a in __libc_message () from /lib/tls/libc.so.6
    #4  0x008a072f in _int_free () from /lib/tls/libc.so.6
    #5  0x008a0baa in free () from /lib/tls/libc.so.6
    #6  0x0808b73d in precreate_pool_get_thread_mgr_callback_unlocked (data=0x9f31188,
        error_code=0) at ../pvfs2_src/src/io/job/job.c:4460
    #7  0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9f68ca0)
        at ../pvfs2_src/src/io/job/job.c:5935
    #8  0x0808d623 in job_precreate_pool_get_handles (fsid=1141984428, count=2, servers=0x0,
        handle_array=0x9f35a20, flags=0, user_ptr=0x9f512f0, status_user_tag=0,
        out_status_p=0x9f0c348, id=0xbfebcc00, context_id=0, hints=0x9f4df58)
        at ../pvfs2_src/src/io/job/job.c:5723
    #9  0x080c2ad8 in get_handles (smcb=0x9f512f0, js_p=0x9f0c348)
        at ../pvfs2_src/src/server/unstuff.sm:267
    #10 0x0807630a in PINT_state_machine_invoke (smcb=0x9f512f0, r=0x9f0c348)
        at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
    #11 0x080766c8 in PINT_state_machine_next (smcb=0x9f512f0, r=0x9f0c348)
        at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
    #12 0x08076704 in PINT_state_machine_continue (smcb=0x9f512f0, r=0x9f0c348)
        at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
    #13 0x0805667c in main (argc=6, argv=0xbfebcd84)
        at ../pvfs2_src/src/server/pvfs2-server.c:413




    On Thu, May 13, 2010 at 2:18 PM, Bart Taylor <[email protected]> wrote:

        Hey Phil,

        Unfortunately, I didn't have any luck with the patch. I
        didn't get a core file this time, but one of the daemons quit
        responding. I was able to run a ping and statfs again, but as
        soon as I tried to write that file, the server stalled. What
        other information can I get you?

        Bart.




        On Thu, May 13, 2010 at 12:14 PM, Phil Carns <[email protected]> wrote:

            Hey Bart,

            I haven't really tested this change yet, but can you try
            the attached patch and see if that seems to solve the
            problem?  I think this is a follow-on to the same bug you
            guys reported earlier.  I just missed another race issue
            caused by the last patch.

            -Phil


            On 05/12/2010 05:18 PM, Bart Taylor wrote:
            Hey guys,

            I have a 3-node local disk file system that had a core
            dump during some testing. It is an upgraded fs from 2.6
            to 2.8.2. After the upgrade, I ran a couple of utilities
            like pvfs2-ping and pvfs2-statfs. After those succeeded,
            I attempted to create a new file of around 800K, and the
            first server died. There wasn't anything useful in the
            logs or dmesg. Below is a backtrace from the core file.
            I can supply the entire file, but I can't email it at 43M.

            This may be related to the precreate-pool-race patch
            from a few days ago since the backtrace indicates it was
            in the vicinity of those code changes.

            Let me know what else I can supply that will help.

            Bart.




            (gdb) bt
            #0  0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
            #1  0x00a24825 in raise () from /lib/tls/libc.so.6
            #2  0x00a26289 in abort () from /lib/tls/libc.so.6
            #3  0x00a58d2a in __libc_message () from /lib/tls/libc.so.6
            #4  0x00a5f72f in _int_free () from /lib/tls/libc.so.6
            #5  0x00a5fbaa in free () from /lib/tls/libc.so.6
            #6  0x0807d6e5 in precreate_pool_get_thread_mgr_callback_unlocked (data=0xb55d30f0,
                error_code=0) at ../pvfs2_src/src/io/job/job.c:4456
            #7  0x0807fd3d in precreate_pool_get_handles_try_post (jd=0xb55d4110)
                at ../pvfs2_src/src/io/job/job.c:5930
            #8  0x0807f5b9 in job_precreate_pool_get_handles (fsid=140299291, count=2, servers=0x0,
                handle_array=0xb55d41f0, flags=0, user_ptr=0xb5507c98, status_user_tag=0,
                out_status_p=0x9c23348, id=0xbffc11b0, context_id=0, hints=0xb5506a88)
                at ../pvfs2_src/src/io/job/job.c:5718
            #9  0x0806c3cc in get_handles (smcb=0xb5507c98, js_p=0x9c23348)
                at ../pvfs2_src/src/server/unstuff.sm:267
            #10 0x08095e06 in PINT_state_machine_invoke (smcb=0xb5507c98, r=0x9c23348)
                at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
            #11 0x080961c4 in PINT_state_machine_next (smcb=0xb5507c98, r=0x9c23348)
                at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
            #12 0x08096200 in PINT_state_machine_continue (smcb=0xb5507c98, r=0x9c23348)
                at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
            #13 0x0805667c in main (argc=6, argv=0xbffc1334)
                at ../pvfs2_src/src/server/pvfs2-server.c:413


            _______________________________________________
            Pvfs2-developers mailing list
            [email protected]
            http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers









