I got the same result with this patch. The backtrace is nearly the same as
before, but it now aborts in __assert_fail() instead of free(), so I included it.

Bart.


(gdb) bt
#0  0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00a24825 in raise () from /lib/tls/libc.so.6
#2  0x00a26289 in abort () from /lib/tls/libc.so.6
#3  0x00a1dda1 in __assert_fail () from /lib/tls/libc.so.6
#4  0x0808b680 in precreate_pool_get_thread_mgr_callback_unlocked (data=0x97719a8,
    error_code=0) at ../pvfs2_src/src/io/job/job.c:4429
#5  0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9771138)
    at ../pvfs2_src/src/io/job/job.c:5934
#6  0x0808d623 in job_precreate_pool_get_handles (fsid=1664005450, count=2, servers=0x0,
    handle_array=0x9751658, flags=0, user_ptr=0x9755d98, status_user_tag=0,
    out_status_p=0x972c348, id=0xbfeb71d0, context_id=0, hints=0x9759ed8)
    at ../pvfs2_src/src/io/job/job.c:5723
#7  0x080c2ae0 in get_handles (smcb=0x9755d98, js_p=0x972c348)
    at ../pvfs2_src/src/server/unstuff.sm:267
#8  0x0807630a in PINT_state_machine_invoke (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
#9  0x080766c8 in PINT_state_machine_next (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
#10 0x08076704 in PINT_state_machine_continue (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
#11 0x0805667c in main (argc=6, argv=0xbfeb7354)
    at ../pvfs2_src/src/server/pvfs2-server.c:413




On Thu, May 13, 2010 at 3:13 PM, Phil Carns <[email protected]> wrote:

>  Whoops.  Thanks for your patience, Bart.  Can you try one more time with
> this additional patch applied?  If that fails, I'll set up something here to
> try to reproduce it firsthand.
>
> thanks,
> -Phil
>
>
> On 05/13/2010 04:32 PM, Bart Taylor wrote:
>
> Correction, I did get a core file this time. I just overlooked it.
> Backtrace below.
>
> Bart.
>
>
> (gdb) bt
> #0  0x008247a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1  0x00865825 in raise () from /lib/tls/libc.so.6
> #2  0x00867289 in abort () from /lib/tls/libc.so.6
> #3  0x00899d2a in __libc_message () from /lib/tls/libc.so.6
> #4  0x008a072f in _int_free () from /lib/tls/libc.so.6
> #5  0x008a0baa in free () from /lib/tls/libc.so.6
> #6  0x0808b73d in precreate_pool_get_thread_mgr_callback_unlocked (data=0x9f31188,
>     error_code=0) at ../pvfs2_src/src/io/job/job.c:4460
> #7  0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9f68ca0)
>     at ../pvfs2_src/src/io/job/job.c:5935
> #8  0x0808d623 in job_precreate_pool_get_handles (fsid=1141984428, count=2, servers=0x0,
>     handle_array=0x9f35a20, flags=0, user_ptr=0x9f512f0, status_user_tag=0,
>     out_status_p=0x9f0c348, id=0xbfebcc00, context_id=0, hints=0x9f4df58)
>     at ../pvfs2_src/src/io/job/job.c:5723
> #9  0x080c2ad8 in get_handles (smcb=0x9f512f0, js_p=0x9f0c348)
>     at ../pvfs2_src/src/server/unstuff.sm:267
> #10 0x0807630a in PINT_state_machine_invoke (smcb=0x9f512f0, r=0x9f0c348)
>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
> #11 0x080766c8 in PINT_state_machine_next (smcb=0x9f512f0, r=0x9f0c348)
>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
> #12 0x08076704 in PINT_state_machine_continue (smcb=0x9f512f0, r=0x9f0c348)
>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
> #13 0x0805667c in main (argc=6, argv=0xbfebcd84)
>     at ../pvfs2_src/src/server/pvfs2-server.c:413
>
>
>
>
> On Thu, May 13, 2010 at 2:18 PM, Bart Taylor <[email protected]> wrote:
>
>> Hey Phil,
>>
>> Unfortunately, I didn't have any luck with the patch. I didn't get a core
>> file this time, but one of the daemons quit responding. I was able to run a
>> ping and statfs again, but as soon as I tried to write that file, the server
>> stalled. What other information can I get you?
>>
>> Bart.
>>
>>
>>
>>
>> On Thu, May 13, 2010 at 12:14 PM, Phil Carns <[email protected]> wrote:
>>
>>> Hey Bart,
>>>
>>> I haven't really tested this change yet, but can you try the attached
>>> patch and see if it solves the problem?  I think this is a follow-on to
>>> the same bug you guys reported earlier.  I just missed another race
>>> issue caused by the last patch.
>>>
>>> -Phil
>>>
>>>
>>> On 05/12/2010 05:18 PM, Bart Taylor wrote:
>>>
>>>  Hey guys,
>>>
>>> I have a 3-node local-disk file system that dumped core during some
>>> testing. It was upgraded from 2.6 to 2.8.2. After the upgrade, I ran a
>>> couple of utilities like pvfs2-ping and pvfs2-statfs. After those succeeded,
>>> I attempted to create a new file of around 800K, and the first server died.
>>> There wasn't anything useful in the logs or dmesg. Below is a backtrace from
>>> the core file. I can supply the entire core file, but at 43M it is too
>>> large to email.
>>>
>>> This may be related to the precreate-pool-race patch from a few days ago
>>> since the backtrace indicates it was in the vicinity of those code changes.
>>>
>>> Let me know what else I can supply that will help.
>>>
>>> Bart.
>>>
>>>
>>>
>>>
>>> (gdb) bt
>>> #0  0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>>> #1  0x00a24825 in raise () from /lib/tls/libc.so.6
>>> #2  0x00a26289 in abort () from /lib/tls/libc.so.6
>>> #3  0x00a58d2a in __libc_message () from /lib/tls/libc.so.6
>>> #4  0x00a5f72f in _int_free () from /lib/tls/libc.so.6
>>> #5  0x00a5fbaa in free () from /lib/tls/libc.so.6
>>> #6  0x0807d6e5 in precreate_pool_get_thread_mgr_callback_unlocked (data=0xb55d30f0, error_code=0)
>>>     at ../pvfs2_src/src/io/job/job.c:4456
>>> #7  0x0807fd3d in precreate_pool_get_handles_try_post (jd=0xb55d4110)
>>>     at ../pvfs2_src/src/io/job/job.c:5930
>>> #8  0x0807f5b9 in job_precreate_pool_get_handles (fsid=140299291, count=2, servers=0x0,
>>>     handle_array=0xb55d41f0, flags=0, user_ptr=0xb5507c98, status_user_tag=0,
>>>     out_status_p=0x9c23348, id=0xbffc11b0, context_id=0, hints=0xb5506a88)
>>>     at ../pvfs2_src/src/io/job/job.c:5718
>>> #9  0x0806c3cc in get_handles (smcb=0xb5507c98, js_p=0x9c23348)
>>>     at ../pvfs2_src/src/server/unstuff.sm:267
>>> #10 0x08095e06 in PINT_state_machine_invoke (smcb=0xb5507c98, r=0x9c23348)
>>>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
>>> #11 0x080961c4 in PINT_state_machine_next (smcb=0xb5507c98, r=0x9c23348)
>>>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
>>> #12 0x08096200 in PINT_state_machine_continue (smcb=0xb5507c98, r=0x9c23348)
>>>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
>>> #13 0x0805667c in main (argc=6, argv=0xbffc1334)
>>>     at ../pvfs2_src/src/server/pvfs2-server.c:413
>>>
>>>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> [email protected]
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
>>>
>>>
>>>
>>
>
>
