I got the same result with this patch: the server still crashes. The backtrace
is nearly the same, but it differs slightly (it now aborts in __assert_fail()
instead of free()), so I have included it.
Bart.
(gdb) bt
#0 0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00a24825 in raise () from /lib/tls/libc.so.6
#2 0x00a26289 in abort () from /lib/tls/libc.so.6
#3 0x00a1dda1 in __assert_fail () from /lib/tls/libc.so.6
#4 0x0808b680 in precreate_pool_get_thread_mgr_callback_unlocked (data=0x97719a8, error_code=0)
    at ../pvfs2_src/src/io/job/job.c:4429
#5 0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9771138)
    at ../pvfs2_src/src/io/job/job.c:5934
#6 0x0808d623 in job_precreate_pool_get_handles (fsid=1664005450, count=2, servers=0x0,
    handle_array=0x9751658, flags=0, user_ptr=0x9755d98, status_user_tag=0,
    out_status_p=0x972c348, id=0xbfeb71d0, context_id=0, hints=0x9759ed8)
    at ../pvfs2_src/src/io/job/job.c:5723
#7 0x080c2ae0 in get_handles (smcb=0x9755d98, js_p=0x972c348)
    at ../pvfs2_src/src/server/unstuff.sm:267
#8 0x0807630a in PINT_state_machine_invoke (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
#9 0x080766c8 in PINT_state_machine_next (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
#10 0x08076704 in PINT_state_machine_continue (smcb=0x9755d98, r=0x972c348)
    at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
#11 0x0805667c in main (argc=6, argv=0xbfeb7354)
    at ../pvfs2_src/src/server/pvfs2-server.c:413
On Thu, May 13, 2010 at 3:13 PM, Phil Carns <[email protected]> wrote:
> Whoops. Thanks for your patience, Bart. Can you try one more time with
> this additional patch applied? If that fails, I'll set up something here to
> try to reproduce it firsthand.
>
> thanks,
> -Phil
>
>
> On 05/13/2010 04:32 PM, Bart Taylor wrote:
>
> Correction, I did get a core file this time. I just overlooked it.
> Backtrace below.
>
> Bart.
>
>
> (gdb) bt
> #0 0x008247a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1 0x00865825 in raise () from /lib/tls/libc.so.6
> #2 0x00867289 in abort () from /lib/tls/libc.so.6
> #3 0x00899d2a in __libc_message () from /lib/tls/libc.so.6
> #4 0x008a072f in _int_free () from /lib/tls/libc.so.6
> #5 0x008a0baa in free () from /lib/tls/libc.so.6
> #6 0x0808b73d in precreate_pool_get_thread_mgr_callback_unlocked (data=0x9f31188, error_code=0)
>     at ../pvfs2_src/src/io/job/job.c:4460
> #7 0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9f68ca0)
>     at ../pvfs2_src/src/io/job/job.c:5935
> #8 0x0808d623 in job_precreate_pool_get_handles (fsid=1141984428, count=2, servers=0x0,
>     handle_array=0x9f35a20, flags=0, user_ptr=0x9f512f0, status_user_tag=0,
>     out_status_p=0x9f0c348, id=0xbfebcc00, context_id=0, hints=0x9f4df58)
>     at ../pvfs2_src/src/io/job/job.c:5723
> #9 0x080c2ad8 in get_handles (smcb=0x9f512f0, js_p=0x9f0c348)
>     at ../pvfs2_src/src/server/unstuff.sm:267
> #10 0x0807630a in PINT_state_machine_invoke (smcb=0x9f512f0, r=0x9f0c348)
>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
> #11 0x080766c8 in PINT_state_machine_next (smcb=0x9f512f0, r=0x9f0c348)
>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
> #12 0x08076704 in PINT_state_machine_continue (smcb=0x9f512f0, r=0x9f0c348)
>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
> #13 0x0805667c in main (argc=6, argv=0xbfebcd84)
>     at ../pvfs2_src/src/server/pvfs2-server.c:413
>
>
>
>
> On Thu, May 13, 2010 at 2:18 PM, Bart Taylor <[email protected]> wrote:
>
>> Hey Phil,
>>
>> Unfortunately, I didn't have any luck with the patch. I didn't get a core
>> file this time, but one of the daemons quit responding. I was able to run a
>> ping and statfs again, but as soon as I tried to write that file, the server
>> stalled. What other information can I get you?
>>
>> Bart.
>>
>>
>>
>>
>> On Thu, May 13, 2010 at 12:14 PM, Phil Carns <[email protected]> wrote:
>>
>>> Hey Bart,
>>>
>>> I haven't really tested this change yet, but can you try the attached
>>> patch and see if that seems to solve the problem? I think this is a follow-on
>>> to the same bug you guys reported earlier. I just missed another race
>>> issue caused by the last patch.
>>>
>>> -Phil
>>>
>>>
>>> On 05/12/2010 05:18 PM, Bart Taylor wrote:
>>>
>>> Hey guys,
>>>
>>> I have a 3-node local-disk file system that had a core dump during some
>>> testing. It is a file system upgraded from 2.6 to 2.8.2. After the upgrade, I ran a
>>> couple of utilities like pvfs2-ping and pvfs2-statfs. After those succeeded,
>>> I attempted to create a new file of around 800K, and the first server died.
>>> There wasn't anything useful in the logs or dmesg. Below is a backtrace from
>>> the core file. I can supply the entire file, but I can't email it at 43M.
>>>
>>> This may be related to the precreate-pool-race patch from a few days ago
>>> since the backtrace indicates it was in the vicinity of those code changes.
>>>
>>> Let me know what else I can supply that will help.
>>>
>>> Bart.
>>>
>>>
>>>
>>>
>>> (gdb) bt
>>> #0 0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>>> #1 0x00a24825 in raise () from /lib/tls/libc.so.6
>>> #2 0x00a26289 in abort () from /lib/tls/libc.so.6
>>> #3 0x00a58d2a in __libc_message () from /lib/tls/libc.so.6
>>> #4 0x00a5f72f in _int_free () from /lib/tls/libc.so.6
>>> #5 0x00a5fbaa in free () from /lib/tls/libc.so.6
>>> #6 0x0807d6e5 in precreate_pool_get_thread_mgr_callback_unlocked (data=0xb55d30f0, error_code=0)
>>>     at ../pvfs2_src/src/io/job/job.c:4456
>>> #7 0x0807fd3d in precreate_pool_get_handles_try_post (jd=0xb55d4110)
>>>     at ../pvfs2_src/src/io/job/job.c:5930
>>> #8 0x0807f5b9 in job_precreate_pool_get_handles (fsid=140299291, count=2, servers=0x0,
>>>     handle_array=0xb55d41f0, flags=0, user_ptr=0xb5507c98, status_user_tag=0,
>>>     out_status_p=0x9c23348, id=0xbffc11b0, context_id=0, hints=0xb5506a88)
>>>     at ../pvfs2_src/src/io/job/job.c:5718
>>> #9 0x0806c3cc in get_handles (smcb=0xb5507c98, js_p=0x9c23348)
>>>     at ../pvfs2_src/src/server/unstuff.sm:267
>>> #10 0x08095e06 in PINT_state_machine_invoke (smcb=0xb5507c98, r=0x9c23348)
>>>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
>>> #11 0x080961c4 in PINT_state_machine_next (smcb=0xb5507c98, r=0x9c23348)
>>>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
>>> #12 0x08096200 in PINT_state_machine_continue (smcb=0xb5507c98, r=0x9c23348)
>>>     at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
>>> #13 0x0805667c in main (argc=6, argv=0xbffc1334)
>>>     at ../pvfs2_src/src/server/pvfs2-server.c:413
>>>
>>>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> [email protected]
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
>>>
>>>
>>>
>>>
>>
>
>