Whoops. Thanks for your patience, Bart. Can you try one more time with
this additional patch applied? If that fails, I'll set up something here
to try to reproduce it firsthand.
thanks,
-Phil
On 05/13/2010 04:32 PM, Bart Taylor wrote:
Correction, I did get a core file this time. I just overlooked it.
Backtrace below.
Bart.
(gdb) bt
#0 0x008247a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00865825 in raise () from /lib/tls/libc.so.6
#2 0x00867289 in abort () from /lib/tls/libc.so.6
#3 0x00899d2a in __libc_message () from /lib/tls/libc.so.6
#4 0x008a072f in _int_free () from /lib/tls/libc.so.6
#5 0x008a0baa in free () from /lib/tls/libc.so.6
#6 0x0808b73d in precreate_pool_get_thread_mgr_callback_unlocked
(data=0x9f31188,
error_code=0) at ../pvfs2_src/src/io/job/job.c:4460
#7 0x0808dda7 in precreate_pool_get_handles_try_post (jd=0x9f68ca0)
at ../pvfs2_src/src/io/job/job.c:5935
#8 0x0808d623 in job_precreate_pool_get_handles (fsid=1141984428,
count=2, servers=0x0,
handle_array=0x9f35a20, flags=0, user_ptr=0x9f512f0,
status_user_tag=0,
out_status_p=0x9f0c348, id=0xbfebcc00, context_id=0, hints=0x9f4df58)
at ../pvfs2_src/src/io/job/job.c:5723
#9 0x080c2ad8 in get_handles (smcb=0x9f512f0, js_p=0x9f0c348)
at ../pvfs2_src/src/server/unstuff.sm:267
#10 0x0807630a in PINT_state_machine_invoke (smcb=0x9f512f0, r=0x9f0c348)
at ../pvfs2_src/src/common/misc/state-machine-fns.c:132
#11 0x080766c8 in PINT_state_machine_next (smcb=0x9f512f0, r=0x9f0c348)
at ../pvfs2_src/src/common/misc/state-machine-fns.c:309
#12 0x08076704 in PINT_state_machine_continue (smcb=0x9f512f0,
r=0x9f0c348)
at ../pvfs2_src/src/common/misc/state-machine-fns.c:327
#13 0x0805667c in main (argc=6, argv=0xbfebcd84) at
../pvfs2_src/src/server/pvfs2-server.c:413
On Thu, May 13, 2010 at 2:18 PM, Bart Taylor <[email protected]> wrote:
Hey Phil,
Unfortunately, I didn't have any luck with the patch. I didn't get
a core file this time, but one of the daemons quit responding. I
was able to run a ping and statfs again, but as soon as I tried to
write that file, the server stalled. What other information can I
get you?
Bart.
On Thu, May 13, 2010 at 12:14 PM, Phil Carns <[email protected]> wrote:
Hey Bart,
I haven't really tested this change yet, but can you try the
attached patch and see if that seems to solve the problem? I
think this is a follow-on to the same bug you guys reported
earlier. I just missed another race issue caused by the last
patch.
-Phil
On 05/12/2010 05:18 PM, Bart Taylor wrote:
Hey guys,
I have a 3-node, local-disk file system that had a core dump
during some testing. It is a file system upgraded from 2.6 to 2.8.2.
After the upgrade, I ran a couple of utilities like
pvfs2-ping and pvfs2-statfs. After those succeeded, I
attempted to create a new file of around 800K, and the first
server died. There wasn't anything useful in the logs or
dmesg. Below is a backtrace from the core file. I can supply
the entire file, but I can't email it at 43M.
This may be related to the precreate-pool-race patch from a
few days ago since the backtrace indicates it was in the
vicinity of those code changes.
Let me know what else I can supply that will help.
Bart.
(gdb) bt
#0 0x009e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00a24825 in raise () from /lib/tls/libc.so.6
#2 0x00a26289 in abort () from /lib/tls/libc.so.6
#3 0x00a58d2a in __libc_message () from /lib/tls/libc.so.6
#4 0x00a5f72f in _int_free () from /lib/tls/libc.so.6
#5 0x00a5fbaa in free () from /lib/tls/libc.so.6
#6 0x0807d6e5 in
precreate_pool_get_thread_mgr_callback_unlocked
(data=0xb55d30f0, error_code=0) at
../pvfs2_src/src/io/job/job.c:4456
#7 0x0807fd3d in precreate_pool_get_handles_try_post
(jd=0xb55d4110) at ../pvfs2_src/src/io/job/job.c:5930
#8 0x0807f5b9 in job_precreate_pool_get_handles
(fsid=140299291, count=2, servers=0x0,
handle_array=0xb55d41f0, flags=0, user_ptr=0xb5507c98,
status_user_tag=0, out_status_p=0x9c23348, id=0xbffc11b0,
context_id=0, hints=0xb5506a88) at
../pvfs2_src/src/io/job/job.c:5718
#9 0x0806c3cc in get_handles (smcb=0xb5507c98,
js_p=0x9c23348) at ../pvfs2_src/src/server/unstuff.sm:267
#10 0x08095e06 in PINT_state_machine_invoke (smcb=0xb5507c98,
r=0x9c23348) at
../pvfs2_src/src/common/misc/state-machine-fns.c:132
#11 0x080961c4 in PINT_state_machine_next (smcb=0xb5507c98,
r=0x9c23348) at
../pvfs2_src/src/common/misc/state-machine-fns.c:309
#12 0x08096200 in PINT_state_machine_continue
(smcb=0xb5507c98, r=0x9c23348) at
../pvfs2_src/src/common/misc/state-machine-fns.c:327
#13 0x0805667c in main (argc=6, argv=0xbffc1334) at
../pvfs2_src/src/server/pvfs2-server.c:413
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
? log.txt
? pvfs2
Index: src/io/job/job.c
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/src/io/job/job.c,v
retrieving revision 1.192
diff -a -u -p -r1.192 job.c
--- src/io/job/job.c 13 May 2010 18:10:06 -0000 1.192
+++ src/io/job/job.c 13 May 2010 21:11:10 -0000
@@ -5865,6 +5865,9 @@ static void precreate_pool_get_handles_t
= &tmp_trove_array[i];
}
+ /* pre-increment pending count before posting any trove operations */
+ jd->u.precreate_pool.trove_pending = jd->u.precreate_pool.precreate_handle_count;
+
/* post all trove operations at once */
for(i=0; i<jd->u.precreate_pool.precreate_handle_count; i++)
{
@@ -5905,9 +5908,6 @@ static void precreate_pool_get_handles_t
}
}
- /* pre-increment pending count before posting trove operation */
- trove_pending_count++;
- jd->u.precreate_pool.trove_pending++;
/* post trove operation to pull out a handle */
ret = trove_keyval_iterate_keys(
@@ -5937,6 +5937,7 @@ static void precreate_pool_get_handles_t
}
else
{
+ trove_pending_count++;
/* callback will be triggered later */
}
}
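
The comment in the patch ("pre-increment pending count before posting any trove operations") suggests one plausible reading of the crash: if an operation completes immediately and its callback runs while the pending count is still being built up one post at a time, the count can reach zero early and the completion path (which frees memory, per frame #6 of the backtraces) can run more than once. The following is a minimal, self-contained sketch of that pattern only. It is not the PVFS2 code; `struct batch`, `on_complete`, `post_op`, and `run_batch` are hypothetical names, and every operation is assumed to complete immediately.

```c
/* Hypothetical stand-in for a job descriptor; not the PVFS2 structure. */
struct batch {
    int pending;        /* operations still outstanding */
    int finalize_calls; /* how many times the batch was "completed" */
};

/* Completion callback: whichever operation drops the count to zero
 * finalizes the batch. Real code would release resources here, so a
 * second call would be a double free. */
static void on_complete(struct batch *b)
{
    b->pending--;
    if (b->pending == 0)
        b->finalize_calls++;
}

/* Assumption for illustration: every posted operation completes
 * immediately, so the callback runs before post_op() returns. */
static void post_op(struct batch *b)
{
    on_complete(b);
}

/* Post n operations and return how many times the batch was finalized.
 * count_up_front != 0: set pending to n before posting anything.
 * count_up_front == 0: increment pending once per post. */
int run_batch(int n, int count_up_front)
{
    struct batch b = {0, 0};
    int i;

    if (count_up_front)
        b.pending = n;

    for (i = 0; i < n; i++) {
        if (!count_up_front)
            b.pending++;
        post_op(&b);
    }
    return b.finalize_calls;
}
```

Under these assumptions, `run_batch(3, 1)` finalizes once, while `run_batch(3, 0)` finalizes after every post, because each immediate completion sees the count drop back to zero.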