On Tue, Aug 29, 2006 at 04:55:06PM -0400, Walter B. Ligon III wrote:
> So, I would appreciate some help running some tests on the branch, while 
> I start documenting, and let me know when you think I should start 
> merging it back with the trunk.  Or I'm open to whatever other 
> suggestions ...

OK, Walt, we're getting close.  I committed a couple of small fixes to
get pvfs2-client-core building.  Here's what's not working so well
right now:

- mounting pvfs2 fails with a timeout

- many MPI-IO workloads pass, but the noncontig test triggered a
  segfault in small_io_cleanup, where it cleans up various fields in
  the sm_p structure.  In particular, 'sm_p->msgarray = NULL' caused a
  core dump, and when I look at that core file in gdb,
  sm_p->msgarray_count is really high (135950228).  Looks like maybe
  the sm_p wasn't properly allocated? I dunno, I'm just the messenger.

- pvfs2-cp dies with a segfault when using a very small blocksize (-b
  128). Here's where gdb says the fault lies:

---------------
  #0  0x0806d3d8 in small_io_completion_fn (user_args=0x80f0da8, 
    resp_p=0xbfffb42c, index=0) at sys-small-io.sm:242
242                 fdata.server_nr = sm_p->u.io.datafile_index_array[index];
(gdb) p sm_p->u.io                            
$8 = {io_type = 135162104, file_req = 0x2, file_req_offset = 0, buffer = 0x0, 
  mem_req = 0x0, io_resp_p = 0x50, flowproto_type = 17, encoding = 135206232, 
  datafile_index_array = 0x0, datafile_count = 0, 
  msgpair_completion_count = 81, flow_completion_count = 0, 
  write_ack_completion_count = 0, contexts = 0x80f13d4, 
  context_count = 135205832, total_cancellations_remaining = 0, 
  retry_count = 135206064, stored_error_code = 3396, total_size = 9, 
  dfile_size_array = 0x0, small_io = 0}
---------------

- test-zero-fill fails with a segfault in the same function as pvfs2-cp
  (small_io_completion_fn), though at a different line:

---------------
#0  0x08065149 in small_io_completion_fn (user_args=0x80e9940, 
    resp_p=0xbfffb86c, index=0) at sys-small-io.sm:317
317         sm_p->u.io.dfile_size_array[index] = resp_p->u.small_io.bstream_size;
---------------

- pvfs2-mkdir (a test contributed by acxiom) fails with a segfault:

---------------
#0  0x080b134e in PINT_smcb_op (smcb=0x0)
    at /sandbox/robl/pvfs2-nightly/pvfs2-WALT3/src/common/misc/state-machine-fns.c:348
348         return smcb->op;
---------------
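
In both small_io_completion_fn traces the array being indexed looks
unusable (the pvfs2-cp dump shows datafile_index_array = 0x0 and
datafile_count = 0), so a guard along these lines might at least turn
the segfault into a readable failure while the real cause gets tracked
down.  This is only a sketch under assumptions: io_state_sketch,
io_state_index_ok and io_state_require are made-up names, and the
struct is a trimmed stand-in for a few of the fields in the gdb dump
above, not the real PVFS2 definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a few fields of sm_p->u.io as printed by gdb.
 * The real structure has many more members and may use other types. */
struct io_state_sketch {
    int *datafile_index_array;   /* was 0x0 in the pvfs2-cp dump */
    int datafile_count;          /* was 0 in the pvfs2-cp dump */
    long long *dfile_size_array; /* was 0x0 in the pvfs2-cp dump */
};

/* Return 1 if slot `index` of both arrays can be touched safely, else 0. */
static int io_state_index_ok(const struct io_state_sketch *io, int index)
{
    if (io == NULL)
        return 0;
    if (io->datafile_index_array == NULL || io->dfile_size_array == NULL)
        return 0;
    if (index < 0 || index >= io->datafile_count)
        return 0;
    return 1;
}

/* Debug-build helper: abort with a file/line message instead of
 * dereferencing a NULL or out-of-range slot.  Compiles to nothing
 * when NDEBUG is defined. */
static void io_state_require(const struct io_state_sketch *io, int index)
{
    assert(io_state_index_ok(io, index));
    (void)io;
    (void)index;
}
```

A check like this only reports the symptom.  It says nothing about why
the fields were never filled in, which is still the thing to fix.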


So I think if you can take care of the small-io cases, that would be a
good start, as it would knock out 3 of the 5 failures.  Once WALT3
passes our nightlies, we can think about merging into HEAD.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
