[E 05/11 21:19] Warning: got duplicate handle 12345678.
I'm not sure what (if anything) happens later on if you keep running a server that has duplicate handle entries, but you can always remove one of the duplicates with the pvfs2-remove-object utility.
FWIW, Berkeley DB _does_ support duplicate keys, but it's only supposed to be possible if you set explicit flags to enable that functionality (which PVFS does not do). I'm not sure what led to this particular corruption, but these servers had experienced DB panics in the past while using Berkeley DB 4.3. They have now been upgraded to use 4.8. Prior to this patch, the servers with duplicate entries were generating an incorrect handle ledger on startup (skipping several valid handles), which caused pvfs2-fsck to fail when it tried to double-check the handle count. I imagine there were probably other side effects as well.
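To make the idea behind the first hunk easier to review in isolation, here is a rough standalone sketch of the duplicate check. It is not the PVFS code: handle_t and copy_unique_handles() are made-up names for illustration, and it assumes duplicates always show up next to each other during iteration, which is what the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for TROVE_handle; the real type lives in PVFS. */
typedef uint64_t handle_t;

/*
 * Copy handles from `in` to `out`, dropping any entry equal to the one
 * stored just before it. Assumes duplicates are adjacent (an assumption
 * of this sketch). Returns the number of handles kept in `out`.
 */
static size_t copy_unique_handles(const handle_t *in, size_t n, handle_t *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++)
    {
        /* compare against the last handle we actually stored */
        if (kept > 0 && in[i] == out[kept - 1])
        {
            fprintf(stderr, "Warning: got duplicate handle %llu.\n",
                    (unsigned long long)in[i]);
            continue; /* skip the entry instead of storing it twice */
        }
        out[kept++] = in[i];
    }
    return kept;
}
```

Called on the input {5, 7, 7, 9}, this keeps 5, 7, 9 and logs one warning. Comparing against the last stored handle (rather than the last one read) means a run of three or more identical entries still collapses to a single entry.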
Can someone sanity check this patch and commit it to the appropriate CVS branches if it looks OK?
thanks! -Phil
diff -Naupr pvfs-2.8.2/src/io/trove/trove-dbpf/dbpf-dspace.c pvfs-2.8.2-new/src/io/trove/trove-dbpf/dbpf-dspace.c
--- pvfs-2.8.2/src/io/trove/trove-dbpf/dbpf-dspace.c 2010-02-04 12:40:12.000000000 -0600
+++ pvfs-2.8.2-new/src/io/trove/trove-dbpf/dbpf-dspace.c 2011-05-11 21:40:54.000000000 -0500
@@ -892,6 +892,15 @@ static int dbpf_dspace_iterate_handles_o
continue;
}
+ /* check for duplicates */
+ if(i > 0 && *(TROVE_handle*)tmp_handle == op_p->u.d_iterate_handles.handle_array[i-1])
+ {
+ gossip_err("Warning: got duplicate handle %llu.\n", llu(*(TROVE_handle*)tmp_handle));
+ gossip_err("Warning: skipping entry.\n");
+ i--;
+ continue;
+ }
+
op_p->u.d_iterate_handles.handle_array[i] =
*(TROVE_handle *)tmp_handle;
}
@@ -923,8 +932,16 @@ static int dbpf_dspace_iterate_handles_o
{
goto get_next;
}
+
+ if(*(TROVE_handle*)tmp_handle == op_p->u.d_iterate_handles.handle_array[*op_p->u.d_iterate_handles.count_p])
+ {
+ gossip_err("Warning: found duplicate handle: %llu\n", llu(*(TROVE_handle*)tmp_handle));
+ gossip_err("Warning: skipping entry.\n");
+ }
+
} while (sizeof_handle != sizeof(TROVE_handle) ||
- sizeof_attr != sizeof(attr));
+ sizeof_attr != sizeof(attr) ||
+ *(TROVE_handle*)tmp_handle == op_p->u.d_iterate_handles.handle_array[*op_p->u.d_iterate_handles.count_p]);
*op_p->u.d_iterate_handles.position_p = *(TROVE_handle *)tmp_handle;
goto return_ok;
@@ -956,6 +973,13 @@ get_next:
"failure @ recno\n");
ret = -dbpf_db_error_to_trove_error(ret);
}
+ if(*op_p->u.d_iterate_handles.count_p > 0 &&
+ dummy_handle == op_p->u.d_iterate_handles.handle_array[*op_p->u.d_iterate_handles.count_p])
+ {
+ gossip_err("Warning: found duplicate handle: %llu\n", llu(dummy_handle));
+ gossip_err("Warning: skipping entry.\n");
+ (*op_p->u.d_iterate_handles.count_p)--;
+ }
*op_p->u.d_iterate_handles.position_p = dummy_handle;
return_ok:
diff -Naupr pvfs-2.8.2/src/io/trove/trove-handle-mgmt/trove-handle-mgmt.c pvfs-2.8.2-new/src/io/trove/trove-handle-mgmt/trove-handle-mgmt.c
--- pvfs-2.8.2/src/io/trove/trove-handle-mgmt/trove-handle-mgmt.c 2010-02-04 12:40:12.000000000 -0600
+++ pvfs-2.8.2-new/src/io/trove/trove-handle-mgmt/trove-handle-mgmt.c 2011-05-11 21:41:12.000000000 -0500
@@ -126,10 +126,9 @@ static int trove_check_handle_ranges(TRO
ret = trove_handle_remove(ledger, handles[i]);
if (ret != 0)
{
- gossip_debug(
- GOSSIP_TROVE_DEBUG, "could not remove "
- "handle %llu\n", llu(handles[i]));
- break;
+ gossip_err(
+ "WARNING: could not remove "
+ "handle %llu from ledger; continuing.\n", llu(handles[i]));
}
}
ret = ((i == count) ? 0 : -1);
_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
