Eric,

I discussed it with Phil and it looks like the dataspace has a handle that isn't part of the defined handle range in the config file. Here are a couple of possible fixes.

In trove_check_handle_ranges() you could have it just continue after printing the error. That should still give you a shot at recovering your data. Another option would be to tweak the code to try to delete the offending handle from the db itself.
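
For the first option, here is a rough sketch of the "report it and keep going" idea. It is not the actual trove_check_handle_ranges() code: struct handle_range, handle_in_ranges() and check_handles_lenient() are made-up names for illustration only, and the real function works on PVFS's own types rather than plain arrays.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one configured handle range (inclusive). */
struct handle_range {
    uint64_t first;
    uint64_t last;
};

/* Return 1 if handle lies inside any of the n ranges, 0 otherwise. */
int handle_in_ranges(uint64_t handle,
                     const struct handle_range *ranges, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (handle >= ranges[i].first && handle <= ranges[i].last)
            return 1;
    }
    return 0;
}

/*
 * Walk every handle found in the dataspace. An out-of-range handle is
 * reported and skipped instead of aborting the whole check.
 * Returns the number of handles that were skipped.
 */
size_t check_handles_lenient(const uint64_t *handles, size_t count,
                             const struct handle_range *ranges, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < count; i++) {
        if (!handle_in_ranges(handles[i], ranges, n)) {
            fprintf(stderr,
                    "handle %llu is outside the configured ranges; skipping\n",
                    (unsigned long long)handles[i]);
            skipped++;
            continue; /* keep going rather than returning an error */
        }
        /* An in-range handle would be registered as in use here. */
    }
    return skipped;
}
```

The point is only the continue in place of an early error return: the server would come up with the stray handle ignored, which should be enough to copy the data off.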

kevin


On Feb 9, 2010, at 4:24 PM, Eric J. Walter wrote:


Hi Kevin,

Yes, it appears that the "repair" of the database allowed the server to start up. Here is what happens when I start it with "EventLogging" set to "all" in the fs.conf file:

[D 02/09 17:10] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 02/09 17:10] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 02/09 17:10] op_queue add: 0x61fdb0
[D 02/09 17:10] could not remove handle 3074457345618248967
[D 02/09 17:10] op_queue add: 0x61fdb0

This repeats over and over again until I stop the server.

Only the "broken" server log says this.  The other logs just have:

[P 02/09 17:16] Start times (hr:min:sec): 17:16:49.553 17:16:48.533 17:16:47.513 17:16:46.493 17:16:45.473 17:16:44.452
[P 02/09 17:16] Intervals (hr:min:sec) : 00:00:01.020 00:00:01.020 00:00:01.020 00:00:01.020 00:00:01.020 00:00:01.021
[P 02/09 17:17] bytes read : 0 0 0 0 0 0
[P 02/09 17:17] bytes written : 0 0 0 0 0 0
[P 02/09 17:17] metadata reads : 0 0 0 0 0 0
[P 02/09 17:17] metadata writes : 0 0 0 0 0 0
[P 02/09 17:17] metadata dspace ops : 0 0 0 0 0 0
[P 02/09 17:17] metadata keyval ops : 2 2 2 2 2 2
[P 02/09 17:17] request scheduler : 0 0 0 0 0 0
[D 02/09 17:17] [SM Exiting]: (0x6a04c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
[D 02/09 17:17] [SM Entering]: (0x6a1830) job_timer_sm:do_work (status: 0)

Thanks again,

Eric



Kevin Harms wrote:
Eric,

So I take it the "repaired" database allowed the pvfs2-server to start? Based on this, it looks like it perhaps suffered a fatal error soon after, since the pvfs2-fsck command could not connect to it. What does the pvfs-2 server log say?

kevin

On Feb 9, 2010, at 2:28 PM, Eric J. Walter wrote:


Kevin,

Hi, I have done what you have said and repeated the db_dump and db_load.

The db_verify of dataspace_attributes.db produces no errors and the pvfs2-server starts with no errors. Unfortunately, the clients can't seem to communicate with the servers after mounting:

>>> /share/apps/pvfs-2.8.1/bin/pvfs2-fsck -v -m /mnt/pvfs2
[E 15:20:09.068943] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 12.
[E 15:20:09.069756] Warning: msgpair failed to ib://pvfs-2:3335, will retry: Connection timed out
[E 15:20:09.069808] *** msgpairarray_completion_fn: msgpair to server [UNKNOWN] failed: Connection timed out
[E 15:20:09.069829] *** Non-BMI failure .
[E 15:20:09.069859] ERROR: could not initialize any file systems in /etc/pvfs2tab.
PVFS_util_init_defaults: No such device (error class: 0)

This same thing happens for any command (e.g. pvfs2-ls, pvfs-statfs, etc.)

Perhaps there is something I am missing?

Eric


Kevin Harms wrote:
Eric,

I'm not sure exactly what is wrong with your .db, but to use db_load you need to modify it so that it adds the keys back in the correct "sorted" order, where "sorted" means the order PVFS expects. You need to modify db_load.c to something like this:

if ((ret = dbp->set_bt_compare(dbp, PINT_trove_dbpf_ds_attr_compare)) != 0) {
      dbp->err(dbp, ret, "DB->set_bt_compare");
      goto err;
}

Then paste the PINT_trove_dbpf_ds_attr_compare function and its associated data structure definitions into the db_load.c source as well. You should take db_load.c from the particular version of bdb you're using.
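
The reason this matters: without a comparison function, Berkeley DB orders btree keys byte by byte, and on a little-endian machine that is not the numeric order of integer keys. Below is a minimal sketch of a numeric comparator, assuming the keys are unsigned 64-bit handles. That is an assumption for illustration; the real PINT_trove_dbpf_ds_attr_compare should be copied from your PVFS source tree. struct key_view and handle_key_compare() are made-up names, with key_view standing in for the data/size fields of a DBT so the sketch compiles without db.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Berkeley DB's DBT; only the fields used here. */
struct key_view {
    const void *data;
    uint32_t size;
};

/*
 * Compare two keys as unsigned 64-bit handles in host byte order.
 * Returns -1, 0 or 1, the convention a btree comparison callback uses.
 */
int handle_key_compare(const struct key_view *a, const struct key_view *b)
{
    uint64_t ha, hb;

    /* memcpy avoids unaligned reads from the key buffers. */
    memcpy(&ha, a->data, sizeof ha);
    memcpy(&hb, b->data, sizeof hb);

    if (ha == hb)
        return 0;
    return (ha < hb) ? -1 : 1;
}
```

With keys 1 and 256 this comparator puts 1 first, whereas a byte-wise comparison on a little-endian host would put 256 first. A database reloaded in byte-wise order is therefore laid out differently from what a server using a numeric comparator expects.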

kevin

On Feb 8, 2010, at 7:16 PM, Eric J. Walter wrote:



Hi,

I have a problem starting up an I/O node. It is one of 3 servers on which we run v2.8.1 over InfiniBand. It is not used for metadata. After finding a file which had '?--?--?' like permissions, I decided to restart the pvfs servers and remount all of the clients. Now, one of the three I/O nodes can't start its pvfs2-server. The other two start correctly.

Here is the server log from the problem server:

[D 02/08 19:40] PVFS2 Server version 2.8.1 starting.
[E 02/08 19:40] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[E 02/08 19:40] Error adding handle range 1537228672809129303 -3074457345618258602,6148914691236517203-7686143364045646502 to filesystem pvfs2-fs
[E 02/08 19:40] Error: Could not initialize server interfaces; aborting.
[E 02/08 19:40] Error: Could not initialize server; aborting.

I am also using db4-4.2.52-7.1 of the DB software. Reading through previous mailing list discussions, I found that running db_recover on the .db files (after backing them up) could be helpful. The only .db file which has any problems with verify is dataspace_attributes.db on the problem I/O node. Here is what it reports:

# db_verify -o dataspace_attributes.db
db_verify: Page 865: item 57 of unrecognizable type
db_verify: Page 865: gap between items at offset 1376
db_verify: Page 865: item order check unsafe: skipping
db_verify: DB->verify: dataspace_attributes.db: DB_VERIFY_BAD: Database verification failed

So I tried db_recover -v in the same directory and in the directory
above (I am not sure where to run it) and all I get is:

db_recover: Finding last valid log LSN: file: 1 offset 28

and a small binary file named "log.0000000001".

This step seems to do nothing, i.e. the db_verify report doesn't change after this.

I have also tried db_dump -r followed by db_load and this also does not change the db_verify output.

Is there anything else I can do except wipe the filesystem and rebuild?

Thanks for any help I can get.

Eric J. Walter
Department of Physics
College of William and Mary




_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users




