So, I figured out two ways to fix the missing SST file problem. Described here for future generations.

Solution 1:

1. Shut down Riak on the node with the missing file.
2. Delete (or move sideways) the LevelDB partition with the missing file.
3. Start Riak.
4. Repair the KV Indexes[1], which forces a partition handoff from the replicas (I don't know if this step is needed or if Riak will notice the empty partition and fix itself automatically).
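
For anyone who would rather script the "move sideways" step than do it by hand, here is a minimal Python sketch. It is not part of the procedure above and has not been run against a Riak node: the function name move_partition_aside and the aside_dir argument are made up for illustration, and it assumes the partition lives in a directory named after its partition ID under the LevelDB data root (as the path in the LOG excerpt further down suggests). Only run something like this while Riak is stopped on the node.

```python
import shutil
import time
from pathlib import Path


def move_partition_aside(leveldb_root, partition_id, aside_dir):
    """Move <leveldb_root>/<partition_id> into aside_dir and return the new path.

    Hypothetical helper: the directory layout is an assumption, and Riak
    must already be stopped on this node before calling it.
    """
    source = Path(leveldb_root) / str(partition_id)
    if not source.is_dir():
        raise FileNotFoundError(f"no partition directory at {source}")

    aside = Path(aside_dir)
    aside.mkdir(parents=True, exist_ok=True)

    # Timestamp the copy so an earlier moved partition is never overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = aside / f"{source.name}.{stamp}"
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # shutil.move renames when possible and copies across filesystems otherwise.
    shutil.move(str(source), str(target))
    return target


if __name__ == "__main__":
    # Paths taken from the LOG excerpt in this thread; the aside directory
    # is an arbitrary example location.
    moved = move_partition_aside(
        "/data/riak/leveldb",
        "902020541790166644828836732692080926193895866368",
        "/data/riak/leveldb-moved",
    )
    print(f"partition moved to {moved}")
```

Moving rather than deleting keeps the old files around in case the repair from replicas does not bring everything back.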

Solution 2:

1. Shut down Riak on the node with the missing file.
2. Follow the instructions[2] to initiate a LevelDB repair. (This seems to rebuild the MANIFEST file based on the SST files.)
3. Start Riak.
4. The data that was in the missing SST file is still gone, so repair the KV Indexes[1], which forces a partition handoff from the replicas (I don't know if this step is needed or if Riak will fix itself automatically).

[1] http://docs.basho.com/riak/1.2.1/cookbooks/Repairing-KV-Indexes/
[2] https://gist.github.com/2834473

Hope this helps!

Shane.

On 11/01/13 13:47, Shane McEwan wrote:
Thanks Matthew.

We're running version 1.2.1.

I was actually following the Repair KV Indexes[1] instructions which
triggered the problem. I was doing the repair mostly out of curiosity to
see what it did. I was thinking of using it as a sanity check for Riak
backups.

I assume there's a different sort of repair I can run?

[1] http://docs.basho.com/riak/1.2.1/cookbooks/Repairing-KV-Indexes/

On 11/01/13 12:48, Matthew Von-Maszewski wrote:
What version of Riak?

Likely you need to take the node offline and run repair.

Matthew


On Jan 11, 2013, at 4:50 AM, Shane McEwan <[email protected]> wrote:

G'day!

I posted this to the LevelDB mailing list with little success.
Apologies if you've already seen this from there.

We've started getting errors in a LevelDB LOG file about a missing
SST file:

2013/01/10-15:08:12.714525 7fb0767fa700 Compacting 14@0 + 7@1 files
2013/01/10-15:08:25.121147 7fb0767fa700 compacted to: files[ 14 7 50 105 0 0 0 ]
2013/01/10-15:08:25.121488 7fb0767fa700 Delete type=2 #111404
2013/01/10-15:08:25.147976 7fb0767fa700 Compaction error: IO error: /data/riak/leveldb/902020541790166644828836732692080926193895866368/006558.sst: No such file or directory

I assume it means we've got an SST file listed in the MANIFEST file
that doesn't exist anymore. The SST in question doesn't exist in any
of the snapshots I have around the time it was likely to have been
created.
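
A minimal Python sketch of how one might check this from the LOG file rather than by eye. It only looks for the "Compaction error: IO error: ... No such file or directory" message shown above and then tests whether each named file is present on disk; it does not read the MANIFEST. The function names are made up for illustration, and the LOG path in the usage part is assumed from the partition directory in the error message.

```python
import re
from pathlib import Path

# Matches the compaction error shown in the LOG excerpt. \s+ also tolerates
# the line wrapping that mail clients introduce when the log is pasted.
MISSING_SST = re.compile(
    r"Compaction error: IO error:\s+(?P<path>\S+\.sst):\s+No such file or directory"
)


def missing_sst_paths(log_text):
    """Return the distinct .sst paths named in compaction errors, in log order."""
    seen = []
    for match in MISSING_SST.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


def still_missing(paths):
    """Keep only the paths that do not exist on disk right now."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Assumed location: LevelDB keeps a file called LOG in each database directory.
    log_file = Path(
        "/data/riak/leveldb/902020541790166644828836732692080926193895866368/LOG"
    )
    reported = missing_sst_paths(log_file.read_text(errors="replace"))
    for path in still_missing(reported):
        print(f"referenced in LOG but not on disk: {path}")
```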

I saw mention of a bug fixed in the latest LevelDB[1] that could
cause what we're seeing except that we haven't run out of disk space
so I'm not sure we've hit that. I'm less interested in HOW it
happened since we've moved to different hardware since then.

We haven't noticed any missing data in our database (perhaps Riak
replicas are helping there?) and even if there is something missing
the nature of our data means that we can probably live without it.

My question is, can we remove the offending file's entry from the
MANIFEST file somehow? Or will it sort itself out? Currently our idle
test database is spinning at 100% CPU trying to compact a file that
doesn't exist.

[1] https://groups.google.com/forum/#!msg/leveldb/Kc9JxuIUu5A/9P0N9RL4ar8J

Any advice would be greatly appreciated. Thanks!

Shane.


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

