Everyone,
I recently had a couple of compactions, minors that were promoted to
majors, take 8 and 10 minutes each. I eventually killed the
regionserver underneath them as I'd never seen compactions last that
long before. In looking through the logs from the regionserver that was
killed and watching one of the regions after it was moved over, I saw
that it took about 3 minutes to compact on the second regionserver. I
also noticed that the temporary location for the newly compacted
storefile matched in both the first (failed/killed) and second
(succeeded) regionserver logs.
My question is this: if a compaction fails due to a regionserver loss
mid-compaction, does the regionserver that picks up the region continue
where the first left off? Or does it have to start from scratch?
Basically, I'm wondering if waiting an additional 3 minutes or so would
have finally worked through the region on the first server, or if it was
truly stuck for some other, unknown reason.
Thanks!
--Brennon
- Compaction timing and recovery from failure Brennon Church