Hey JD,
when the RS dies, the regions that it was serving are spread out amongst
the rest of the RS's, correct? But isn't the WAL a per-RS thingy rather
than a per-region thingy? How do the other RS's then recover the regions
allotted to them? Do they skip over log records in the dead RS's WAL that
do not belong to the regions allocated to them?
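To make the question concrete, here is a toy Python model of the two
options I can imagine. None of this is HBase code: WalEntry, entries_for
and split_by_region are names I made up, and the log is just a list.

```python
from collections import defaultdict
from typing import NamedTuple


class WalEntry(NamedTuple):
    seq: int      # position in the dead RS's single, shared log
    region: str   # region the edit belongs to
    edit: str     # opaque payload


def entries_for(wal, assigned_regions):
    """Option 1: a new RS scans the whole log and skips foreign regions."""
    return [e for e in wal if e.region in assigned_regions]


def split_by_region(wal):
    """Option 2: one pass splits the log into per-region edit lists."""
    per_region = defaultdict(list)
    for e in wal:
        per_region[e.region].append(e)  # log order is kept within a region
    return dict(per_region)


# Edits for three regions interleaved in one per-RS log.
wal = [
    WalEntry(1, "r1", "put a"),
    WalEntry(2, "r2", "put b"),
    WalEntry(3, "r1", "put c"),
    WalEntry(4, "r3", "put d"),
]
```

With option 1 every RS that picks up a region rereads the full log; with
option 2 the log is read once and each RS only sees its own regions'
edits. Is it one of these, or something else?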
Also, how is the dead RS's WAL garbage-collected?
thanks,
Srivas.
On Fri, Jan 21, 2011 at 9:32 AM, Jean-Daniel Cryans <[email protected]> wrote:
> If the region server gets YouAreDeadException, it does an "abort" and
> won't flush the data since another region server could already be
> serving the region. If you're not writing to the WAL, then yes it's
> data loss.
>
> Not sure what you mean by "shuts down cleanly" in your case, if you
> see a log that starts with "Aborting region server" then it's not
> really "clean".
>
> J-D
>
> On Fri, Jan 21, 2011 at 2:38 AM, Friso van Vollenhoven
> <[email protected]> wrote:
> > Hi all,
> >
> > Question: when a regionserver shuts down cleanly after a
> > YouAreDeadException and the master nicely reassigns the regions, will you
> > lose any data that was written to the memstore of the dead RS when not
> > using WAL?
> >
> > There was no hard crash and not a single error in any of the logs (except
> > for the FATAL: YouAreDeadException). The RS lost its zookeeper session after
> > a timeout, probably GC combined with some other starvation on heavy load. I
> > think the memstore flushes on shutdown, but I am not entirely sure what
> > happens in the situation where regions are already opened by other
> > regionservers when the dying RS executes the shutdown code. Can I assume that
> > the RS that gets reassigned a region creates a new HFile and that this will
> > be compacted together with the one left by the dead RS at the next
> > compaction run?
> >
> >
> > Thanks,
> > Friso
> >
> >
>