Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

lohit Thu, 04 Aug 2011 20:37:15 -0700

2011/8/4 Ryan Rawson <[email protected]>

> Yes, that is what JD is referring to, the so-called IO fence.
>
> It works like so:
> - regionserver is appending to an HLog, continues to do so, hasn't
> gotten the ZK "kill yourself" signal yet
> - hmaster splits the logs
> - the hmaster yanks the writer from under the regionserver, and the RS
> then starts to kill itself
>
Can you tell me more about how this is done with HDFS? If the RS has the
lease, how did the master get hold of it? Or is it removing the file?
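
To make the question concrete, here is a toy model of the sequence as I
understand it from the descriptions quoted in this mail. It is only a sketch
and not the HDFS or HBase API: ToyFS, recover_lease, split_logs and the file
names are made up for illustration, and it assumes the filesystem lets a
second client forcibly take over a file's single writer lease.

```python
class LeaseLost(Exception):
    """Raised when a client touches a file whose lease it does not hold."""


class ToyFS:
    """Toy single-writer filesystem: at most one lease holder per file."""

    def __init__(self):
        self.files = {}  # path -> list of appended records
        self.lease = {}  # path -> current lease holder, None once closed

    def create(self, path, holder):
        self.files[path] = []
        self.lease[path] = holder

    def append(self, path, holder, record):
        if self.lease.get(path) != holder:
            raise LeaseLost(f"{holder} does not hold the lease on {path}")
        self.files[path].append(record)

    def recover_lease(self, path, new_holder):
        # Assumption: a forced takeover revokes the previous holder's lease.
        self.lease[path] = new_holder

    def close(self, path, holder):
        if self.lease.get(path) != holder:
            raise LeaseLost(f"{holder} does not hold the lease on {path}")
        self.lease[path] = None


def split_logs(fs, list_logs, master="master"):
    """Take over and close every log; re-check in case the RS rolled a new one."""
    done = []
    while True:
        pending = [p for p in list_logs() if p not in done]
        if not pending:
            return done
        for path in pending:
            fs.recover_lease(path, master)
            fs.close(path, master)
            done.append(path)


# The RS writes one edit, then the master splits while the RS is still alive.
fs = ToyFS()
fs.create("hlog.1", "rs1")
fs.append("hlog.1", "rs1", "edit-1")
split = split_logs(fs, lambda: sorted(fs.files))

try:
    fs.append("hlog.1", "rs1", "edit-2")  # late write from the RS
    rs_aborted = False
except LeaseLost:
    rs_aborted = True  # this is where the RS would start to kill itself
```

In this model the late append fails because the lease moved, not because the
file was removed. Which of the two actually happens on HDFS is what I am
asking.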


>
>
> This can happen because ZK can deliver the session lost message late,
> and there is a race.
>
> -ryan
>
> On Thu, Aug 4, 2011 at 8:13 PM, M. C. Srivas <[email protected]> wrote:
> > On Thu, Aug 4, 2011 at 10:34 AM, Jean-Daniel Cryans <[email protected]
> >wrote:
> >
> >> > Thanks for the feedback.  So you're inclined to think it would be at the
> >> > dfs layer?
> >>
> >> That's where the evidence seems to point.
> >>
> >> >
> >> > Is it accurate to say the most likely places where the data could have
> >> > been lost were:
> >> > 1. wal writes didn't actually get written to disk (no log entries to
> >> > suggest any issues)
> >>
> >> Most likely.
> >>
> >> > 2. wal corrupted (no log entries suggest any trouble reading the log)
> >>
> >> In that case the logs would scream (and I didn't see that in the logs
> >> I looked at).
> >>
> >> > 3. not all split logs were read by regionservers  (?? is there any way
> >> > to ensure this either way... should I look at the filesystem some place?)
> >>
> >> Some regions would have recovered edits files, but that seems highly
> >> unlikely. With DEBUG enabled we could have seen which files were split
> >> by the master and which ones were created for the regions, and then
> >> which were read by the region servers.
> >>
> >> >
> >> > Do you think the type of network partition I'm talking about is
> >> > adequately covered in existing tests? (Specifically running an external zk
> >> > cluster?)
> >>
> >> The IO fencing was only tested with HDFS, I don't know what happens in
> >> that case with MapR. What I mean is that when the master splits the
> >> logs, it takes ownership of the HDFS writer lease (only one per file)
> >> so that it can safely close the log file. Then after that it checks if
> >> there are any new log files that were created (the region server could
> >> have rolled a log while the master was splitting them) and will
> >> restart if that situation happens until it's able to own all files and
> >> split them.
> >>
> >
> > JD,   I didn't think the master explicitly dealt with writer leases.
> >
> > Does HBase rely on single-writer semantics on the log file? That is, if the
> > master and a RS both decide to mucky-muck with a log file, you expect the FS
> > to lock out one of the writers?
> >
> >
> >
> >
> >>
> >> >
> >> > Have you heard if anyone else has been having problems with the second
> >> > 90.4 rc?
> >>
> >> Nope, we run it here on our dev cluster and didn't encounter any issue
> >> (with the code or node failure).
> >>
> >> >
> >> > Thanks again for your help.  I'm following up with the MapR guys as
> >> > well.
> >>
> >> Good idea!
> >>
> >> J-D
> >>
> >
>



-- 
Have a Nice Day!
Lohit
