Thanks for reporting it. Good find.

On Wed, May 24, 2017 at 11:16 AM, Jerry He <[email protected]> wrote:
> In theory, the snapshot is a best-effort point-in-time snapshot. There is
> no guarantee of any clean cutoff. The data you missed in this snapshot
> will appear in the next one.
> But the current problem you saw can and should be fixed, though.
>
> Jerry
>
> On Wed, May 24, 2017 at 9:04 AM, LeBlanc, Jacob <[email protected]> wrote:
>
>> Will do. I'll build off 1.1.4 with the patch, apply it to the region
>> servers, capture logs, and let you know if I see the exception occur.
>>
>> --Jacob
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:[email protected]]
>> Sent: Wednesday, May 24, 2017 11:57 AM
>> To: [email protected]
>> Subject: Re: Missing data in snapshot - possible flush timing issue?
>>
>> I attached a tentative fix to HBASE-18099.
>>
>> If you have the bandwidth, you can try it out.
>>
>> On Wed, May 24, 2017 at 8:53 AM, LeBlanc, Jacob <[email protected]> wrote:
>>
>> > Great! I see the JIRA bug you just opened. I'll enable debug logging
>> > on FlushSnapshotSubprocedure and HRegion on the region servers in the
>> > cluster to see if I can capture log messages as better evidence. Since
>> > it's a timing issue, I'm not sure when we might see it again, but I'll
>> > keep an eye out.
>> >
>> > Thanks so much for your help,
>> >
>> > --Jacob
>> >
>> > -----Original Message-----
>> > From: Ted Yu [mailto:[email protected]]
>> > Sent: Wednesday, May 24, 2017 11:29 AM
>> > To: [email protected]
>> > Subject: Re: Missing data in snapshot - possible flush timing issue?
>> >
>> > In FlushSnapshotSubprocedure (running on the region server), there is
>> > a debug log:
>> >
>> >     LOG.debug("... Flush Snapshotting region " + region.toString() + " completed.");
>> >
>> > If you enable debug logging, we would know whether the underlying
>> > region is considered to have completed the flush.
>> >
>> > Higher up in the call() method there is this:
>> >
>> >     region.flush(true);
>> >
>> > The return value is not checked.
>> >
>> > In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned due to:
>> >
>> >     String msg = "Not flushing since "
>> >         + (writestate.flushing ? "already flushing" : "writes not enabled");
>> >
>> > The above seems to correlate with your description.
>> >
>> > Let me log a JIRA referring to this thread.
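For illustration, here is a minimal sketch of the pattern described above: checking the FlushResult that the snapshot flush path currently ignores. This is not the actual HBASE-18099 patch; the class and method names SnapshotFlushCheck and flushForSnapshot are hypothetical, and the FlushResult/getResult() names are assumed from the HBase 1.1 Region interface.

    import java.io.IOException;

    import org.apache.hadoop.hbase.regionserver.Region;

    // Illustrative sketch only -- not the HBASE-18099 patch. It shows the idea of
    // checking the result of Region#flush(true), which the snapshot flush path ignores.
    final class SnapshotFlushCheck {

      /**
       * Flush a region for a snapshot and report whether the flush actually ran.
       * getResult() can come back as CANNOT_FLUSH when another thread (for example
       * the periodic MemStoreFlusher) is already flushing the region, which matches
       * the race described in this thread.
       */
      static boolean flushForSnapshot(Region region) throws IOException {
        Region.FlushResult result = region.flush(true);
        if (result.getResult() == Region.FlushResult.Result.CANNOT_FLUSH) {
          // A concurrent flush is in progress. A fix would need to wait for that
          // flush to finish (or retry) before treating the region as flushed for
          // the snapshot, so the snapshot does not silently miss those edits.
          return false;
        }
        return true;
      }
    }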
>> > On Wed, May 24, 2017 at 8:08 AM, LeBlanc, Jacob <[email protected]> wrote:
>> >
>> > > Thanks for looking, Ted!
>> > >
>> > > My understanding of the log messages is that the last line of the
>> > > pastebin is the end of the flush of the memstore for the region where
>> > > we missed data, but that line is tagged with "[MemStoreFlusher.1]",
>> > > whereas the other regions that were flushed as part of the snapshot
>> > > are tagged with
>> > > "[rs(a1-qa-hbr31416d.lab.lynx-connected.com,16020,1494432106955)-snapshot-pool81-thread-1]".
>> > > With only a superficial understanding, it seems like the flush of the
>> > > region whose messages were tagged with "[MemStoreFlusher.1]", while
>> > > happening at the same time, wasn't really part of the snapshot
>> > > process. For example, line 3 in the pastebin shows the flush of one
>> > > region starting, tagged with snapshot-pool81-thread-1; line 4 shows
>> > > the flush starting for the region where we missed data, tagged with
>> > > MemStoreFlusher.1; and line 5 continues with the flush of the region
>> > > as part of the snapshot. So it definitely looks like multiple flushes
>> > > were occurring at the same time, whereas elsewhere in the logs the
>> > > flushes always appear to be done sequentially as part of the
>> > > snapshot. So I came to the theory that perhaps there is a timing
>> > > issue where the flushed data for a region is missed by the snapshot
>> > > because the flush is occurring on another thread as part of the
>> > > normal, periodic flushing of the memstore.
>> > >
>> > > The last line I see in the full region server log that has anything
>> > > to do with the snapshot is line 11 in the pastebin, at 2017-05-12
>> > > 02:06:05,577, where it's processing events from ZooKeeper. Again with
>> > > only a superficial understanding, I was assuming this had something
>> > > to do with the master signaling that the snapshot was complete. We'll
>> > > be sure to capture the master log next time.
>> > >
>> > > And thanks for also checking JIRA for me. If there is a bug here, it
>> > > seems we don't have an option to upgrade to fix it, so we'll have to
>> > > plan on coding around it for now.
>> > >
>> > > Thanks,
>> > >
>> > > --Jacob
>> > >
>> > > -----Original Message-----
>> > > From: Ted Yu [mailto:[email protected]]
>> > > Sent: Wednesday, May 24, 2017 8:47 AM
>> > > To: [email protected]
>> > > Subject: Re: Missing data in snapshot - possible flush timing issue?
>> > >
>> > > bq. the snapshot finishes before the flush of that last region finishes
>> > >
>> > > According to the last line in the pastebin, the flush finished at
>> > > 2017-05-12 02:06:06,063.
>> > > Did you find something in the master log which indicated that the
>> > > snapshot finished before the above time?
>> > >
>> > > I went through the snapshot bug fixes in branch-1.1, backward until
>> > > seeing HBASE-15430, which was in 1.1.4. Among the more recent
>> > > snapshot fixes, there was none matching your description.
>> > >
>> > > Next time this happens to your cluster, can you capture the master
>> > > log (hopefully with DEBUG logging)? We would get more clues from the
>> > > master log.
>> > >
>> > > Thanks
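To make the logging suggestion concrete: entries along these lines could be added to the log4j.properties used by the region servers (and the master) in HBase 1.1. The package names are assumptions based on where FlushSnapshotSubprocedure, HRegion, and the master-side snapshot handlers live in 1.1, so verify them against your build.

    # Region server side: snapshot flush subprocedure and region flush messages.
    log4j.logger.org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure=DEBUG
    log4j.logger.org.apache.hadoop.hbase.regionserver.HRegion=DEBUG
    # Master side: snapshot coordination, useful for the timing comparison above.
    log4j.logger.org.apache.hadoop.hbase.master.snapshot=DEBUG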
But at "2017-05-12 02:06:03,229" >> > > > before the flush of region with rowkey starting with ";uP\xE1" >> > > > finishes, there is a message from "MemStoreFlusher.1" saying it's >> > > > starting the flush of memstore for region with rowkey starting >> > > > with "\xC0\x8F". Then it appears that the snapshot finishes before >> > > > the flush of that last region finishes, and that is the exact >> > > > region from which we are missing >> > > data. >> > > > >> > > > So it kind of looks like some timing issue where a normal >> > > > scheduled flush of the region's memstore kicked in on one thread >> > > > while the regions were being flushed for the snapshot by another >> > > > thread causing us to miss the flushed data for that one region in >> > > > the snapshot. Out of something like >> > > > 50-100 times this job was run we saw this occur twice. >> > > > >> > > > We are running HBase 1.1.4. Is this a known problem? I've tried >> > > > looking around JIRA but didn't find anything obvious. Anyone >> > > > familiar with the code could see how this might happen? >> > > > >> > > > We can code defensively against this, but wanted to understand >> > > > what we are dealing with. >> > > > >> > > > BTW, I don't have the logs from the master at that time as they've >> > > > been rolled since then. >> > > > >> > > > Thanks! >> > > > >> > > > --Jacob LeBlanc >> > > > >> > > >> > >> > >
