Mmm ok, how did you kill the master exactly? kill -9 or a normal shutdown? I think I could see how it would happen in the case of a normal shutdown, but even then it would *really really* help to see the logs of what's going on.
J-D

On Tue, Oct 18, 2011 at 6:37 PM, Mingjian Deng <[email protected]> wrote:
> @J-D: I used Cloudera CDH3. The loss doesn't reproduce every time, but it can
> happen with the following logs:
> "2011-10-19 04:44:09,065 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Used 134218288 bytes
> of buffered edits, waiting for IO threads..."
> This line was printed many times and 134218288 never changed. I killed the
> master and restarted it, and the data was lost. So I think those 134218288
> bytes of edits were the last entries still in memory. In the following code:
> "    synchronized (dataAvailable) {
>       totalBuffered += incrHeap;
>       while (totalBuffered > maxHeapUsage && (thrown == null ||
>           thrown.get() == null)) {
>         LOG.debug("Used " + totalBuffered + " bytes of buffered edits,
>             waiting for IO threads...");
>         dataAvailable.wait(3000);
>       }
>       dataAvailable.notifyAll();
>     }"
> If (totalBuffered <= maxHeapUsage) and there are no more entries in the .logs
> dir, archiveLogs can execute even before the write threads finish.
>
> 2011/10/19 Jean-Daniel Cryans <[email protected]>
>
> > Even if the files aren't closed properly, the fact that you are appending
> > should persist them.
> >
> > Are you using a version of Hadoop that supports sync?
> >
> > Do you have logs that show the issue where the logs were moved but not
> > written?
> >
> > Thx,
> >
> > J-D
> >
> > On Tue, Oct 18, 2011 at 7:40 AM, Mingjian Deng <[email protected]>
> > wrote:
> >
> > > Hi:
> > > There is a case that causes data loss in our cluster. We got blocked in
> > > splitLog because of an error in our HDFS, and we killed the master. Some
> > > HLog files were moved from .logs to .oldlogs before they were written to
> > > .recovered.edits, so the regionservers couldn't replay these files.
> > > In HLogSplitter.java, we found:
> > > ...
> > >     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs,
> > >         conf);
> > >   } finally {
> > >     LOG.info("Finishing writing output logs and closing down.");
> > >     splits = outputSink.finishWritingAndClose();
> > >   }
> > > Why archiveLogs before outputSink.finishWritingAndClose()? Could these
> > > HLog files be moved to .oldlogs and never be split on the next startup
> > > if the write threads failed but archiveLogs succeeded?
> > >
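
To make the ordering concern concrete, here is a minimal, self-contained sketch.
It is not the HBase code: the names (SplitOrderingSketch, drain, source.log,
recovered.edits, oldlogs) are hypothetical stand-ins for the split flow, and it
only illustrates why the source log should be archived after the writer thread
has finished flushing buffered edits, not before.

import java.nio.file.*;
import java.util.concurrent.*;

// Hypothetical model of the split flow: edits are read from a source log,
// buffered, written to a recovered-edits file by an IO thread, and only then
// is the source log archived.
public class SplitOrderingSketch {
  public static void main(String[] args) throws Exception {
    BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    Path source = Paths.get("source.log");
    Path recovered = Paths.get("recovered.edits");
    Path archived = Paths.get("oldlogs", "source.log");

    ExecutorService io = Executors.newSingleThreadExecutor();
    Future<?> writer = io.submit(() -> drain(buffer, recovered));

    for (String edit : Files.readAllLines(source)) {
      buffer.put(edit);            // producer side: buffer the edits
    }
    buffer.put("EOF");             // end-of-input marker

    writer.get();                  // wait until every buffered edit is on disk
    io.shutdown();

    // Archiving before writer.get() would recreate the window described above:
    // the source log is gone from .logs, but its edits may never have reached
    // .recovered.edits if the writer failed.
    Files.createDirectories(archived.getParent());
    Files.move(source, archived, StandardCopyOption.REPLACE_EXISTING);
  }

  static void drain(BlockingQueue<String> buffer, Path out) {
    try (var w = Files.newBufferedWriter(out)) {
      for (String edit = buffer.take(); !"EOF".equals(edit); edit = buffer.take()) {
        w.write(edit);
        w.newLine();
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}

The only point of the sketch is the ordering: the move to the archive directory
happens after writer.get() returns, i.e. a finishWritingAndClose-style step runs
before an archiveLogs-style step, so a writer failure leaves the source log in
place to be split again on the next startup.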
