@J-D: I used Cloudera CDH3. This loss can't be reproduced every time, but it
happens with the following logs:
"2011-10-19 04:44:09,065 DEBUG
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Used 134218288 bytes
of buffered edits, waiting for IO threads..."
This log line was printed many times, and the value 134218288 never changed. I
killed the master and restarted, and the data was lost. So I think the
134218288 bytes of edits were the last entries still in memory. In the
following code:
" synchronized (dataAvailable) {
totalBuffered += incrHeap;
while (totalBuffered > maxHeapUsage && (thrown == null ||
thrown.get()== null)){
LOG.debug("Used " + totalBuffered + " bytes of buffered edits,
waiting for IO threads...");
dataAvailable.wait(3000);
}
dataAvailable.notifyAll();
}"
If (totalBuffered <= maxHeapUsage) and there are no more entries in the .logs
dir, archiveLogs can execute even before the write threads have finished.
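To make the race concrete, here is a minimal, hypothetical sketch (the names
SplitBuffer, flushed, take, sync are mine for illustration, not from
HLogSplitter): the heap counter can drop as soon as a writer thread dequeues an
entry, before that entry is synced to .recovered.edits, so a check on
totalBuffered alone does not prove the write threads are done.

```java
// Hypothetical model of the buffer accounting, not the real HLogSplitter.
class SplitBuffer {
    long totalBuffered = 0; // bytes of edits queued for the writer threads
    long flushed = 0;       // bytes actually synced to .recovered.edits

    void append(long bytes) { totalBuffered += bytes; }

    // A writer thread dequeues the pending entry; the heap counter drops
    // here, *before* anything is durable.
    long take() { long b = totalBuffered; totalBuffered = 0; return b; }

    void sync(long bytes) { flushed += bytes; }
}

public class ArchiveRace {
    public static void main(String[] args) {
        SplitBuffer buf = new SplitBuffer();
        buf.append(134218288L);     // the stuck entry from the log above
        long inFlight = buf.take(); // writer dequeues it

        // A check like (totalBuffered <= maxHeapUsage) now passes...
        System.out.println("totalBuffered=" + buf.totalBuffered
                + " flushed=" + buf.flushed); // prints totalBuffered=0 flushed=0
        // ...so archiveLogs could move the hlog to .oldlogs even though
        // nothing has been flushed yet. If the writer then fails, the
        // edits are gone and the archived file is never re-split.

        buf.sync(inFlight); // the step finishWritingAndClose() waits for
        System.out.println("flushed=" + buf.flushed); // prints flushed=134218288
    }
}
```

Under this reading, archiving is only safe once every dequeued entry has been
synced, not merely once the buffer is drained.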
2011/10/19 Jean-Daniel Cryans <[email protected]>
> Even if the files aren't closed properly, the fact that you are appending
> should persist them.
>
> Are you using a version of Hadoop that supports sync?
>
> Do you have logs that show the issue where the logs were moved but not
> written?
>
> Thx,
>
> J-D
>
> On Tue, Oct 18, 2011 at 7:40 AM, Mingjian Deng <[email protected]>
> wrote:
>
> > Hi:
> > There is a case that causes data loss in our cluster. We got blocked in
> > splitLog because of some errors in our hdfs, so we killed the master. Some
> > hlog files were moved from .logs to .oldlogs before they were written to
> > .recovered.edits, so the rs couldn't replay these files.
> > In HLogSplitter.java, we found:
> > ...
> >     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
> >   } finally {
> >     LOG.info("Finishing writing output logs and closing down.");
> >     splits = outputSink.finishWritingAndClose();
> >   }
> > Why is archiveLogs called before outputSink.finishWritingAndClose()? If the
> > write threads failed but archiveLogs succeeded, wouldn't these hlog files
> > be moved to .oldlogs and become impossible to split on the next startup?
> >
>