Even if the files weren't closed properly, the fact that you were appending should have persisted the data.
Are you using a version of Hadoop that supports sync? Do you have logs that
show the issue where the logs were moved but not written?

Thx,

J-D

On Tue, Oct 18, 2011 at 7:40 AM, Mingjian Deng <[email protected]> wrote:
> Hi:
> There is a case that causes data loss in our cluster. We got blocked in
> splitLog because of errors in our HDFS, and we killed the master. Some
> HLog files were moved from .logs to .oldlogs before they were written to
> .recovered.edits, so the regionserver couldn't replay them.
> In HLogSplitter.java, we found:
>
>     ...
>     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
>   } finally {
>     LOG.info("Finishing writing output logs and closing down.");
>     splits = outputSink.finishWritingAndClose();
>   }
>
> Why is archiveLogs called before outputSink.finishWritingAndClose()? If
> the write threads failed but archiveLogs succeeded, wouldn't these HLog
> files be moved to .oldlogs where they can't be split on the next startup?
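For reference, a minimal sketch (not the actual HBase implementation) of
the reordering the question implies: close the output sink first, and
archive the source HLogs only after every edit has been persisted. The
names OutputSink, finishWritingAndClose, and archiveLogs mirror the snippet
quoted above; SplitOrderSketch and the method signatures are hypothetical
stand-ins, so only the call ordering is the point.

    import java.util.List;

    public class SplitOrderSketch {

        interface OutputSink {
            // Flushes and closes all .recovered.edits writers; throws if
            // any write thread failed, so nothing gets archived below.
            List<String> finishWritingAndClose() throws Exception;
        }

        // Hypothetical shape of splitLog with the calls reordered.
        static List<String> splitLog(OutputSink outputSink, Runnable archiveLogs)
                throws Exception {
            List<String> splits;
            try {
                // ... read the HLogs and dispatch edits to the sink ...
            } finally {
                // Close the writers BEFORE archiving: a failed write
                // thread surfaces here and the source logs stay in .logs,
                // so the next startup can retry the split.
                splits = outputSink.finishWritingAndClose();
            }
            // Reached only when all edits were persisted.
            archiveLogs.run();
            return splits;
        }
    }

With this ordering, a failure in finishWritingAndClose propagates before
archiveLogs runs, so the unsplit HLogs are never moved out of .logs, which
is the guarantee the original post is asking about.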
