[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757835#comment-16757835 ] Sergey Shelukhin commented on HBASE-21601:
--
So I was able to repro and get a WAL, and then repro with it. This actually looks like it's also a write-side problem, because I'd assume no network issue could produce a result like this. The structure of the corrupted WAL record is intact: the WALKey, as well as the count and the lengths of every single one of its 476 cells, exactly match those of another, earlier record in the file that has valid data. So the record is well-formed, which is why none of the IO/EOF exceptions fire. However, every one of the corrupted record's cells contains garbage data that seems to be pulled randomly from somewhere else, possibly elsewhere in the file. I suspect some buffers are being reused on retry; however, I see no errors for this file in the logs of the RS that was writing it. The RS did quit unexpectedly.

> corrupted WAL is not handled in all places (NegativeArraySizeException)
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21601
>                 URL: https://issues.apache.org/jira/browse/HBASE-21601
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>
> {noformat}
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.RuntimeException: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
> 	at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)
> {noformat}
> Unfortunately I cannot share the file.
> The issue appears to be straightforward: for whatever reason the family length is negative. Not sure how such a cell got created; I suspect the file was corrupted.
> {code}
> byte[] output = new byte[cell.getFamilyLength()];
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
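To make the failure mode concrete, here is a minimal, self-contained Java sketch (not HBase code; readLength is a hypothetical stand-in for KeyValue-style length decoding): garbage bytes decoded as a big-endian int length come out negative, and the allocation then throws the same NegativeArraySizeException seen in the stack trace above.

```java
// Minimal sketch (not HBase code): a length field read from garbage bytes
// can come back negative, and allocating an array with it throws
// NegativeArraySizeException, just like the unchecked allocation in
// CellUtil.cloneFamily does.
class NegativeLengthDemo {
    // Hypothetical stand-in for decoding a length out of a corrupt buffer.
    static int readLength(byte[] buf, int offset) {
        // Big-endian int, as KeyValue-style serialization stores lengths.
        return ((buf[offset] & 0xFF) << 24)
             | ((buf[offset + 1] & 0xFF) << 16)
             | ((buf[offset + 2] & 0xFF) << 8)
             |  (buf[offset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] corrupt = {(byte) 0xFF, (byte) 0xFF, (byte) 0xDC, (byte) 0xFB};
        int len = readLength(corrupt, 0); // decodes to -8965
        System.out.println("decoded length = " + len);
        try {
            byte[] output = new byte[len]; // the unchecked allocation
        } catch (NegativeArraySizeException e) {
            // A defensive caller could check len < 0 first (or catch this)
            // and flag the record as corrupt instead of killing the split.
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```

A length-sanity check (`len < 0 || len > remaining bytes`) before the allocation would let the splitter report corruption instead of propagating a RuntimeException.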
[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737564#comment-16737564 ] Sergey Shelukhin commented on HBASE-21601:
--
Looks like we might need to look closer at the file... I cannot tell from the KeyValueUtil/CellUtil/KeyValue/etc. code where exactly the cell is created, but it seems like the requisite number of bytes should always be read for the record, assuming we don't get an IOException or EOF of some sort; otherwise the lower-level, byte-reading logic would throw an error. So either we are reading the record fully from the file but getting some garbage bytes, or there's a bug somewhere that allows a partial read to happen, so the offset calculations in KeyValue/CellUtil return bogus offsets.
[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737549#comment-16737549 ] Sergey Shelukhin commented on HBASE-21601:
--
As far as I can see, skipErrors is only applied to IOExceptions thrown from certain places, so these particular errors would not actually be caught by it.
[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729852#comment-16729852 ] Bahram Chehrazy commented on HBASE-21601:
--
Looks like a duplicate of HBASE-2958 (https://issues.apache.org/jira/browse/HBASE-2958), which has never been resolved, perhaps because setting `hbase.hlog.split.skip.errors` to true is a viable workaround.
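For reference, that workaround is an ordinary HBase configuration switch; a minimal hbase-site.xml fragment would look roughly like this (sketch only; note that, per the comment above, skipErrors is only applied to IOExceptions from certain code paths, so it does not cover this NegativeArraySizeException):

{code}
<!-- Sketch: skip errors encountered during WAL (HLog) splitting
     instead of failing the split. Caveat: this only covers
     IOExceptions thrown from certain code paths, not unchecked
     exceptions such as NegativeArraySizeException. -->
<property>
  <name>hbase.hlog.split.skip.errors</name>
  <value>true</value>
</property>
{code}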
[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727174#comment-16727174 ] Wei-Chiu Chuang commented on HBASE-21601:
--
A possible explanation for the NegativeArraySizeException could be that the byte array is more than 2 GB in length: the computed size overflows a Java int and goes negative.
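A toy illustration of that overflow hypothesis (plain Java, not HBase code; computeSize is a hypothetical stand-in for any size arithmetic): once a computed size exceeds Integer.MAX_VALUE it wraps negative, and the allocation throws.

```java
// Toy illustration of the overflow hypothesis: int size arithmetic
// that exceeds Integer.MAX_VALUE wraps negative, and the subsequent
// array allocation throws NegativeArraySizeException.
class OverflowDemo {
    // Hypothetical size computation; Java int addition overflows silently.
    static int computeSize(int base, int extra) {
        return base + extra;
    }

    public static void main(String[] args) {
        int size = computeSize(Integer.MAX_VALUE, 1); // wraps to Integer.MIN_VALUE
        System.out.println("computed size = " + size);
        try {
            byte[] buf = new byte[size];
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```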
[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723324#comment-16723324 ] Sergey Shelukhin commented on HBASE-21601:
--
Another one:
{noformat}
java.lang.ArrayIndexOutOfBoundsException: -8965
	at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
	at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
	at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
	at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:786)
	at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:777)
	at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:140)
	at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:145)
	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:298)
	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
	at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like that code needs a systematic review to catch such exceptions and handle corrupt records.
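The systematic handling suggested above could look roughly like this sketch (hypothetical names, not the actual WALSplitter API): wrap per-entry processing and convert the unchecked exceptions that corrupt cells raise into the IOException path that skipErrors-style handling already understands.

```java
import java.io.IOException;

// Sketch only: EntryProcessor stands in for the real per-entry work inside
// WALSplitter. The point is converting unchecked exceptions raised by
// corrupt cell data into an IOException, which skipErrors-style handling
// can then honor instead of the RuntimeException killing the split.
class CorruptRecordGuard {
    interface EntryProcessor {
        void processEntry() throws IOException;
    }

    static void processGuarded(EntryProcessor p, boolean skipErrors) throws IOException {
        try {
            p.processEntry();
        } catch (NegativeArraySizeException | ArrayIndexOutOfBoundsException e) {
            // Corrupt cell data surfaces as unchecked exceptions deep in
            // CellUtil/KeyValue; re-wrap them as a corruption IOException.
            if (!skipErrors) {
                throw new IOException("Corrupt WAL entry", e);
            }
            // else: log and skip this entry, mirroring skipErrors semantics.
        }
    }
}
```

With skipErrors enabled the bad entry is dropped; otherwise the corruption is reported through the existing IOException channel rather than as a bare RuntimeException.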
[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)
[ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720835#comment-16720835 ] Zheng Hu commented on HBASE-21601:
--
Have you found a way to reproduce this bug? If so, it will be easy to fix... BTW, I am trying to fix this issue: HBASE-21379.