[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2019-01-31 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757835#comment-16757835
 ] 

Sergey Shelukhin commented on HBASE-21601:
--

So I was able to repro the issue and get a WAL file, and then repro the failure with that file.
This actually looks like it's also a write-side problem, because I'd assume no 
network issue could produce a result like this.
The WAL record structure for the corrupted record is intact.
The WALKey, as well as the count and the length of every single one of the 476 
cells of the corrupt record, matches exactly those of another, earlier 
record in the file that has valid data. 
So the structure of the record is well-formed, which is why none of the 
IO/EOF exceptions are thrown.
However, every one of the cells of the corrupted record contains garbage data 
that seems to be pulled randomly from somewhere else, possibly from elsewhere in 
the file. 

I suspect some buffers are being reused on retry; however, I see no errors for 
this file in the logs of the RS that was writing it. The RS did quit unexpectedly.
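
To make the buffer-reuse hypothesis concrete, here's a purely hypothetical sketch 
(not HBase code) of the kind of write-side bug that would produce exactly this 
pattern: the header (key, cell count, cell lengths) is freshly written, but the 
payload region of a reused buffer still holds bytes from an earlier record.
{code}
import java.nio.ByteBuffer;

// Hypothetical illustration of the suspected write-side bug; not actual HBase code.
// The record's header is rewritten on every attempt, but the payload is copied
// out based on the expected record length, so if a retry path ever skips the
// payload write, the frame ships stale bytes under a structurally valid header.
public class ReusedBufferSketch {
  private final ByteBuffer buf = ByteBuffer.allocate(64 * 1024); // reused across writes

  byte[] frame(byte[] header, byte[] payload, boolean buggyRetrySkipsPayload) {
    buf.clear();
    buf.put(header);
    if (!buggyRetrySkipsPayload) {
      buf.put(payload); // the hypothesized bug: this write is skipped on retry
    }
    // Copy a full frame's worth of bytes regardless of how much was written this time.
    byte[] out = new byte[header.length + payload.length];
    buf.position(0);
    buf.get(out);
    return out; // header and lengths are correct; payload may be a previous record's data
  }
}
{code}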

> corrupted WAL is not handled in all places (NegativeArraySizeException)
> ---
>
> Key: HBASE-21601
> URL: https://issues.apache.org/jira/browse/HBASE-21601
> Project: HBase
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> {noformat}
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] 
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.RuntimeException: java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
>   at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
>   at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
>   at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)
> {noformat}
> Unfortunately I cannot share the file.
> The issue appears to be straightforward - for whatever reason the family 
> length is negative. Not sure how such a cell got created, I suspect the file 
> was corrupted.
> {code}
> byte[] output = new byte[cell.getFamilyLength()];
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2019-01-08 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737564#comment-16737564
 ] 

Sergey Shelukhin commented on HBASE-21601:
--


Looks like we might need to look closer at the file... I cannot tell from the 
KeyValueUtil/CellUtil/KeyValue/etc. code where exactly the cell is created, but 
it seems like the requisite number of bytes should always be read for the 
record, assuming we don't get an IOException or EOF of some sort... otherwise the 
lower-level, byte-reading logic would throw an error.
So either we are reading the record fully from the file but getting garbage 
bytes, or there's a bug somewhere that allows a partial read to happen, so the 
offset calculations in KeyValue/CellUtil return bogus offsets.
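
If it turns out to be the partial-read case, one cheap way to confirm it would be a 
bounds check on the cells before the splitter touches them. A hypothetical helper 
(this is not existing HBase code, only the Cell getters that are already on the read path):
{code}
import org.apache.hadoop.hbase.Cell;

// Hypothetical sanity check, not existing HBase code: verify that a cell's
// declared offsets/lengths fit inside its backing arrays before anything is
// cloned from it. A corrupt or partially-read record would fail this check
// instead of surfacing as NegativeArraySizeException deep inside CellUtil.
public final class CellSanity {
  private CellSanity() {}

  public static boolean looksSane(Cell cell) {
    return fits(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength())
        && fits(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength())
        && fits(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength())
        && fits(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
  }

  private static boolean fits(byte[] array, int offset, int length) {
    return array != null && offset >= 0 && length >= 0 && offset + length <= array.length;
  }
}
{code}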



[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2019-01-08 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737549#comment-16737549
 ] 

Sergey Shelukhin commented on HBASE-21601:
--

As far as I can see, skipErrors is only applied to IOExceptions thrown from certain 
places, so these particular errors would not actually be caught by it. 
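
A simplified illustration of the gap (not the actual WALSplitter code): because the 
guard is keyed on IOException, an unchecked exception like NegativeArraySizeException 
sails right past it even with skipErrors enabled.
{code}
import java.io.IOException;

// Simplified sketch, not the actual WALSplitter code: skipErrors only ever
// sees IOExceptions, so RuntimeExceptions thrown while decoding a corrupt
// cell bypass the guard and kill the worker thread.
public class SkipErrorsSketch {

  interface EntryProcessor {
    void process() throws IOException;   // hypothetical stand-in for the split/append path
  }

  static void handle(EntryProcessor p, boolean skipErrors) throws IOException {
    try {
      p.process();
    } catch (IOException e) {
      if (!skipErrors) {
        throw e;                          // only IOExceptions ever reach this branch
      }
      System.err.println("Skipping corrupt entry: " + e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Even with skipErrors=true, the unchecked exception is not swallowed:
    handle(() -> { throw new NegativeArraySizeException(); }, true);
  }
}
{code}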




[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2018-12-27 Thread Bahram Chehrazy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729852#comment-16729852
 ] 

Bahram Chehrazy commented on HBASE-21601:
-

Looks like a duplicate of HBASE-2958 (https://issues.apache.org/jira/browse/HBASE-2958), 
which has never been resolved, perhaps because setting 
`hbase.hlog.split.skip.errors` to true is a viable workaround. 
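
For anyone hitting this before a proper fix lands, the workaround would look roughly 
like the following in hbase-site.xml (please verify the property name and behavior for 
your HBase version; it presumably comes at the cost of losing the edits that fail to parse):
{code}
<!-- hbase-site.xml: the workaround discussed above. When true, entries/files that
     fail to parse during WAL splitting are skipped instead of failing the split. -->
<property>
  <name>hbase.hlog.split.skip.errors</name>
  <value>true</value>
</property>
{code}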



[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2018-12-21 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727174#comment-16727174
 ] 

Wei-Chiu Chuang commented on HBASE-21601:
-

A possible explanation for the NegativeArraySizeException could be that the 
byte array is more than 2GB in length, so the int size computation overflows 
and goes negative.
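
A tiny stand-alone illustration of that mechanism only (whether a >2GB length is 
actually reachable on this code path is a separate question):
{code}
// Stand-alone illustration of the overflow mechanism; not tied to HBase code.
public class OverflowSketch {
  public static void main(String[] args) {
    int declaredLength = Integer.MAX_VALUE;
    int paddedLength = declaredLength + 1;   // wraps to -2147483648
    System.out.println("padded length = " + paddedLength);
    byte[] buf = new byte[paddedLength];     // throws NegativeArraySizeException
    System.out.println(buf.length);          // never reached
  }
}
{code}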



[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2018-12-17 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723324#comment-16723324
 ] 

Sergey Shelukhin commented on HBASE-21601:
--

Another one:
{noformat}
java.lang.ArrayIndexOutOfBoundsException: -8965
at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
at 
org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:786)
at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:777)
at 
org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:140)
at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:145)
at 
org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:298)
at 
org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
at 
org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Looks like there just needs to be a systematic review of that code to catch 
these exceptions and handle corrupt records.
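
One possible shape for that review, as a sketch only (the hook point into the split 
loop is hypothetical): convert the unchecked exceptions coming out of corrupt cells 
into the same IOException path that the existing corrupted-log handling and 
skipErrors already understand.
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.wal.WAL.Entry;

// Sketch only, not a patch: wrap per-entry processing so that malformed cell
// metadata surfaces as an IOException (which the existing corrupted-log
// handling knows how to deal with) instead of an unchecked exception that
// kills the worker.
public final class CorruptEntryGuard {
  private CorruptEntryGuard() {}

  interface EntryConsumer {
    void accept(Entry entry) throws IOException;  // hypothetical hook into the split loop
  }

  static void processGuarded(Entry entry, EntryConsumer consumer) throws IOException {
    try {
      consumer.accept(entry);
    } catch (NegativeArraySizeException | ArrayIndexOutOfBoundsException e) {
      throw new IOException("Corrupt WAL entry " + entry.getKey(), e);
    }
  }
}
{code}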



[jira] [Commented] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

2018-12-13 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720835#comment-16720835
 ] 

Zheng Hu commented on HBASE-21601:
--

Could you find a way to reproduce this bug? If so, it will be easy to fix... 
BTW, I am trying to fix a related issue: HBASE-21379
