[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.addendum.txt

An addendum added post commit to branch-1 and master (Thanks [~enis])

Splitting WALs, we are filtering out too many edits - DATALOSS
---------------------------------------------------------------

                Key: HBASE-13811
                URL: https://issues.apache.org/jira/browse/HBASE-13811
            Project: HBase
         Issue Type: Bug
         Components: wal
   Affects Versions: 2.0.0, 1.1.0, 1.2.0
           Reporter: stack
           Assignee: stack
           Priority: Critical
            Fix For: 2.0.0, 1.2.0, 1.1.1
        Attachments: 13811.addendum.txt, 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, 13811.v2.branch-1.txt, 13811.v3.branch-1.txt, 13811.v3.branch-1.txt, 13811.v4.branch-1.txt, 13811.v5.branch-1.txt, 13811.v6.branch-1.txt, 13811.v6.branch-1.txt, 13811.v7.branch-1.txt, 13811.v8.branch-1.txt, 13811.v9.branch-1.txt, HBASE-13811-branch-1.1.patch, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch, startCacheFlush.diff

I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place so can only think it the cause (but cannot see how).
When we split the logs, we are skipping legit edits. Digging.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
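The failure mode named in the issue title can be pictured with a small sketch. It assumes, as a simplification, that WAL splitting keeps only edits whose sequence id is greater than the last flushed sequence id recorded for the region; `SplitFilterSketch` and `replayable` are hypothetical names for illustration, not HBase classes or methods. If the recorded id is higher than what was really persisted, legitimate edits are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the HBase WAL splitting code.
final class SplitFilterSketch {

    // Keep the edits that still need replay: those with a sequence id
    // strictly greater than the last flushed sequence id for the region.
    static List<Long> replayable(List<Long> walSeqIds, long lastFlushedSeqId) {
        List<Long> kept = new ArrayList<>();
        for (long seqId : walSeqIds) {
            if (seqId > lastFlushedSeqId) {
                kept.add(seqId);
            }
        }
        return kept;
    }
}
```

With edits 3, 7, 9 and 12 in the WAL and edit 3 still sitting unflushed in a memstore, a correct last flushed id of 2 keeps all four edits, while an over-reported id of 10 keeps only edit 12 and silently loses edit 3.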
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Resolving. There is still dataloss going on, but I have to run at larger scales to hit it: ITBLL at 2.5B in my test runs. Will open a new issue to do the subsequent hole-plugging.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-13811:
------------------------------
    Attachment: HBASE-13811-branch-1.1.patch

[~enis] patch for 1.1.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v7.branch-1.txt

Fix the failing unit tests.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-13811:
------------------------------
    Attachment: startCacheFlush.diff

Implement flushedSeqId calculation inside startCacheFlush.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v8.branch-1.txt

Integrated your suggestion, [~Apache9], of having startCacheFlush do the sequence id calculation. I then went further and removed the last use of getEarliest for region (in close; it seemed we were going the long way around to get the closed region's sequence id) and then deprecated the method altogether; its operation is subtle and it shouldn't be exposed as a public method. Added tests for startCacheFlush's new operation. Let me try this on a cluster.
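To make the idea of "startCacheFlush does the sequence id calculation" concrete, here is a minimal sketch under stated assumptions. `FlushStartSketch`, its maps, and the `NO_SEQ_ID` sentinel are hypothetical stand-ins and do not reproduce the attached patch: starting a flush moves the flushing families' lowest sequence ids out of the unflushed accounting and hands back the lowest sequence id that remains unflushed, so the caller does not need a separate lookup.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real WAL accounting in the patch may differ.
final class FlushStartSketch {
    static final long NO_SEQ_ID = -1L;

    // family -> lowest sequence id still only in the memstore
    private final Map<String, Long> lowestUnflushed = new HashMap<>();
    // family -> lowest sequence id covered by the flush in progress
    private final Map<String, Long> flushing = new HashMap<>();

    void append(String family, long seqId) {
        // Only the first (lowest) id per family matters for the accounting.
        lowestUnflushed.putIfAbsent(family, seqId);
    }

    // Move the given families to the flushing map and return the lowest
    // sequence id left unflushed, or NO_SEQ_ID if nothing is left.
    long startCacheFlush(Collection<String> families) {
        for (String family : families) {
            Long seqId = lowestUnflushed.remove(family);
            if (seqId != null) {
                flushing.put(family, seqId);
            }
        }
        long lowest = NO_SEQ_ID;
        for (long seqId : lowestUnflushed.values()) {
            if (lowest == NO_SEQ_ID || seqId < lowest) {
                lowest = seqId;
            }
        }
        return lowest;
    }

    void completeCacheFlush() {
        flushing.clear();
    }
}
```

In this sketch a full-region flush returns `NO_SEQ_ID`, while a flush of a subset of families returns the oldest id that the flush does not cover.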
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v9.branch-1.txt

Missed a change.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v6.branch-1.txt

Adjust test to suit the new workings of getEarliest for region.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v6.branch-1.txt

Retry. Tests pass locally. Something seems to have gone wonky on builds.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v3.branch-1.txt

Thanks [~Apache9]. That helped. Thinking on it, I was a little confused about what is needed here. Rather than add a new method that does what the old getEarliestMemstoreSeqNum did, I changed getEarliestMemstoreSeqNum back to how the old version worked. My new version was incorrect in taking into consideration the sequenceids of ongoing flushes, now that we are doing per column-family flushes.

getEarliestMemstoreSeqNum(regionname) is asking for the earliest 'region' sequenceid. It is called from two places: at flush time and at close. At flush time, there will be no sequenceid returned UNLESS we are flushing a subset of column families. In this case, we do not want to use the region flush sequence id but what comes out of getEarliestMemstoreSeqNum for the region (minus one); the region may have edits older than those being flushed in the current family.

getEarliestMemstoreSeqNum(regionname, familyName), on the other hand, is scoped to the column family, so it needs to work at a different scale: the column family scale, without regard for the oldest in the region.

I did some trivial fixup to fix the checkstyle warning.
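The two scopes described in this comment, and the "minus one" rule for partial flushes, can be sketched as follows. This is an illustration under assumptions, not the patch itself: `EarliestSeqIdScopes`, `markFlushing`, `flushedSeqIdToReport`, and the `NO_SEQ_ID` sentinel are hypothetical names, and the per-region bookkeeping is collapsed into a single object for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of region-scoped vs family-scoped lookups.
final class EarliestSeqIdScopes {
    static final long NO_SEQ_ID = -1L;

    // family -> lowest sequence id not yet picked up by a flush
    private final Map<String, Long> lowestUnflushedByFamily = new HashMap<>();

    void append(String family, long seqId) {
        lowestUnflushedByFamily.putIfAbsent(family, seqId);
    }

    // A family that starts flushing no longer counts as unflushed.
    void markFlushing(String family) {
        lowestUnflushedByFamily.remove(family);
    }

    // Family scope: ignores what other families in the region hold.
    long earliestForFamily(String family) {
        Long seqId = lowestUnflushedByFamily.get(family);
        return seqId != null ? seqId : NO_SEQ_ID;
    }

    // Region scope: the minimum over every family in the region.
    long earliestForRegion() {
        long lowest = NO_SEQ_ID;
        for (long seqId : lowestUnflushedByFamily.values()) {
            if (lowest == NO_SEQ_ID || seqId < lowest) {
                lowest = seqId;
            }
        }
        return lowest;
    }

    // Full flush: report the flush sequence id. Partial flush: report
    // one less than the oldest edit still unflushed in the region.
    long flushedSeqIdToReport(long flushSeqId) {
        long earliest = earliestForRegion();
        return earliest == NO_SEQ_ID ? flushSeqId : earliest - 1;
    }
}
```

For example, with cf1 first written at sequence id 3 and cf2 at 7, flushing only cf2 at flush id 10 reports 2 for the region, because edit 3 in cf1 is still only in memory.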
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v3.branch-1.txt
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v2.branch-1.txt

I just set it to the flush sequenceid. This is what it would have been before this patch, only we'd go through some machinations to get there; let's cut to the chase. I restored the test to do as it used to since it now passes. See what you think, [~Apache9] (and thanks for helping on this stuff).

On a side note, I have been testing on a cluster and it has passed three runs; usually it would fail one. Trying at 10x scale to see what happens.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v4.branch-1.txt

Fix legit NPEs
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.v5.branch-1.txt

Upload right patch.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.branch-1.txt

Refactored, moving everything to do with sequenceid accounting into its own package-protected class. Then added tests for the sequenceid accounting. Added [~Apache9]'s test to the patch too.

[~Apache9], I changed TestGetLastFlushedSequenceId. The supposition that the region flushed id would be greater than the store flush id didn't make sense to me; perhaps I am missing something. It made more sense that they would be equal after a flush. See what you think.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.txt

Mostly logging changes, so we output less but with more density, including detail like the sequence id at critical junctures, so it is easier to debug these issues going forward. Patch includes a change log.

The fix is in the FSHLog getEarliestMemstoreSeqNum methods: look in the Map of currently flushing sequence ids first and then, if none is found there, look in the oldest sequence id map.

Trying this patch against hadoopqa to see if I've broken anything. Trying on a cluster. Still need to add a test for this particular case.
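The lookup order described here (flushing map first, then the oldest sequence id map) might look roughly like the sketch below. It is an assumption-laden illustration rather than the FSHLog code: `EarliestSeqIdLookup`, its two maps, and the `NO_SEQ_ID` sentinel are hypothetical names, and regions are keyed by plain strings for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the lookup order; not the real FSHLog.
final class EarliestSeqIdLookup {
    static final long NO_SEQ_ID = -1L;

    // region -> lowest sequence id covered by a flush in progress
    private final Map<String, Long> flushingSeqIds = new ConcurrentHashMap<>();
    // region -> lowest sequence id not yet picked up by any flush
    private final Map<String, Long> oldestUnflushedSeqIds = new ConcurrentHashMap<>();

    void recordAppend(String region, long seqId) {
        oldestUnflushedSeqIds.putIfAbsent(region, seqId);
    }

    void startFlush(String region) {
        Long oldest = oldestUnflushedSeqIds.remove(region);
        if (oldest != null) {
            flushingSeqIds.put(region, oldest);
        }
    }

    void completeFlush(String region) {
        flushingSeqIds.remove(region);
    }

    // Consult the flushing map first; fall back to the oldest-unflushed map.
    long earliestSeqId(String region) {
        Long flushing = flushingSeqIds.get(region);
        if (flushing != null) {
            return flushing;
        }
        Long oldest = oldestUnflushedSeqIds.get(region);
        return oldest != null ? oldest : NO_SEQ_ID;
    }
}
```

While a flush is in flight, the edits it covers are not yet durable in store files, so in this sketch their sequence id still wins over newer appends until `completeFlush` is called.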
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
        Fix Version/s: 1.2.0
                       2.0.0
             Assignee: stack
    Affects Version/s: 1.2.0
                       2.0.0
               Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
--------------------------
    Attachment: 13811.branch-1.txt

Patch is actually for branch-1 at the moment (that is what I am testing against). Added the branch-1 suffix.
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-13811:
------------------------------
    Attachment: HBASE-13811.testcase.patch

Write a testcase. It fails for me with
{noformat}
java.lang.AssertionError: actual array was null
	at org.apache.hadoop.hbase.regionserver.TestSplitWalDataLoss.test(TestSplitWalDataLoss.java:153)
{noformat}

I think the fix should also be applied to branch-1.1 since it will cause data loss even if we turn off flush per CF. [~ndimiduk]
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-13811:
------------------------------
    Attachment: HBASE-13811-v1.testcase.patch

[~stack] Try this one? It just makes a RegionServerReport manually instead of waiting for the regionserver thread to do it for us. I modified getEarliestMemstoreSeqNum and it passed locally.