[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.addendum.txt

Added an addendum, post commit, to branch-1 and master (thanks [~enis]).

 Splitting WALs, we are filtering out too many edits - DATALOSS
 ---

 Key: HBASE-13811
 URL: https://issues.apache.org/jira/browse/HBASE-13811
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.1

 Attachments: 13811.addendum.txt, 13811.branch-1.txt, 
 13811.branch-1.txt, 13811.txt, 13811.v2.branch-1.txt, 13811.v3.branch-1.txt, 
 13811.v3.branch-1.txt, 13811.v4.branch-1.txt, 13811.v5.branch-1.txt, 
 13811.v6.branch-1.txt, 13811.v6.branch-1.txt, 13811.v7.branch-1.txt, 
 13811.v8.branch-1.txt, 13811.v9.branch-1.txt, HBASE-13811-branch-1.1.patch, 
 HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch, 
 startCacheFlush.diff


 I've been running ITBLLs against branch-1 around HBASE-13616 (move of 
 ServerShutdownHandler to pv2) and have come across an instance of data loss. 
 My patch for HBASE-13616 was in place, so I can only think it is the cause 
 (but cannot see how). When we split the logs, we are skipping legitimate 
 edits. Digging.
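
To make the failure mode concrete, here is a minimal sketch of the filtering step. It assumes that log splitting skips any edit whose sequence id is at or below the last flushed sequence id reported for the region. SplitFilterSketch and replayable are hypothetical names for illustration, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is not HBase's WAL splitter. */
final class SplitFilterSketch {

  /**
   * Returns the sequence ids of the edits that survive the split: those
   * strictly greater than the sequence id the region is believed to have
   * flushed. Everything at or below that id is assumed to be in HFiles.
   */
  static List<Long> replayable(List<Long> editSeqIds, long lastFlushedSeqId) {
    List<Long> kept = new ArrayList<>();
    for (long seqId : editSeqIds) {
      if (seqId > lastFlushedSeqId) {
        kept.add(seqId);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Long> wal = Arrays.asList(5L, 6L, 7L, 8L);
    // Correct flushed id: only edit 5 is already persisted.
    System.out.println(replayable(wal, 5L));
    // Flushed id reported too high: edits 6 and 7 are skipped although
    // they were never persisted.
    System.out.println(replayable(wal, 7L));
  }
}
```

If the reported flushed sequence id is higher than what actually reached disk, the edits in between are silently dropped, which is the symptom in the title.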



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Resolving. There is still data loss, but reproducing it now requires running at 
larger scale (ITBLL 2.5B in my test runs). I will open a new issue for the 
subsequent hole-plugging.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13811:
--
Attachment: HBASE-13811-branch-1.1.patch

[~enis] patch for 1.1.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v7.branch-1.txt

Fix the failing unit tests.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-05 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13811:
--
Attachment: startCacheFlush.diff

Implement flushedSeqId calculation inside startCacheFlush.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v8.branch-1.txt

Integrated your suggestion, [~Apache9], of having startCacheFlush do the 
sequence id calculation. I then went further and removed the last use of 
getEarliest for region (in close -- it seemed we were going the long way around 
to get the closed region's sequence id) and then deprecated the method 
altogether; its operation is subtle and it shouldn't be exposed as a public 
method.

Added tests for startCacheFlush's new operation.
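
A rough sketch of the idea, under stated assumptions: startCacheFlush moves the flushing families out of the "oldest unflushed" bookkeeping and, in the same step, returns the earliest sequence id still unflushed in the rest of the region, so the caller no longer needs a separate getEarliest call. The class, the two maps, and NO_SEQ_ID are hypothetical stand-ins, keyed by family name only; this is not the FSHLog code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the WAL's per-region sequence id accounting. */
final class StartCacheFlushSketch {
  static final long NO_SEQ_ID = -1L;

  // family -> sequence id of the oldest edit not yet picked up by a flush
  private final Map<String, Long> oldestUnflushed = new HashMap<>();
  // family -> sequence id of the oldest edit in a flush that is in progress
  private final Map<String, Long> flushing = new HashMap<>();

  /** Records an edit; only the first (oldest) sequence id per family is kept. */
  void append(String family, long seqId) {
    oldestUnflushed.putIfAbsent(family, seqId);
  }

  /**
   * Marks the given families as flushing and returns the earliest sequence
   * id still unflushed in the remaining families, or NO_SEQ_ID if every
   * family with edits is part of this flush.
   */
  long startCacheFlush(Set<String> families) {
    for (String family : families) {
      Long seqId = oldestUnflushed.remove(family);
      if (seqId != null) {
        flushing.put(family, seqId);
      }
    }
    long min = Long.MAX_VALUE;
    for (long seqId : oldestUnflushed.values()) {
      min = Math.min(min, seqId);
    }
    return oldestUnflushed.isEmpty() ? NO_SEQ_ID : min;
  }

  public static void main(String[] args) {
    StartCacheFlushSketch wal = new StartCacheFlushSketch();
    wal.append("a", 3L);
    wal.append("b", 7L);
    // Flushing only "b": family "a" still holds an older, unflushed edit.
    System.out.println(wal.startCacheFlush(Collections.singleton("b")));
  }
}
```

Doing the calculation inside startCacheFlush means the move and the read happen against the same state, which is one plausible reason to prefer it over a separate public getter.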

Let me try this on a cluster.





[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v9.branch-1.txt

Missed a change.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v6.branch-1.txt

Adjust test to suit new getEarliest for region workings.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v6.branch-1.txt

Retry. Tests pass locally. Something seems to have gone wonky on the builds.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v3.branch-1.txt

Thanks, [~Apache9]. That helped. Thinking on it, I was a little confused about 
what is needed here.

Rather than add a new method that does what the old getEarliestMemstoreSeqNum 
did, I changed getEarliestMemstoreSeqNum back to how the old version worked. My 
new version was incorrect in taking into consideration the sequence ids of 
ongoing flushes now that we are doing per-column-family flushes.

getEarliestMemstoreSeqNum(regionname) asks for the earliest 'region' 
sequence id. It is called from two places: at flush time and at close. At flush 
time, no sequence id is returned UNLESS we are flushing a subset of the 
column families. In that case, we do not want to use the region flush sequence 
id but rather what comes out of getEarliestMemstoreSeqNum for the region (minus 
one); the region may have edits older than those being flushed in the current 
family.

getEarliestMemstoreSeqNum(regionname, familyName), on the other hand, is scoped 
to the column family, so it needs to work at column-family scale, without 
regard for the oldest in the region.
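
The two scopes, and the "minus one" rule at flush time, can be sketched as follows. This is an illustration under assumptions, not the patch: the map stands for the oldest unflushed sequence id per column family that remains after the flushing families have been taken out, and all names (EarliestScopeSketch, NO_SEQ_ID, flushedSeqId) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of region-scoped vs family-scoped lookups. */
final class EarliestScopeSketch {
  static final long NO_SEQ_ID = -1L;

  /** Family scope: only this family's oldest unflushed edit matters. */
  static long earliestForFamily(Map<String, Long> oldestByFamily, String family) {
    Long seqId = oldestByFamily.get(family);
    return seqId != null ? seqId : NO_SEQ_ID;
  }

  /** Region scope: the oldest unflushed edit across all remaining families. */
  static long earliestForRegion(Map<String, Long> oldestByFamily) {
    long min = Long.MAX_VALUE;
    for (long seqId : oldestByFamily.values()) {
      min = Math.min(min, seqId);
    }
    return oldestByFamily.isEmpty() ? NO_SEQ_ID : min;
  }

  /**
   * Sequence id to report as flushed for the region. If some family still
   * holds unflushed edits, everything below the oldest of them is safe, so
   * report that id minus one; otherwise the flush sequence id itself.
   */
  static long flushedSeqId(long earliestUnflushedInRegion, long flushSeqId) {
    return earliestUnflushedInRegion == NO_SEQ_ID
        ? flushSeqId
        : earliestUnflushedInRegion - 1;
  }

  public static void main(String[] args) {
    // cf2 is being flushed at sequence id 55; cf1 still has an edit at 12.
    Map<String, Long> remaining = new HashMap<>();
    remaining.put("cf1", 12L);
    System.out.println(flushedSeqId(earliestForRegion(remaining), 55L));
  }
}
```

Reporting the flush sequence id in the partial-flush case would claim that cf1's older edits are persisted when they are not, and a later log split would skip them.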

I did some trivial fixup to fix the checkstyle warning.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v3.branch-1.txt



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v2.branch-1.txt

I just set it to the flush sequence id. This is what it would have been before 
this patch, only we'd go through some machinations to get there; let's cut to 
the chase. I restored the test to do as it used to since it now passes. See 
what you think, [~Apache9] (and thanks for helping on this stuff).

On a side note, I have been testing on a cluster and it has passed three runs; 
usually one would fail. Trying at 10x scale to see what happens.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v4.branch-1.txt

Fix legit NPEs



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.v5.branch-1.txt

Upload right patch.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.branch-1.txt

Refactored, moving everything to do with sequence id accounting into its own 
package-protected class, then added tests for the sequence id accounting.

Added [~Apache9]'s test to the patch too.

[~Apache9], I changed TestGetLastFlushedSequenceId. The supposition that the 
region flushed id would be greater than the store flush id didn't make sense to 
me -- perhaps I am missing something. It made more sense that they would be 
equal after a flush. See what you think.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.txt

Mostly logging changes, so that we output less but with more density, including 
detail such as the sequence id at critical junctures, to make debugging these 
issues easier going forward. The patch includes a change log. The fix is in the 
FSHLog getEarliestMemstoreSeqNum methods: look in the Map of currently flushing 
sequence ids first and then, if none is found there, look in the oldest 
sequence id map.
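
A minimal sketch of the lookup order described above, assuming each column family has at most one entry in each map. The class, field, and constant names are hypothetical; the real FSHLog bookkeeping is also keyed by region and is more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the two sequence id maps kept by the WAL. */
final class EarliestSeqIdLookupSketch {
  static final long NO_SEQ_ID = -1L;

  // family -> lowest sequence id among edits in a flush that is in progress
  final Map<String, Long> flushing = new HashMap<>();
  // family -> lowest sequence id among edits not yet picked up by any flush
  final Map<String, Long> oldestUnflushed = new HashMap<>();

  /** Consults the in-progress flush first, then falls back to the oldest map. */
  long earliestSeqId(String family) {
    Long inFlight = flushing.get(family);
    if (inFlight != null) {
      return inFlight;
    }
    Long unflushed = oldestUnflushed.get(family);
    return unflushed != null ? unflushed : NO_SEQ_ID;
  }

  public static void main(String[] args) {
    EarliestSeqIdLookupSketch sketch = new EarliestSeqIdLookupSketch();
    sketch.oldestUnflushed.put("cf", 20L);
    sketch.flushing.put("cf", 10L);
    // The flush of the edit at 10 has not completed, so 10 is still the
    // earliest sequence id that is not safely persisted.
    System.out.println(sketch.earliestSeqId("cf"));
  }
}
```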

Trying this patch against hadoopqa to see if I've broken anything. Also trying 
it on a cluster. Still need to add a test for this particular case.




[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Fix Version/s: 1.2.0
   2.0.0
 Assignee: stack
Affects Version/s: 1.2.0
   2.0.0
   Status: Patch Available  (was: Open)



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13811:
--
Attachment: 13811.branch-1.txt

The patch is actually for branch-1 at the moment (that is what I am testing 
against). Added the branch-1 suffix.



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-02 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13811:
--
Attachment: HBASE-13811.testcase.patch

Wrote a test case. It fails for me with
{noformat}
java.lang.AssertionError: actual array was null
at 
org.apache.hadoop.hbase.regionserver.TestSplitWalDataLoss.test(TestSplitWalDataLoss.java:153)
{noformat}

I think the fix should also be applied to branch-1.1, since the bug will cause 
data loss even if we turn off flush per CF. [~ndimiduk]



[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS

2015-06-02 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13811:
--
Attachment: HBASE-13811-v1.testcase.patch

[~stack] Try this one? It just makes a RegionServerReport manually instead of 
waiting for the regionserver thread to do it for us.

I modified getEarliestMemstoreSeqNum and it passed locally.
