[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119446#comment-14119446
 ] 

Hudson commented on HBASE-11868:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #465 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/465/])
Revert HBASE-11868 Data loss in hlog when the hdfs is unavailable (Liu 
Shaohui) (apurtell: rev ee32706c5d93fb3de6f4aba09174d34ca3879f6d)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.6

 Attachments: HBASE-11868-0.98-v1.diff, HBASE-11868-0.98-v2.diff


 When using the new thread model in HBase 0.98, we found a bug that can cause
 data loss when HDFS is unavailable.
 When HRegion.doMiniBatchMutation writes WAL edits to the hlog, it first calls
 appendNoSync to append the edits and then calls sync with that txid.
 Suppose the txid of the current write is 10, syncedTillHere in the hlog is 9,
 and failedTxid is 0. When HDFS is unavailable, the AsyncWriter or AsyncSyncer
 fails to append or sync the edits, and then updates syncedTillHere to 10 and
 failedTxid to 10.
 When the hlog then calls sync with txid 10, txid already equals
 syncedTillHere, so the body of the while loop in syncer() never runs and
 failedTxid is never checked. The client is told the write succeeded, but the
 data was written only to the memstore, not to the hlog. If the regionserver
 goes down before the memstore is flushed, the data is lost.
 See FSHLog.java, line 1348:
 {code}
   // sync all transactions upto the specified txid
   private void syncer(long txid) throws IOException {
     synchronized (this.syncedTillHere) {
       while (this.syncedTillHere.get() < txid) {
         try {
           this.syncedTillHere.wait();
           if (txid <= this.failedTxid.get()) {
             assert asyncIOE != null :
               "current txid is among(under) failed txids, but asyncIOE is null!";
             throw asyncIOE;
           }
         } catch (InterruptedException e) {
           LOG.debug("interrupted while waiting for notification from AsyncNotifier");
         }
       }
     }
   }
 {code}
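To make the failure path concrete, here is a minimal single-threaded sketch. It assumes a hypothetical BuggySyncerModel class that keeps only the three fields used by the quoted code (syncedTillHere, failedTxid, asyncIOE). The helper failAsyncWriteUpTo() is also hypothetical: it stands in for the AsyncWriter/AsyncSyncer failing during an HDFS outage before the handler gets to call sync. This is an illustration, not the real FSHLog.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, heavily simplified model of the 0.98 FSHLog sync bookkeeping.
// Field names mirror the quoted code; everything else is illustrative only.
public class BuggySyncerModel {
  final AtomicLong syncedTillHere = new AtomicLong(9); // txids up to 9 already synced
  final AtomicLong failedTxid = new AtomicLong(0);     // no failure recorded yet
  volatile IOException asyncIOE = null;

  // Stands in for AsyncWriter/AsyncSyncer hitting an HDFS outage: the failure is
  // recorded, and syncedTillHere is still advanced so that waiting handlers wake up.
  void failAsyncWriteUpTo(long txid) {
    synchronized (syncedTillHere) {
      asyncIOE = new IOException("simulated HDFS failure");
      failedTxid.set(txid);
      syncedTillHere.set(txid);
      syncedTillHere.notifyAll();
    }
  }

  // Same control flow as the quoted syncer(): failedTxid is consulted only
  // inside the while loop, i.e. only after at least one wait().
  void syncer(long txid) throws IOException {
    synchronized (syncedTillHere) {
      while (syncedTillHere.get() < txid) {
        try {
          syncedTillHere.wait();
          if (txid <= failedTxid.get()) {
            throw asyncIOE;
          }
        } catch (InterruptedException e) {
          // The original logs and keeps waiting; this model just keeps waiting.
        }
      }
    }
  }

  // Returns true if syncer(txid) reported the failure to the caller.
  boolean syncThrows(long txid) {
    try {
      syncer(txid);
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    BuggySyncerModel log = new BuggySyncerModel();
    // The async threads fail on txid 10 before the handler calls sync(10).
    log.failAsyncWriteUpTo(10);
    // prints "sync(10) threw: false"
    System.out.println("sync(10) threw: " + log.syncThrows(10));
  }
}
```

Because syncedTillHere has already reached 10, the loop condition `10 < 10` is false, syncer(10) returns normally, and the recorded failure never reaches the client.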
 We can fix this issue by moving the comparison of txid against failedTxid
 outside the while loop.
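A minimal sketch of the proposed fix. It uses a hypothetical FixedSyncerModel class that keeps only the syncedTillHere, failedTxid and asyncIOE fields from the quoted code, with hypothetical helpers in place of the real AsyncWriter/AsyncSyncer threads. The failedTxid comparison runs after the while loop, so it runs whether or not the handler had to wait. This follows the description in this issue and is not necessarily identical to the committed patch.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplified model of the fix described in this issue; not the real FSHLog.
public class FixedSyncerModel {
  final AtomicLong syncedTillHere = new AtomicLong(9);
  final AtomicLong failedTxid = new AtomicLong(0);
  volatile IOException asyncIOE = null;

  // Hypothetical helper: the async threads successfully synced everything up to txid.
  void completeAsyncWriteUpTo(long txid) {
    synchronized (syncedTillHere) {
      syncedTillHere.set(txid);
      syncedTillHere.notifyAll();
    }
  }

  // Hypothetical helper: the async threads failed on txid but still advanced syncedTillHere.
  void failAsyncWriteUpTo(long txid) {
    synchronized (syncedTillHere) {
      asyncIOE = new IOException("simulated HDFS failure");
      failedTxid.set(txid);
      syncedTillHere.set(txid);
      syncedTillHere.notifyAll();
    }
  }

  void syncer(long txid) throws IOException {
    synchronized (syncedTillHere) {
      while (syncedTillHere.get() < txid) {
        try {
          syncedTillHere.wait();
        } catch (InterruptedException e) {
          // Keep waiting, as the original does.
        }
      }
      // Checked after the loop, so it also runs when syncedTillHere had already
      // reached txid before this call (the case the original code missed).
      if (txid <= failedTxid.get()) {
        throw asyncIOE;
      }
    }
  }

  // Returns true if syncer(txid) reported a failure to the caller.
  boolean syncThrows(long txid) {
    try {
      syncer(txid);
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    FixedSyncerModel failed = new FixedSyncerModel();
    failed.failAsyncWriteUpTo(10);
    // prints "after failure, sync(10) threw: true"
    System.out.println("after failure, sync(10) threw: " + failed.syncThrows(10));

    FixedSyncerModel healthy = new FixedSyncerModel();
    healthy.completeAsyncWriteUpTo(10);
    // prints "after success, sync(10) threw: false"
    System.out.println("after success, sync(10) threw: " + healthy.syncThrows(10));
  }
}
```

The healthy path is unaffected because `10 <= 0` is false. Only txids at or below the recorded failure are rejected.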



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119502#comment-14119502
 ] 

Hudson commented on HBASE-11868:


FAILURE: Integrated in HBase-0.98 #493 (See 
[https://builds.apache.org/job/HBase-0.98/493/])
HBASE-11868 Data loss in hlog when the hdfs is unavailable (Liu Shaohui) 
(apurtell: rev 39771b8f73a6e6eae12e8b3bdb7dd1fe13edc83c)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.6

 Attachments: HBASE-11868-0.98-v1.diff, HBASE-11868-0.98-v2.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119664#comment-14119664
 ] 

Hudson commented on HBASE-11868:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #466 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/466/])
HBASE-11868 Data loss in hlog when the hdfs is unavailable (Liu Shaohui) 
(apurtell: rev 39771b8f73a6e6eae12e8b3bdb7dd1fe13edc83c)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.6

 Attachments: HBASE-11868-0.98-v1.diff, HBASE-11868-0.98-v2.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119075#comment-14119075
 ] 

Hudson commented on HBASE-11868:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #463 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/463/])
HBASE-11868 Data loss in hlog when the hdfs is unavailable (Liu Shaohui) 
(apurtell: rev fd10bde5af20d6db96207cc2e29b779e117acf19)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.6

 Attachments: HBASE-11868-0.98-v1.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119133#comment-14119133
 ] 

Hudson commented on HBASE-11868:


FAILURE: Integrated in HBase-0.98 #489 (See 
[https://builds.apache.org/job/HBase-0.98/489/])
HBASE-11868 Data loss in hlog when the hdfs is unavailable (Liu Shaohui) 
(apurtell: rev fd10bde5af20d6db96207cc2e29b779e117acf19)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.6

 Attachments: HBASE-11868-0.98-v1.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-02 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119209#comment-14119209
 ] 

Liu Shaohui commented on HBASE-11868:
-

[~apurtell]
Let me fix the failed tests.


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.7

 Attachments: HBASE-11868-0.98-v1.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119223#comment-14119223
 ] 

Andrew Purtell commented on HBASE-11868:


Hi [~lshmouse], if it's possible to do that in the next few hours, this can make 0.98.6.

 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.7

 Attachments: HBASE-11868-0.98-v1.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-02 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119295#comment-14119295
 ] 

Liu Shaohui commented on HBASE-11868:
-

[~apurtell]
The reason is that failedTxid is initialized to 0. If a test makes no updates,
its sync call is issued with txid 0 (because unflushedEntries is 0), which
equals failedTxid, so the sync fails.

Changing the initial value of failedTxid to -1 will fix the failed tests.
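A minimal sketch of why the initial value matters once the check runs after the loop. It assumes a hypothetical FailedTxidInitModel class with the post-loop check and assumes, as the comment says, that a sync on a log with no edits is issued with txid 0 (unflushedEntries). The IllegalStateException is a stand-in for the assert on a null asyncIOE in the quoted syncer().

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: the post-loop check needs failedTxid to start below any real txid.
public class FailedTxidInitModel {
  final AtomicLong syncedTillHere = new AtomicLong(0); // nothing written yet
  final AtomicLong failedTxid;
  volatile IOException asyncIOE = null;                 // no failure has happened

  FailedTxidInitModel(long initialFailedTxid) {
    this.failedTxid = new AtomicLong(initialFailedTxid);
  }

  void syncer(long txid) throws IOException {
    synchronized (syncedTillHere) {
      // Skipped entirely for txid 0, because 0 < 0 is false.
      while (syncedTillHere.get() < txid) {
        try {
          syncedTillHere.wait();
        } catch (InterruptedException e) {
          // Keep waiting, as the original does.
        }
      }
      if (txid <= failedTxid.get()) {
        // Stand-in for the original's assert on a null asyncIOE.
        if (asyncIOE == null) {
          throw new IllegalStateException(
              "current txid is among(under) failed txids, but asyncIOE is null!");
        }
        throw asyncIOE;
      }
    }
  }

  // Returns true if syncer(txid) completed without reporting a failure.
  boolean syncSucceeds(long txid) {
    try {
      syncer(txid);
      return true;
    } catch (IOException | IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // prints "failedTxid starts at 0:  sync(0) ok = false"
    System.out.println("failedTxid starts at 0:  sync(0) ok = "
        + new FailedTxidInitModel(0).syncSucceeds(0));
    // prints "failedTxid starts at -1: sync(0) ok = true"
    System.out.println("failedTxid starts at -1: sync(0) ok = "
        + new FailedTxidInitModel(-1).syncSucceeds(0));
  }
}
```

With -1 as the initial value, no real txid (which starts at 0) can satisfy `txid <= failedTxid` until an actual failure has been recorded.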



 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.7

 Attachments: HBASE-11868-0.98-v1.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119410#comment-14119410
 ] 

Hudson commented on HBASE-11868:


FAILURE: Integrated in HBase-0.98 #492 (See 
[https://builds.apache.org/job/HBase-0.98/492/])
Revert HBASE-11868 Data loss in hlog when the hdfs is unavailable (Liu 
Shaohui) (apurtell: rev ee32706c5d93fb3de6f4aba09174d34ca3879f6d)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Fix For: 0.98.6

 Attachments: HBASE-11868-0.98-v1.diff, HBASE-11868-0.98-v2.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-09-01 Thread Honghua Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117202#comment-14117202
 ] 

Honghua Feng commented on HBASE-11868:
--

+1

nice finding, thanks [~lshmouse] for the patch

 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Attachments: HBASE-11868-0.98-v1.diff




[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-08-31 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116990#comment-14116990
 ] 

Liu Shaohui commented on HBASE-11868:
-

[~apurtell]
I think this is a critical bug in the hlog. Would you like to fix it in 0.98.6?

 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker



[jira] [Commented] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

2014-08-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116995#comment-14116995
 ] 

Hadoop QA commented on HBASE-11868:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12665695/HBASE-11868-0.98-v1.diff
  against trunk revision .
  ATTACHMENT ID: 12665695

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10662//console

This message is automatically generated.

 Data loss in hlog when the hdfs is unavailable
 --

 Key: HBASE-11868
 URL: https://issues.apache.org/jira/browse/HBASE-11868
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Blocker
 Attachments: HBASE-11868-0.98-v1.diff

