[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-17 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128685#comment-13128685
 ] 

bluedavy commented on HBASE-4562:
-

The 0.90 patch is for 0.90.4.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to simulate a timeout error:
{code:title=SplitTransaction.java|borderStyle=solid}
    if (!testing) {
      MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
        this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
      // Injected failure: the .META. edit succeeded, but the split still fails.
      throw new IOException("some unexpected error in split");
    }
{code} 
 2. Deploy the modified regionserver code and restart the regionserver.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master has run ServerShutdownHandler.process, then scan the
 table: the data written before the split is lost.
 The attached patch fixes the bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-17 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128690#comment-13128690
 ] 

bluedavy commented on HBASE-4562:
-

OK, I renamed the current patch to HBASE-4562-0.90.4.patch, since it is for 0.90.4.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.4.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.4.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt






[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-17 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128713#comment-13128713
 ] 

bluedavy commented on HBASE-4562:
-

@Lars
I attached the patch for the latest 0.90; please apply it again and commit. Thanks.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.4.patch, HBASE-4562-0.90.patch, 
 HBASE-4562-0.92.patch, HBASE-4562-trunk.patch, test-4562-0.90.4.txt, 
 test-4562-0.90.txt, test-4562-0.92.txt, test-4562-trunk.txt






[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-17 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129416#comment-13129416
 ] 

bluedavy commented on HBASE-4562:
-

OK, thanks.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.4.patch, HBASE-4562-0.90.patch, 
 HBASE-4562-0.92.patch, HBASE-4562-trunk.patch, test-4562-0.90.4.txt, 
 test-4562-0.90.txt, test-4562-0.92.txt, test-4562-trunk.txt






[jira] [Commented] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read

2011-10-16 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128575#comment-13128575
 ] 

bluedavy commented on HBASE-4563:
-

I fixed the formatting.

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to simulate an HDFS error:
{code:title=SplitTransaction.java|borderStyle=solid}
    List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
    // Injected failure; "if (true)" keeps the statements after it reachable.
    if (true) throw new IOException("some unexpected error in close store files");
{code} 
 2. Deploy the modified regionserver code and restart the regionserver.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan fails.
 The attached patch fixes the bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128574#comment-13128574
 ] 

bluedavy commented on HBASE-4562:
-

I fixed the comments so that they are consistent across all patches.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt






[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-15 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128321#comment-13128321
 ] 

bluedavy commented on HBASE-4562:
-

I attached patches and test reports for 0.90.4, 0.92 and trunk.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt






[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta occurs error,it'll cause data loss

2011-10-10 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123977#comment-13123977
 ] 

bluedavy commented on HBASE-4562:
-

@Ted Yu
OK, I attached the patch again and also provided the test suite results.

 When split doing offlineParentInMeta occurs error,it'll cause data loss
 ---

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-test.report.txt, HBASE-4562.patch


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to simulate a timeout error:
{code:title=SplitTransaction.java|borderStyle=solid}
    if (!testing) {
      MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
        this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
      // Injected failure: the .META. edit succeeded, but the split still fails.
      throw new IOException("some unexpected error in split");
    }
{code} 
 2. Deploy the modified regionserver code and restart the regionserver.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master has run ServerShutdownHandler.process, then scan the
 table: the data written before the split is lost.
 We can fix the bug with the code below, which journals the point of no return
 (PONR) before the .META. edit instead of after it:
 {code:title=SplitTransaction.java|borderStyle=solid}
    this.journal.add(JournalEntry.PONR);
    if (!testing) {
      MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
        this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
      // Injected failure from step 1, kept to verify the fix.
      throw new IOException("some unexpected error in split");
    }
 {code} 
 {code:title=CompactSplitThread.java|borderStyle=solid}
    if (st.rollback(this.server, this.server)) {
      LOG.info("Successful rollback of failed split of " +
        parent.getRegionNameAsString());
    } else {
      this.server.abort("Abort; we got an error after point-of-no-return");
    }
 {code}
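The caller-side half of this fix (the CompactSplitThread snippet) can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not HBase code: `SplitTx`, `Outcome`, `handleSplit` and `failingTx` are hypothetical names. The real thread calls `SplitTransaction.rollback(server, services)` and `server.abort(...)` rather than returning an enum.

```java
import java.io.IOException;

/** Hypothetical model of how a split's caller reacts to a failed split (not HBase code). */
public class SplitFailureHandling {
  /** Stand-in for SplitTransaction: only the two calls the caller depends on. */
  interface SplitTx {
    void execute() throws IOException;

    /** True if every journaled step was undone; false if the PONR had been reached. */
    boolean rollback();
  }

  enum Outcome { SPLIT, ROLLED_BACK, ABORT_SERVER }

  static Outcome handleSplit(SplitTx st) {
    try {
      st.execute();
      return Outcome.SPLIT;
    } catch (IOException e) {
      if (st.rollback()) {
        // Everything the split did has been undone; the parent keeps serving.
        return Outcome.ROLLED_BACK;
      }
      // Past the point of no return: .META. may already show the parent offline,
      // so the regionserver aborts instead of pretending the split never happened.
      return Outcome.ABORT_SERVER;
    }
  }

  /** Stub transaction whose execute() always fails and whose rollback() returns a fixed result. */
  static SplitTx failingTx(final boolean rollbackSucceeds) {
    return new SplitTx() {
      @Override
      public void execute() throws IOException {
        throw new IOException("some unexpected error in split");
      }

      @Override
      public boolean rollback() {
        return rollbackSucceeds;
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(handleSplit(failingTx(true)));  // ROLLED_BACK
    System.out.println(handleSplit(failingTx(false))); // ABORT_SERVER
  }
}
```

The boolean returned by `rollback()` is what separates the two paths. A journaled PONR is the only way rollback can report "too late" and force the abort.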

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4563) When split doing this.parent.close(false) occurs error,it'll cause the splited region cann't write & read

2011-10-10 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124626#comment-13124626
 ] 

bluedavy commented on HBASE-4563:
-

The exception is rethrown so that CompactSplitThread can catch it and then do the rollback.

 When split doing this.parent.close(false) occurs error,it'll cause the 
 splited region cann't write & read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-test.report.txt, HBASE-4563.patch


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to simulate an HDFS error:
{code:title=SplitTransaction.java|borderStyle=solid}
    List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
    // Injected failure; "if (true)" keeps the statements after it reachable.
    if (true) throw new IOException("some unexpected error in close store files");
{code} 
 2. Deploy the modified regionserver code and restart the regionserver.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan fails.
 We can fix the bug with the code below:
 {code:title=SplitTransaction.java|borderStyle=solid}
    List<StoreFile> hstoreFilesToSplit = null;
    try {
      hstoreFilesToSplit = this.parent.close(false);
    } catch (IOException e) {
      // Journal the close even though it failed, so that rollback reopens the parent.
      this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
      throw e;
    }
 {code} 
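To see why journaling CLOSED_PARENT_REGION in the catch block matters, here is a toy model of the split journal and its rollback. It assumes, as this issue reports, that a failed `close(false)` can leave the parent refusing reads and writes. `Region`, `execute`, `rollback` and `servingAfterFailedSplit` are hypothetical stand-ins, and only the reopen step of the real rollback is modelled.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a split whose parent-region close fails part way through. */
public class CloseFailureModel {
  enum JournalEntry { CREATE_SPLIT_DIR, CLOSED_PARENT_REGION }

  /** Toy region: close() stops serving first, then may fail while closing store files. */
  static class Region {
    boolean serving = true;

    List<String> close(boolean failWhileClosingStores) throws IOException {
      serving = false; // from here on, reads and writes are refused
      if (failWhileClosingStores) {
        throw new IOException("some unexpected error in close store files");
      }
      return new ArrayList<>(); // store files to split (empty in this model)
    }

    void reopen() {
      serving = true;
    }
  }

  private final List<JournalEntry> journal = new ArrayList<>();
  private final Region parent;
  private final boolean journalCloseOnFailure; // true = behaviour of the proposed fix

  CloseFailureModel(Region parent, boolean journalCloseOnFailure) {
    this.parent = parent;
    this.journalCloseOnFailure = journalCloseOnFailure;
  }

  List<String> execute(boolean failClose) throws IOException {
    journal.add(JournalEntry.CREATE_SPLIT_DIR);
    List<String> storeFilesToSplit;
    try {
      storeFilesToSplit = parent.close(failClose);
    } catch (IOException e) {
      if (journalCloseOnFailure) {
        // The parent may already be closed, so make sure rollback reopens it.
        journal.add(JournalEntry.CLOSED_PARENT_REGION);
      }
      throw e;
    }
    journal.add(JournalEntry.CLOSED_PARENT_REGION);
    return storeFilesToSplit;
  }

  /** Undoes journaled steps newest-first; only the parent reopen is modelled. */
  void rollback() {
    for (int i = journal.size() - 1; i >= 0; i--) {
      if (journal.get(i) == JournalEntry.CLOSED_PARENT_REGION) {
        parent.reopen();
      }
    }
    journal.clear();
  }

  /** Runs a split whose close fails, rolls it back, and reports whether the parent serves again. */
  static boolean servingAfterFailedSplit(boolean journalCloseOnFailure) {
    Region parent = new Region();
    CloseFailureModel split = new CloseFailureModel(parent, journalCloseOnFailure);
    try {
      split.execute(true);
    } catch (IOException expected) {
      split.rollback();
    }
    return parent.serving;
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + servingAfterFailedSplit(false));
    System.out.println("with fix:    " + servingAfterFailedSplit(true));
  }
}
```

Running `main` prints `false` without the fix and `true` with it. Rollback only undoes steps it finds in the journal, so a close that fails before being journaled is never undone.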

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4563) When split doing this.parent.close(false) occurs error,it'll cause the splited region cann't write & read

2011-10-10 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124635#comment-13124635
 ] 

bluedavy commented on HBASE-4563:
-

@Ted Yu
Yes :), I'll change it and attach the patch again.

 When split doing this.parent.close(false) occurs error,it'll cause the 
 splited region cann't write & read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-test.report.txt, HBASE-4563.patch






[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta occurs error,it'll cause data loss

2011-10-10 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124638#comment-13124638
 ] 

bluedavy commented on HBASE-4562:
-

@Ted Yu
I'll check the tests.
I'll also attach patches for 0.92 and TRUNK.

 When split doing offlineParentInMeta occurs error,it'll cause data loss
 ---

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-test.report.txt, HBASE-4562.patch






[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta occurs error,it'll cause data loss

2011-10-10 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124715#comment-13124715
 ] 

bluedavy commented on HBASE-4562:
-

@Ted Yu
I counted all the tests in src/test; I'm sure all of them were run.

 When split doing offlineParentInMeta occurs error,it'll cause data loss
 ---

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-test.report.txt, HBASE-4562.patch






[jira] [Commented] (HBASE-3872) Hole in split transaction rollback; edits to .META. need to be rolled back even if it seems like they didn't make it

2011-10-09 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123670#comment-13123670
 ] 

bluedavy commented on HBASE-3872:
-

@stack
The current patch will cause data loss in this situation:
1. Change SplitTransaction as @mingjian described.
2. Create a table and put some data in it from the hbase shell.
3. Split the table from the hbase shell.
4. Kill the region server hosting the table.
5. After the master runs ServerShutdownHandler, the table can be written again,
but the data previously written to the table is lost.

Also, with the above code, if we don't kill the region server, the parent region
cannot be written, even after restarting the cluster.

 Hole in split transaction rollback; edits to .META. need to be rolled back 
 even if it seems like they didn't make it
 

 Key: HBASE-3872
 URL: https://issues.apache.org/jira/browse/HBASE-3872
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.3
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.90.4

 Attachments: 3872-v2.txt, 3872.txt


 Saw this interesting one on a cluster of ours.  The cluster was configured 
 with too few handlers, so we saw a lot of the phenomenon where actions were 
 queued, but by the time they got into the server and tried to respond to the 
 client, the client had disconnected because of the 60-second timeout.  
 Well, the meta edits for a split were queued at the regionserver carrying 
 .META., and by the time it went to write back, the client had gone (the first 
 insert of parent offline with daughter regions added as info:splitA and 
 info:splitB).  The client presumed the edits failed and 'successfully' rolled 
 back the transaction (failing to undo the .META. edits, thinking they didn't go 
 through).
 A few minutes later the .META. scanner on master runs.  It sees 'no 
 references' in daughters -- the daughters had been cleaned up as part of the 
 split transaction rollback -- so it thinks it's safe to delete the parent.
 Two things:
 + Tighten up the check in master: at least check that the daughter region 
 exists, and possibly that the daughter region has an entry in .META.
 + Depending on the edit that fails, schedule rollback edits even though it will 
 seem like they didn't go through.
 This is a pretty critical one.
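The first suggestion above (tighten the check in master) could look roughly like the sketch below. It is an illustration only: `DaughterState`, `naiveCanDeleteParent` and `safeCanDeleteParent` are hypothetical names, not the master's actual .META. scanning code. The .META. condition reflects the report's "possibly" and is an assumption.

```java
/** Hypothetical sketch of a tightened "can the split parent be deleted?" check (not HBase code). */
public class SplitParentCheck {
  static final class DaughterState {
    final boolean dirExists;   // daughter region directory found on the filesystem
    final boolean inMeta;      // daughter has a row in .META.
    final int referenceFiles;  // reference files still pointing into the parent

    DaughterState(boolean dirExists, boolean inMeta, int referenceFiles) {
      this.dirExists = dirExists;
      this.inMeta = inMeta;
      this.referenceFiles = referenceFiles;
    }
  }

  /** Loose rule: "no references" in both daughters means the parent can be deleted. */
  static boolean naiveCanDeleteParent(DaughterState a, DaughterState b) {
    return a.referenceFiles == 0 && b.referenceFiles == 0;
  }

  /** Tightened rule: a daughter's lack of references only counts if the daughter exists. */
  static boolean safeCanDeleteParent(DaughterState a, DaughterState b) {
    return liveWithoutReferences(a) && liveWithoutReferences(b);
  }

  private static boolean liveWithoutReferences(DaughterState d) {
    return d.dirExists && d.inMeta && d.referenceFiles == 0;
  }

  public static void main(String[] args) {
    // Scenario from the report: the rollback already removed both daughters.
    DaughterState removed = new DaughterState(false, false, 0);
    System.out.println(naiveCanDeleteParent(removed, removed)); // true: parent would be deleted
    System.out.println(safeCanDeleteParent(removed, removed));  // false: parent is kept

    // Normal completed split, after the daughters compacted away their references.
    DaughterState compacted = new DaughterState(true, true, 0);
    System.out.println(safeCanDeleteParent(compacted, compacted)); // true
  }
}
```

A missing daughter trivially has zero references, and the loose rule cannot tell that apart from a daughter that has finished compacting.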

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3872) Hole in split transaction rollback; edits to .META. need to be rolled back even if it seems like they didn't make it

2011-10-09 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123837#comment-13123837
 ] 

bluedavy commented on HBASE-3872:
-

We fixed the bug with the change below:
   if (!testing) {
+    this.journal.add(JournalEntry.PONR);
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
   }
-  this.journal.add(JournalEntry.PONR);

 Hole in split transaction rollback; edits to .META. need to be rolled back 
 even if it seems like they didn't make it
 

 Key: HBASE-3872
 URL: https://issues.apache.org/jira/browse/HBASE-3872
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.3
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.90.4

 Attachments: 3872-v2.txt, 3872.txt






[jira] [Commented] (HBASE-3872) Hole in split transaction rollback; edits to .META. need to be rolled back even if it seems like they didn't make it

2011-10-09 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123838#comment-13123838
 ] 

bluedavy commented on HBASE-3872:
-

{code:title=SplitTransaction.java|borderStyle=solid}
if (!testing) {
  // Journal the point of no return before touching .META.
  this.journal.add(JournalEntry.PONR);
  MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
    this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
}
// this.journal.add(JournalEntry.PONR);  (old position, after the .META. edit)
{code} 
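The effect of moving the PONR entry in the snippet above can be checked with a toy model of the journal. This is a hypothetical sketch, not HBase code. Apart from `PONR` and `CLOSED_PARENT_REGION`, the journal entry names are assumed. "Parent at risk" here means that .META. shows the parent offline while the rollback has already deleted the daughters. That is the state in which, per this issue, the .META. scanner concludes it can delete the parent.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of where PONR is journaled relative to the .META. edit (not HBase code). */
public class PonrOrderingModel {
  enum JournalEntry {
    CREATE_SPLIT_DIR, CLOSED_PARENT_REGION, STARTED_REGION_A_CREATION,
    STARTED_REGION_B_CREATION, PONR
  }

  static final class Result {
    final boolean rollbackSucceeded;
    final boolean parentAtRisk;

    Result(boolean rollbackSucceeded, boolean parentAtRisk) {
      this.rollbackSucceeded = rollbackSucceeded;
      this.parentAtRisk = parentAtRisk;
    }
  }

  private final Deque<JournalEntry> journal = new ArrayDeque<>();
  private final boolean ponrBeforeMetaEdit;
  private boolean parentOfflineInMeta = false;
  private boolean daughtersOnDisk = false;

  PonrOrderingModel(boolean ponrBeforeMetaEdit) {
    this.ponrBeforeMetaEdit = ponrBeforeMetaEdit;
  }

  /** The .META. edit lands, but the caller still sees an error (e.g. a timeout). */
  void execute() throws IOException {
    journal.push(JournalEntry.CREATE_SPLIT_DIR);
    journal.push(JournalEntry.CLOSED_PARENT_REGION);
    journal.push(JournalEntry.STARTED_REGION_A_CREATION);
    journal.push(JournalEntry.STARTED_REGION_B_CREATION);
    daughtersOnDisk = true;
    if (ponrBeforeMetaEdit) {
      journal.push(JournalEntry.PONR); // fixed ordering
    }
    parentOfflineInMeta = true; // offlineParentInMeta succeeded
    // In the old ordering PONR was journaled only after this point, so it is never reached.
    throw new IOException("some unexpected error in split");
  }

  /** Undoes journaled steps newest-first; returns false on reaching the PONR. */
  boolean rollback() {
    while (!journal.isEmpty()) {
      switch (journal.pop()) {
        case PONR:
          return false;
        case STARTED_REGION_A_CREATION:
        case STARTED_REGION_B_CREATION:
          daughtersOnDisk = false; // daughter directories are removed
          break;
        default:
          break; // other undo steps do not matter for this model
      }
    }
    return true;
  }

  static Result run(boolean ponrBeforeMetaEdit) {
    PonrOrderingModel split = new PonrOrderingModel(ponrBeforeMetaEdit);
    boolean rolledBack = true;
    try {
      split.execute();
    } catch (IOException e) {
      rolledBack = split.rollback();
    }
    return new Result(rolledBack, split.parentOfflineInMeta && !split.daughtersOnDisk);
  }

  public static void main(String[] args) {
    Result old = run(false);
    Result fixed = run(true);
    System.out.println("old:   rollback=" + old.rollbackSucceeded + " parentAtRisk=" + old.parentAtRisk);
    System.out.println("fixed: rollback=" + fixed.rollbackSucceeded + " parentAtRisk=" + fixed.parentAtRisk);
  }
}
```

With the old ordering, `run(false)` reports a "successful" rollback and leaves the parent at risk. With the fix, `run(true)` stops at the PONR and the daughters survive. Under the CompactSplitThread code quoted in HBASE-4562, the regionserver would then abort.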

 Hole in split transaction rollback; edits to .META. need to be rolled back 
 even if it seems like they didn't make it
 

 Key: HBASE-3872
 URL: https://issues.apache.org/jira/browse/HBASE-3872
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.3
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.90.4

 Attachments: 3872-v2.txt, 3872.txt






[jira] [Commented] (HBASE-3872) Hole in split transaction rollback; edits to .META. need to be rolled back even if it seems like they didn't make it

2011-10-09 Thread bluedavy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123861#comment-13123861
 ] 

bluedavy commented on HBASE-3872:
-

I created HBASE-4562 and HBASE-4563.

 Hole in split transaction rollback; edits to .META. need to be rolled back 
 even if it seems like they didn't make it
 

 Key: HBASE-3872
 URL: https://issues.apache.org/jira/browse/HBASE-3872
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.3
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.90.4

 Attachments: 3872-v2.txt, 3872.txt

