[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817104#comment-13817104
 ] 

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in HBase-0.94 #1196 (See 
[https://builds.apache.org/job/HBase-0.94/1196/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: 
rev 1539910)
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java


 Restore snapshot fails to restore the meta edits sporadically  
 ---

 Key: HBASE-9906
 URL: https://issues.apache.org/jira/browse/HBASE-9906
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1, 0.94.14

 Attachments: hbase-9906-0.94_v1.patch, hbase-9906_v1.patch


 After snapshot restore, we see failures to find the table in meta:
 {code}
  disable 'tablefour'
  restore_snapshot 'snapshot_tablefour'
  enable 'tablefour'
 ERROR: Table tablefour does not exist.'
 {code}
 This is quite subtle. From the looks of it, we successfully restore the 
 snapshot, update meta, and report the status back to the client. The 
 client then tries an operation on the table (like enable table, or a scan 
 in the test outputs), which fails because the meta entry for the region 
 seems to be gone (for a single-region table, the table is reported 
 missing). Subsequent attempts to create the table also fail because 
 the table directories are there, but the meta entries are not.
 To restore the meta entries, we do a delete and then a put for the same 
 region:
 {code}
 2013-11-04 10:39:51,582 INFO 
 org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
 76d0e2b7ec3291afcaa82e18a56ccc30
 2013-11-04 10:39:51,582 INFO 
 org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
 fa41edf43fe3ee131db4a34b848ff432
 ...
 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Deleted [{ENCODED = fa41edf43fe3ee131db4a34b848ff432, NAME = 
 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
 = '', ENDKEY = ''}, {ENCODED = 76d0e2b7ec3291afcaa82e18a56ccc30, NAME = 
 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added 1
 {code}
 The root cause of this sporadic failure is that the delete and the 
 subsequent put get the same timestamp if they execute in the same ms. The 
 delete then masks the put at that ts, even though the put was issued later.
 See: HBASE-9905, HBASE-8770
 Credit goes to [~huned] for reporting this bug. 
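
The tie described above can be illustrated with a small toy model. This is only a sketch, not HBase code: the `ToyRow` class, the `restore` helper, and the timestamps are made up for illustration. It assumes a simplified rule in which a row delete marker masks every cell whose timestamp is less than or equal to the marker's, regardless of arrival order.

```python
# Toy model of one meta row; NOT HBase code. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyRow:
    # Each entry is (timestamp_ms, kind) with kind in {"put", "delete"}.
    cells: list = field(default_factory=list)

    def put(self, ts_ms):
        self.cells.append((ts_ms, "put"))

    def delete_row(self, ts_ms):
        # A row delete leaves a marker that masks every cell with ts <= ts_ms.
        self.cells.append((ts_ms, "delete"))

    def is_visible(self):
        deletes = [ts for ts, kind in self.cells if kind == "delete"]
        newest_delete = max(deletes, default=-1)
        # A put survives only if it is strictly newer than every delete
        # marker; the order in which the edits arrived is ignored.
        return any(kind == "put" and ts > newest_delete
                   for ts, kind in self.cells)


def restore(delete_ts_ms, put_ts_ms):
    row = ToyRow()
    row.put(delete_ts_ms - 1000)   # the pre-existing region entry
    row.delete_row(delete_ts_ms)   # restore step 1: remove the old entry
    row.put(put_ts_ms)             # restore step 2: write the restored entry
    return row.is_visible()


# Arbitrary example timestamps in milliseconds.
same_ms = restore(1383561592102, 1383561592102)
next_ms = restore(1383561592102, 1383561592103)
print(same_ms, next_ms)
```

In this model the restored entry disappears only when both edits land in the same millisecond, which matches the sporadic nature of the failure.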



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817380#comment-13817380
 ] 

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in hbase-0.96-hadoop2 #116 (See 
[https://builds.apache.org/job/hbase-0.96-hadoop2/116/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: 
rev 1539907)
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java




[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817483#comment-13817483
 ] 

Hudson commented on HBASE-9906:
---

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #832 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/832/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: 
rev 1539906)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java




[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816212#comment-13816212
 ] 

stack commented on HBASE-9906:
--

+1. Ugly, but you call it out as such in the comment explaining why.



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816788#comment-13816788
 ] 

Enis Soztutar commented on HBASE-9906:
--

bq. Can the 20 ms sleep start counting from the call to 
MetaEditor.deleteRegions() ?
I thought about doing this inside RestoreSnapshotHelper, but encapsulating the 
whole thing in MetaEditor.overwriteRegions() seems cleaner, and there 
might be other users for overwriting region data in meta. 
bq. Would 17ms sleep be good enough ?
Let's keep some buffer. 

Thanks for the reviews. 
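
The encapsulation described in this comment can be sketched as one helper that deletes, waits, and then writes. This is a hedged illustration, not the actual patch: `overwrite_regions`, `delete_fn`, `put_fn`, and `sleep_fn` are hypothetical names, and the fake clock exists only to make the example deterministic.

```python
import time

# The 20 ms pause discussed in this thread.
SLEEP_BETWEEN_DELETE_AND_PUT_S = 0.020


def overwrite_regions(delete_fn, put_fn, sleep_fn=time.sleep):
    """Delete, pause, then put, so the two edits cannot share a millisecond."""
    delete_fn()
    # Wait so the put is stamped in a later millisecond than the delete.
    sleep_fn(SLEEP_BETWEEN_DELETE_AND_PUT_S)
    put_fn()


# Usage with a fake clock (hypothetical) instead of real wall-clock time.
clock_ms = [1000]
log = []


def fake_sleep(seconds):
    clock_ms[0] += round(seconds * 1000)


overwrite_regions(
    delete_fn=lambda: log.append(("delete", clock_ms[0])),
    put_fn=lambda: log.append(("put", clock_ms[0])),
    sleep_fn=fake_sleep,
)
print(log)
```

Keeping the pause inside a single helper means any other caller that overwrites region rows gets the same protection without repeating the workaround.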



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817001#comment-13817001
 ] 

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in hbase-0.96 #183 (See 
[https://builds.apache.org/job/hbase-0.96/183/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: 
rev 1539907)
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java




[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817028#comment-13817028
 ] 

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in HBase-TRUNK #4673 (See 
[https://builds.apache.org/job/HBase-TRUNK/4673/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: 
rev 1539906)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java




[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817042#comment-13817042
 ] 

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in HBase-0.94-security #330 (See 
[https://builds.apache.org/job/HBase-0.94-security/330/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: 
rev 1539910)
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java




[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815418#comment-13815418
 ] 

Enis Soztutar commented on HBASE-9906:
--

We can fix this issue in one of these ways: 
  (1) Fix either HBASE-9905 or HBASE-8770 or HBASE-9879
  (2) Add a sleep(20) between meta delete and update
  (3) Obtain a ts from the client, and do the delete with that ts, and the 
puts with ts+1 
  (4) Change the meta delete to only delete columns not needed. The subsequent 
put will override the column values anyway. 




[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815463#comment-13815463
 ] 

Enis Soztutar commented on HBASE-9906:
--

Out of the above options, (1) will take some time to fix. (3) has another 
problem because we would be intermixing client-supplied and server-supplied 
timestamps, which might cause further problems in meta if clocks are out of 
sync. (4) is not ideal either, since we want to delete the whole row except 
for column info:regioninfo. For this we would have to do a get to obtain the 
columns for each row, and send deletes for each row. So that leaves us with 
option (2), which is embarrassing, but given that restore is very infrequent, 
we can justify sleeping an extra 20ms.  
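
The extra cost of option (4) can be sketched with a toy model. This is an illustration only: a plain dict stands in for the meta table, `get_columns` and `plan_selective_deletes` are hypothetical helpers rather than HBase client calls, and the row keys and values are made up.

```python
# Illustrative only: a dict stands in for the meta table; not the HBase API.
KEEP = {"info:regioninfo"}

meta = {
    "region-a": {"info:regioninfo": "A", "info:server": "host1",
                 "info:serverstartcode": "1"},
    "region-b": {"info:regioninfo": "B", "info:server": "host2"},
}

reads = 0


def get_columns(row_key):
    """Stand-in for a per-row get; counts how many reads the plan needs."""
    global reads
    reads += 1
    return set(meta[row_key])


def plan_selective_deletes(row_keys):
    # The column names differ per row, so each row must be read before
    # the delete for that row can be built.
    return {key: sorted(get_columns(key) - KEEP) for key in row_keys}


plan = plan_selective_deletes(["region-a", "region-b"])
print(plan)
print("reads needed:", reads)
```

In this model every restored row costs one read plus one tailored delete, compared with a single whole-row delete when the columns do not need to be known.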



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815475#comment-13815475
 ] 

Sergey Shelukhin commented on HBASE-9906:
-

You can use the power of out-of-order ts by doing puts first, getting the ts, 
and then doing deletes at that ts minus 1 :) although iirc meta might break 
because of that, because the key-before code optimizes by assuming no 
out-of-order ts across files.
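
The put-first ordering suggested here can be sketched in a toy model. It assumes the simplified rule that a row delete marker masks every cell with a timestamp less than or equal to its own, regardless of arrival order. `visible_puts` and the timestamps are hypothetical, and the model does not capture the key-before caveat mentioned above.

```python
def visible_puts(cells):
    """Return timestamps of puts not masked by a row delete marker."""
    newest_delete = max(
        (ts for ts, kind in cells if kind == "delete"), default=-1)
    return sorted(ts for ts, kind in cells
                  if kind == "put" and ts > newest_delete)


old_entry_ts = 900
cells = [(old_entry_ts, "put")]          # the stale region entry

put_ts = 1000                            # timestamp assigned to the new put
cells.append((put_ts, "put"))            # step 1: write the restored entry
cells.append((put_ts - 1, "delete"))     # step 2: delete at ts - 1, sent later

survivors = visible_puts(cells)
print(survivors)
```

The stale entry at 900 is masked by the marker at 999, while the restored entry at 1000 survives even though the delete arrived after it.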



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815492#comment-13815492
 ] 

Sergey Shelukhin commented on HBASE-9906:
-

Btw another option is uniqueTs. I am -0 on sleep...



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815510#comment-13815510
 ] 

Enis Soztutar commented on HBASE-9906:
--

Agreed that sleep is stupid, but without major surgery (uniqueTs, etc), and 
fixes to HBASE-8770, this seems to be the best option. [~mbertozzi], [~jmhsieh] 
mind taking a look? Thanks. 



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815518#comment-13815518
 ] 

Matteo Bertozzi commented on HBASE-9906:


if we don't have the ts fix, the sleep sounds ok to me



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815544#comment-13815544
 ] 

Hadoop QA commented on HBASE-9906:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612481/hbase-9906_v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7765//console

This message is automatically generated.


[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815560#comment-13815560
 ] 

Enis Soztutar commented on HBASE-9906:
--

Thanks Matteo, test failure seems unrelated. 



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815632#comment-13815632
 ] 

Ted Yu commented on HBASE-9906:
---

Minor comment:
{code}
-  if (metaChanges.hasRegionsToRestore()) hrisToRemove.addAll(metaChanges.getRegionsToRestore());
   MetaEditor.deleteRegions(catalogTracker, hrisToRemove);
{code}
Can the 20 ms sleep start counting from the call to MetaEditor.deleteRegions()?
Would a 17 ms sleep be good enough?
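
One way to read the first question is as a sketch like the following, which sleeps only for the part of the gap that has not already elapsed since a reference point. This is an illustration under stated assumptions, not the patch: the class DeleteThenPutGap and the helper remainingSleepMs are hypothetical names, the 20 ms value is simply the figure from the question, and the delete and put steps are left as comments.

```java
// Sketch only: everything except the 20 ms figure is a hypothetical illustration.
class DeleteThenPutGap {
    // How long to sleep so that at least minGapMs separates referenceMs from the put.
    // The result is clamped to [0, minGapMs] so a clock that steps backwards cannot
    // produce an unbounded wait.
    static long remainingSleepMs(long referenceMs, long nowMs, long minGapMs) {
        long elapsed = nowMs - referenceMs;
        return Math.max(0L, Math.min(minGapMs, minGapMs - elapsed));
    }

    public static void main(String[] args) throws InterruptedException {
        final long minGapMs = 20L;
        long reference = System.currentTimeMillis(); // taken just before the delete

        // ... the delete of the old meta rows would happen here ...

        long remaining = remainingSleepMs(reference, System.currentTimeMillis(), minGapMs);
        Thread.sleep(remaining);

        // ... the put of the restored meta rows would happen here ...
        System.out.println("slept " + remaining + " ms before the put");
    }
}
```

Taking the reference point before the delete shortens the wait, but if the delete is stamped at some point during the call rather than at its start, the gap between the delete's timestamp and the put's can end up smaller than minGapMs. Taking the reference after the delete returns is the more conservative choice.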



[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

2013-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815646#comment-13815646
 ] 

Hadoop QA commented on HBASE-9906:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12612509/hbase-9906-0.94_v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7770//console

This message is automatically generated.
