[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817104#comment-13817104 ]

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in HBase-0.94 #1196 (See [https://builds.apache.org/job/HBase-0.94/1196/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: rev 1539910)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java

Restore snapshot fails to restore the meta edits sporadically
---

Key: HBASE-9906
URL: https://issues.apache.org/jira/browse/HBASE-9906
Project: HBase
Issue Type: Bug
Components: snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.98.0, 0.96.1, 0.94.14
Attachments: hbase-9906-0.94_v1.patch, hbase-9906_v1.patch

After snapshot restore, we see failures to find the table in meta:
{code}
disable 'tablefour'
restore_snapshot 'snapshot_tablefour'
enable 'tablefour'
ERROR: Table tablefour does not exist.'
{code}
This is quite subtle. From the looks of it, we successfully restore the snapshot, do the meta updates, and return the status to the client. The client then tries to do an operation on the table (like enable table, or scan in the test outputs), which fails because the meta entry for the region seems to be gone (in the case of a single region, the table will be reported missing). Subsequent attempts to create the table will also fail because the table directories will be there, but not the meta entries.

For restoring meta entries, we are doing a delete then a put to the same region:
{code}
2013-11-04 10:39:51,582 INFO org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 76d0e2b7ec3291afcaa82e18a56ccc30
2013-11-04 10:39:51,582 INFO org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: fa41edf43fe3ee131db4a34b848ff432
...
2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted [{ENCODED = fa41edf43fe3ee131db4a34b848ff432, NAME = 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY = '', ENDKEY = ''}, {ENCODED = 76d0e2b7ec3291afcaa82e18a56ccc30, NAME = 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added 1
{code}
The root cause of this sporadic failure is that the delete and the subsequent put will have the same timestamp if they execute in the same ms. The delete will then override the put at that ts, even though the put is applied after the delete.

See: HBASE-9905, HBASE-8770

Credit goes to [~huned] for reporting this bug.

--
This message was sent by Atlassian JIRA (v6.1#6144)
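The timestamp rule behind the root cause can be illustrated with a toy model. This is a minimal sketch and not HBase code: the class `TombstoneDemo` and its methods are hypothetical, and the only behaviour it models is the one described above, namely that a delete marker hides every put whose timestamp is not strictly newer than the marker.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a single meta row. Hypothetical names; not the HBase API. */
public class TombstoneDemo {
    private final List<Long> putTimestamps = new ArrayList<>();
    private long deleteMarkerTs = Long.MIN_VALUE; // no delete seen yet

    void put(long ts) {
        putTimestamps.add(ts);
    }

    void deleteRow(long ts) {
        deleteMarkerTs = Math.max(deleteMarkerTs, ts);
    }

    /** A put is visible only if its timestamp is strictly newer than the delete marker. */
    boolean rowVisible() {
        for (long ts : putTimestamps) {
            if (ts > deleteMarkerTs) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TombstoneDemo sameMs = new TombstoneDemo();
        sameMs.deleteRow(1000L);
        sameMs.put(1000L); // applied later, but stamped with the same millisecond
        System.out.println("put in same ms visible: " + sameMs.rowVisible());

        TombstoneDemo laterMs = new TombstoneDemo();
        laterMs.deleteRow(1000L);
        laterMs.put(1001L); // one millisecond later
        System.out.println("put in later ms visible: " + laterMs.rowVisible());
    }
}
```

In this model the order of the calls does not matter, only the timestamps do, which is why the failure shows up only when both operations land in the same millisecond.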
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817380#comment-13817380 ]

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in hbase-0.96-hadoop2 #116 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/116/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: rev 1539907)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817483#comment-13817483 ]

Hudson commented on HBASE-9906:
---

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #832 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/832/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: rev 1539906)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816212#comment-13816212 ]

stack commented on HBASE-9906:
---

+1

Ugly, but you call it out as such in the comment explaining why.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816788#comment-13816788 ]

Enis Soztutar commented on HBASE-9906:
---

bq. Can the 20 ms sleep start counting from the call to MetaEditor.deleteRegions()?
I thought about doing this inside RestoreSnapshotHelper, but encapsulating the whole thing in MetaEditor.overwriteRegions() seems cleaner, and there might be other users for overwriting region data in meta.

bq. Would 17ms sleep be good enough?
Let's keep some buffer.

Thanks for the reviews.
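The idea of keeping the delete, the wait and the put behind one call can be sketched as follows. This is only an illustration under stated assumptions: `OverwriteSketch` is a hypothetical in-memory stand-in for the meta table, its method names are made up for this example, and it is not the code from the attached patch. Each operation is stamped with the wall clock, the way server-assigned timestamps are described in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the meta table; illustrative names, not the HBase API. */
public class OverwriteSketch {
    private final Map<String, Long> putTs = new HashMap<>();
    private final Map<String, Long> deleteTs = new HashMap<>();

    void delete(String row) {
        deleteTs.put(row, System.currentTimeMillis());
    }

    void put(String row) {
        putTs.put(row, System.currentTimeMillis());
    }

    /** A row is visible only if its put is strictly newer than its delete marker. */
    boolean visible(String row) {
        Long p = putTs.get(row);
        Long d = deleteTs.get(row);
        return p != null && (d == null || p > d);
    }

    /** Delete, wait 20 ms, then put, so the put is stamped in a later millisecond. */
    void overwriteRegion(String row) {
        delete(row);
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted between delete and put", e);
        }
        put(row);
    }

    public static void main(String[] args) {
        OverwriteSketch meta = new OverwriteSketch();
        meta.put("region-a");
        meta.overwriteRegion("region-a");
        System.out.println("visible after overwrite: " + meta.visible("region-a"));
    }
}
```

Putting the wait inside a single overwrite method means callers cannot forget it, which matches the argument for encapsulation made above.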
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817001#comment-13817001 ]

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in hbase-0.96 #183 (See [https://builds.apache.org/job/hbase-0.96/183/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: rev 1539907)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817028#comment-13817028 ]

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in HBase-TRUNK #4673 (See [https://builds.apache.org/job/HBase-TRUNK/4673/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: rev 1539906)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817042#comment-13817042 ]

Hudson commented on HBASE-9906:
---

SUCCESS: Integrated in HBase-0.94-security #330 (See [https://builds.apache.org/job/HBase-0.94-security/330/])
HBASE-9906 Restore snapshot fails to restore the meta edits sporadically (enis: rev 1539910)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/snapshot/RestoreSnapshotHandler.java
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815418#comment-13815418 ]

Enis Soztutar commented on HBASE-9906:
---

We can fix this issue in one of these ways:
- Fix either HBASE-9905 or HBASE-8770 or HBASE-9879
- Add a sleep(20) between the meta delete and the update
- Obtain a ts from the client, and do the delete with that ts and the puts with ts+1
- Change the meta delete to delete only the columns that are not needed. The subsequent put will override the column values anyway.
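The explicit-timestamp option (delete at a client-chosen ts, put at ts+1) can be sketched with a small model. This is a hedged illustration, not HBase code: `ExplicitTsSketch` and its methods are hypothetical, and the visibility rule is the simplified one discussed in this issue, where a put survives only if it is strictly newer than the delete marker.

```java
/** Toy single-row model with caller-supplied timestamps. Hypothetical; not the HBase API. */
public class ExplicitTsSketch {
    private long deleteTs = Long.MIN_VALUE; // newest delete marker seen
    private long putTs = Long.MIN_VALUE;    // newest put seen

    void delete(long ts) {
        deleteTs = Math.max(deleteTs, ts);
    }

    void put(long ts) {
        putTs = Math.max(putTs, ts);
    }

    boolean visible() {
        return putTs > deleteTs;
    }

    /** One client-chosen ts for the delete, ts + 1 for the put, so the put always wins. */
    void overwrite(long clientTs) {
        delete(clientTs);
        put(clientTs + 1);
    }

    public static void main(String[] args) {
        ExplicitTsSketch row = new ExplicitTsSketch();
        row.overwrite(System.currentTimeMillis());
        System.out.println("visible after overwrite: " + row.visible());
    }
}
```

The sketch needs no wait between the two operations, but it relies on the caller's clock, which is the trade-off of mixing client-supplied and server-supplied timestamps.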
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815463#comment-13815463 ]

Enis Soztutar commented on HBASE-9906:
---

Out of the above options, (1) will take some time to fix. (3) has another problem: we would be intermixing client-supplied and server-supplied timestamps, which might cause further problems in meta if clocks are out of sync. (4) is not ideal either, since we want to delete the whole row except for the column info:regioninfo. For this we would have to do a get to obtain the columns for each row, and then send deletes for each row.

So that leaves us with option (2), which is embarrassing, but given that restore is very infrequent, we can justify sleeping an extra 20ms.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815475#comment-13815475 ]

Sergey Shelukhin commented on HBASE-9906:
---

You can use the power of out-of-order ts by doing puts first, getting the ts, and then doing deletes at that ts minus 1 :) Although, iirc, meta might break because of that, because the key-before code optimizes by assuming no out-of-order ts across files.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815492#comment-13815492 ]
Sergey Shelukhin commented on HBASE-9906:
Btw another option is uniqueTs. I am -0 on sleep...
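The thread does not spell out what uniqueTs would look like. One possible reading, offered only as an assumption, is a timestamp source that never hands out the same value twice, so a put issued after a delete always carries a strictly larger timestamp. The class and method names below are hypothetical and do not come from the patch or from HBase.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical strictly increasing timestamp source (not an HBase API).
final class UniqueTimestampSource {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns the wall-clock value when it has advanced past the last value
    // handed out, otherwise the last value plus one. The result is therefore
    // strictly greater than every earlier result.
    long next(long wallClockMillis) {
        return last.updateAndGet(prev -> Math.max(prev + 1, wallClockMillis));
    }

    public static void main(String[] args) {
        UniqueTimestampSource source = new UniqueTimestampSource();
        long deleteTs = source.next(System.currentTimeMillis());
        long putTs = source.next(System.currentTimeMillis());
        System.out.println("put newer than delete: " + (putTs > deleteTs));
    }
}
```

The trade-off in this sketch is that timestamps can drift ahead of the wall clock under bursts, which is part of why such a change is heavier than a short sleep.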
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815510#comment-13815510 ]
Enis Soztutar commented on HBASE-9906:
Agreed that sleep is stupid, but without major surgery (uniqueTs, etc.) and fixes to HBASE-9770, this seems to be the best option. [~mbertozzi], [~jmhsieh] mind taking a look? Thanks.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815518#comment-13815518 ]
Matteo Bertozzi commented on HBASE-9906:
If we don't have the ts fix, the sleep sounds ok to me.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815544#comment-13815544 ]
Hadoop QA commented on HBASE-9906:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612481/hbase-9906_v1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7765//console
This message is automatically generated.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815560#comment-13815560 ]
Enis Soztutar commented on HBASE-9906:
Thanks Matteo, test failure seems unrelated.
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815632#comment-13815632 ]
Ted Yu commented on HBASE-9906:
Minor comment:
{code}
- if (metaChanges.hasRegionsToRestore()) hrisToRemove.addAll(metaChanges.getRegionsToRestore());
MetaEditor.deleteRegions(catalogTracker, hrisToRemove);
{code}
Can the 20 ms sleep start counting from the call to MetaEditor.deleteRegions()? Would a 17 ms sleep be good enough?
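The suggestion to start counting from the call to MetaEditor.deleteRegions() can be sketched as follows. This is not the patch itself, which is not shown in the thread: `RemainingSleep`, `remainingMillis` and `waitOutGap` are hypothetical names, and the 20 ms gap is simply the figure quoted in the comment above. The idea is to record the time just before the delete and afterwards sleep only for whatever part of the gap has not already elapsed.

```java
// Hypothetical helper (not from the patch): sleep only for the unelapsed part of a minimum gap.
final class RemainingSleep {

    // Milliseconds still to wait so that at least minGapMillis separate startMillis from nowMillis.
    static long remainingMillis(long startMillis, long nowMillis, long minGapMillis) {
        long elapsed = nowMillis - startMillis;
        return Math.max(0L, minGapMillis - elapsed);
    }

    // Blocks until at least minGapMillis have passed since startMillis, measured by the wall clock.
    static void waitOutGap(long startMillis, long minGapMillis) throws InterruptedException {
        long remaining = remainingMillis(startMillis, System.currentTimeMillis(), minGapMillis);
        if (remaining > 0) {
            Thread.sleep(remaining);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long beforeDelete = System.currentTimeMillis();
        // ... the delete of the meta rows would happen here ...
        waitOutGap(beforeDelete, 20L); // 20 ms is the value mentioned in the comment
        // ... the put of the restored meta rows would happen here ...
        System.out.println("gap elapsed: " + (System.currentTimeMillis() - beforeDelete) + " ms");
    }
}
```

If the delete itself already took longer than the gap, this version adds no delay at all.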
[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically
[ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815646#comment-13815646 ]
Hadoop QA commented on HBASE-9906:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612509/hbase-9906-0.94_v1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7770//console
This message is automatically generated.