[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-07-21 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551746#comment-16551746
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 7/21/18 3:52 PM:
---

Thank you for reviewing and pushing the patch. [~yuzhih...@gmail.com]


was (Author: brfrn169):
Thank you for reviewing and pushing the path. [~yuzhih...@gmail.com]

> restore_snapshot is broken in master branch when region splits
> --
>
> Key: HBASE-19893
> URL: https://issues.apache.org/jira/browse/HBASE-19893
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 19893.master.004.patch, 19893.master.004.patch, 
> 19893.master.004.patch, HBASE-19893.master.001.patch, 
> HBASE-19893.master.002.patch, HBASE-19893.master.003.patch, 
> HBASE-19893.master.003.patch, HBASE-19893.master.004.patch, 
> HBASE-19893.master.005.patch, HBASE-19893.master.005.patch, 
> HBASE-19893.master.005.patch, HBASE-19893.master.006.patch, 
> org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas-output.txt
>
>
> When I was investigating HBASE-19850, I found restore_snapshot didn't work in 
> master branch.
>  
> Steps to reproduce are as follows:
> 1. Create a table
> {code:java}
> create "test", "cf"
> {code}
> 2. Load data (2000 rows) to the table
> {code:java}
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> {code}
> 3. Split the table
> {code:java}
> split "test"
> {code}
> 4. Take a snapshot
> {code:java}
> snapshot "test", "snap"
> {code}
> 5. Load more data (2000 rows) to the table and split the table again
> {code:java}
> (2000...4000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> split "test"
> {code}
> 6. Restore the table from the snapshot 
> {code:java}
> disable "test"
> restore_snapshot "snap"
> enable "test"
> {code}
> 7. Scan the table
> {code:java}
> scan "test"
> {code}
> However, this scan returns only 244 rows (it should return 2000 rows) like 
> the following:
> {code:java}
> hbase(main):038:0> scan "test"
> ROW COLUMN+CELL
>  row78 column=cf:col, timestamp=1517298307049, value=val
> 
>   row999 column=cf:col, timestamp=1517298307608, value=val
> 244 row(s)
> Took 0.1500 seconds
> {code}
>  
> Also, the restored table should have 2 online regions but it has 3 online 
> regions.
>  
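A note for reading the scan output above: HBase orders rows by the byte-wise lexicographic order of their keys, not numerically. The following plain-Ruby sketch (run outside HBase shell; the helper name row_keys is made up for illustration) regenerates the keys from step 2 and shows how many rows a correct restore should return and which keys sort first and last.

```ruby
# Plain Ruby, no HBase required. row_keys is a hypothetical helper that
# builds row keys the same way as step 2 of the reproduction.
def row_keys(range)
  range.map { |i| "row#{i}" }
end

keys = row_keys(0...2000)  # exclusive range: i runs from 0 through 1999
sorted = keys.sort         # for these ASCII keys, string order matches byte-wise order

puts keys.size             # rows loaded before the snapshot was taken
puts sorted.first          # smallest key in lexicographic order
puts sorted.last           # largest key in lexicographic order
```

Because "row999" sorts after "row1999", a scan of the first 2000 rows is expected to end at row999, which matches the last row printed in the output above; only the row count (244 instead of 2000) is wrong.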



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-07-20 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551491#comment-16551491
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 7/21/18 2:00 AM:
---

Thank you for reviewing [~yuzhih...@gmail.com]. I'll fix the patch and attach 
a new one.

{quote}
BTW you used double negation in your above comment - probably not what you 
intended.
{quote}
Yeah, I wanted to say: "I don't think the test failure of 
TestDLSFSHLog.testRecoveredEdits() is related to the patch."





was (Author: brfrn169):
Thank you for reviewing [~yuzhih...@gmail.com]. I'll fix the patch and attache 
a new patch.

{quote}
BTW you used double negation in your above comment - probably not what you 
intended.
{quote}
Yeah, I wanted to say: "I don't think the test failure of 
TestDLSFSHLog.testRecoveredEdits() is related to the patch."






[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-07-17 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546122#comment-16546122
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 7/17/18 7:13 AM:
---

One thing: according to the log, it looks like one of the RSs went down while 
the region was being split. It might be related to the snapshot failure.
{code}
2018-05-09 05:39:50,415 ERROR [RS:0;dbf4832ee95b:46047] helpers.MarkerIgnoringBase(159): * ABORTING region server dbf4832ee95b,46047,1525844373706: org.apache.hadoop.hbase.YouAreDeadException: Not online: testOnlineSnapshotAfterSplittingRegions-1525844378682,,1525844378727_0001.c46a65c48013581384e835d379d17e30.
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1061)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:983)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:463)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:15170)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Not online: testOnlineSnapshotAfterSplittingRegions-1525844378682,,1525844378727_0001.c46a65c48013581384e835d379d17e30.
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1028)
    ... 7 more
{code}


was (Author: brfrn169):
One thing, according to the log, it looks like one of the RSs went down during 
splitting the region. It might be related to the snapshot failure.



[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-05-09 Thread Toshihiro Suzuki (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468515#comment-16468515
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 5/9/18 7:46 AM:
--

{code}
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=snaptb1-1525844378682 table=testOnlineSnapshotAfterSplittingRegions-1525844378682 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=snaptb1-1525844378682 table=testOnlineSnapshotAfterSplittingRegions-1525844378682 type=FLUSH }'. expected=8 snapshotted=7.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=snaptb1-1525844378682 table=testOnlineSnapshotAfterSplittingRegions-1525844378682 type=FLUSH }'. expected=8 snapshotted=7.
{code}
It looks like regions moved during the snapshot.

I wasn't able to reproduce locally the unit test failure from the last QA run. 
Attaching the patch again to rerun QA.


was (Author: brfrn169):
{code}
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
Failed taking snapshot { ss=snaptb1-1525844378682 
table=testOnlineSnapshotAfterSplittingRegions-1525844378682 type=FLUSH } due to 
exception:Regions moved during the snapshot '{ ss=snaptb1-1525844378682 
table=testOnlineSnapshotAfterSplittingRegions-1525844378682 type=FLUSH }'. 
expected=8 
snapshotted=7.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
Regions moved during the snapshot '{ ss=snaptb1-1525844378682 
table=testOnlineSnapshotAfterSplittingRegions-1525844378682 type=FLUSH }'. 
expected=8 snapshotted=7.
{code}
It looks like regions moved during the snapshot.

I wan't able to reproduce the unit test failure in the last QA run locally. 
Attaching the patch again to rerun QA.



[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-04-12 Thread Toshihiro Suzuki (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435206#comment-16435206
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 4/12/18 9:03 AM:
---

Thank you for letting me know [~nihaljain.cs]. If the attached patch fixes your 
issue, I don't think you need to raise another Jira.

Ping [~yuzhih...@gmail.com] [~ram_krish]. Could you please review the patch?

I reattached the latest patch to rerun a build.


was (Author: brfrn169):
Thanks for letting me know [~nihaljain.cs]. If the attached patch fixes your 
issue, I don't think you need to raise another Jira.

Ping [~yuzhih...@gmail.com] [~ram_krish]. Could you please review the patch?

I reattached the latest patch to rerun a build.



[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-04-04 Thread Toshihiro Suzuki (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425254#comment-16425254
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 4/4/18 9:45 AM:
--

Sorry for the late reply [~ram_krish],

{quote}
So this process of restore snapshot procs adds the in memory info that the 
procedures has to the META. So when the table is enabled after restore 
snapshot, this META info is not taken as the source of truth is it? Ya i think 
we may not know whether after disabling and when we enable if the enable is 
from the snapshot or frm some where else. In that sense this fix LGTM.
So the change in TestRestoreSnapshotFromClient if run without the fix it would 
fail and now it would pass I believe.
{quote}
Yes, the META info should be the source of truth. Currently, when restoring a 
snapshot, the restore snapshot procedure changes only the META info and doesn't 
change the in-memory states. That's why this issue happens. The fix in the patch 
adds logic to update the in-memory states as well.
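
To make the explanation above concrete, here is a toy model in plain Ruby. It is only a sketch under stated assumptions, not the patch: ToyMaster, its methods, the sync_memory flag and the region names r0..r4 are all hypothetical and are not HBase APIs. It assumes, as described in this thread, that restoring rewrites META while enabling the table assigns whatever the in-memory state lists.

```ruby
# Toy model only: every class, method and region name here is hypothetical.
class ToyMaster
  attr_reader :meta, :in_memory

  def initialize(regions)
    @meta = regions.dup       # stands in for the META table
    @in_memory = regions.dup  # stands in for the AM's in-memory region states
  end

  # A split replaces the parent region with its daughters in both places.
  def split(parent, daughters)
    [@meta, @in_memory].each do |store|
      store.delete(parent)
      store.concat(daughters)
    end
  end

  # Restore rewrites META. With sync_memory: false the in-memory states are
  # left untouched and go stale, which models the behaviour before the fix.
  def restore(snapshot_regions, sync_memory:)
    @meta = snapshot_regions.dup
    @in_memory = snapshot_regions.dup if sync_memory
  end

  # Enabling the table assigns whatever the in-memory states list.
  def enable
    @in_memory.sort
  end
end

# Replays the reproduction steps: split, snapshot, split again, restore, enable.
def regions_after_restore(sync_memory:)
  master = ToyMaster.new(["r0"])
  master.split("r0", ["r1", "r2"])
  snapshot = master.meta.dup        # two regions at snapshot time
  master.split("r2", ["r3", "r4"])  # second split, after the snapshot
  master.restore(snapshot, sync_memory: sync_memory)
  master.enable
end

puts regions_after_restore(sync_memory: false).inspect  # stale post-split regions
puts regions_after_restore(sync_memory: true).inspect   # regions from the snapshot
```

Without the sync, the model brings three regions online although the snapshot contains two, which mirrors the "3 online regions instead of 2" symptom in the description.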

{quote}
If the Master crashes and gets started again just after restore snapshot 
procedure is run and then you enable the table, what happens? Atleast that time 
do we read from META?
{quote}
Yes. I think that even if the Master crashes, it can recover the in-memory 
states from the META table and retry restoring the snapshot.


I also attached a v3 patch. In the previous patch, all region replica infos in 
the in-memory states were removed when restoring a snapshot.
I don't think that was correct; in the v3 patch, I think the region replica 
infos in the in-memory states are handled correctly.

Could you please review this patch? [~yuzhih...@gmail.com] [~ram_krish]


was (Author: brfrn169):
Sorry for the late reply [~ram_krish],

{quote}
So this process of restore snapshot procs adds the in memory info that the 
procedures has to the META. So when the table is enabled after restore 
snapshot, this META info is not taken as the source of truth is it? Ya i think 
we may not know whether after disabling and when we enable if the enable is 
from the snapshot or frm some where else. In that sense this fix LGTM.
So the change in TestRestoreSnapshotFromClient if run without the fix it would 
fail and now it would pass I believe.
{quote}
Yes, the META info should be the source of truth.
Currently when restoring snapshot, the restore snapshot procs changes only META 
info and it doesn't change in-memory states.
That's why this issue happens.
The fix in the patch is adding a logic to change in-memory states.

{quote}
If the Master crashes and gets started again just after restore snapshot 
procedure is run and then you enable the table, what happens? Atleast that time 
do we read from META?
{quote}
Yes. I think even when Master crashes, Master can recover in-memory stats from 
the META table and retry restoring snapshot.


And I attached a v3 patch. In the previous patch, all region replica infos in 
in-memory stats were removed when restoring a snapshot.
However, I thought it is not correct and in the v3 patch, I think region 
replica infos in in-memory are handled correctly.

Could you please review this patch? [~yuzhih...@gmail.com] [~ram_krish]


[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-04-04 Thread Toshihiro Suzuki (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425254#comment-16425254
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 4/4/18 9:44 AM:
--

Sorry for the late reply [~ram_krish],

{quote}
So this process of restore snapshot procs adds the in memory info that the 
procedures has to the META. So when the table is enabled after restore 
snapshot, this META info is not taken as the source of truth is it? Ya i think 
we may not know whether after disabling and when we enable if the enable is 
from the snapshot or frm some where else. In that sense this fix LGTM.
So the change in TestRestoreSnapshotFromClient if run without the fix it would 
fail and now it would pass I believe.
{quote}
Yes, the META info should be the source of truth.
Currently, when restoring a snapshot, the restore snapshot procedure changes only 
the META info and doesn't change the in-memory states.
That's why this issue happens.
The fix in the patch adds logic to update the in-memory states as well.

{quote}
If the Master crashes and gets started again just after restore snapshot 
procedure is run and then you enable the table, what happens? Atleast that time 
do we read from META?
{quote}
Yes. I think that even if the Master crashes, it can recover the in-memory 
states from the META table and retry restoring the snapshot.


I also attached a v3 patch. In the previous patch, all region replica infos in 
the in-memory states were removed when restoring a snapshot.
I don't think that was correct; in the v3 patch, I think the region replica 
infos in the in-memory states are handled correctly.

Could you please review this patch? [~yuzhih...@gmail.com] [~ram_krish]


was (Author: brfrn169):
Sorry for the late reply [~ram_krish],

{quote}
So this process of restore snapshot procs adds the in memory info that the 
procedures has to the META. So when the table is enabled after restore 
snapshot, this META info is not taken as the source of truth is it? Ya i think 
we may not know whether after disabling and when we enable if the enable is 
from the snapshot or frm some where else. In that sense this fix LGTM.
So the change in TestRestoreSnapshotFromClient if run without the fix it would 
fail and now it would pass I believe.
{quote}
Yes, the META info should be the source of truth.
Currently when restoring snapshot, the restore snapshot procs changes only META 
info and it doesn't change in-memory states.
That's why this issue happens.
The fix in the patch is adding a logic to change in-memory states.

{quote}
If the Master crashes and gets started again just after restore snapshot 
procedure is run and then you enable the table, what happens? Atleast that time 
do we read from META?
{quote}
Yes. I think even when Master crashes, Master can recover in-memory stats from 
the META table and retry restoring snapshot.


And I attached a v3 patch. In the previous patch, all region replica infos in 
in-memory stats were removed when restoring a snapshot.
However, I thought it is not correct and in the v3 patch, region replica infos 
in in-memory are handled correctly.

Could you please review this patch? [~yuzhih...@gmail.com] [~ram_krish]


[jira] [Comment Edited] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2018-03-05 Thread Toshihiro Suzuki (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385751#comment-16385751
 ] 

Toshihiro Suzuki edited comment on HBASE-19893 at 3/5/18 8:07 AM:
--

Thanks [~ram_krish].
{quote}
Why is that you need to add to the AM's in memory state from 
RestoreSnapshotProcedure? Once the snapshot is restored will the AM 
automatically read the META and do the assignments?
{quote}
Because restore_snapshot runs while the table is offline, we need to enable the 
target table after it finishes. When enabling the table, EnableTableProcedure 
gets the table's regions from the AM and assigns them:
https://github.com/apache/hbase/blob/485af49e53cb38e2af4635f2c3bc0b33e15ba0a1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/EnableTableProcedure.java#L123-L125

Therefore, I thought we needed to add the region infos to the AM in 
RestoreSnapshotProcedure.




was (Author: brfrn169):
Thanks [~ram_krish].
{quote}
Why is that you need to add to the AM's in memory state from 
RestoreSnapshotProcedure? Once the snapshot is restored will the AM 
automatically read the META and do the assignments?
{quote}
As restore_snapshot is done offline, after finishing restore_snapshot, we need 
to enable the target table. When enabling the table, EnableTableProcedure gets 
regions of the table from AM and assign them:
https://github.com/apache/hbase/blob/485af49e53cb38e2af4635f2c3bc0b33e15ba0a1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/EnableTableProcedure.java#L123-L125

Therefore, I thought we needed to add region infos to AM in 
RestoreSnapshotProcedure.


