[jira] [Commented] (HBASE-23958) Balancer keeps balancing indefinitely

2020-05-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104093#comment-17104093
 ] 

ramkrishna.s.vasudevan commented on HBASE-23958:


Sorry, my bad. That is not the case.

> Balancer keeps balancing indefinitely 
> --
>
> Key: HBASE-23958
> URL: https://issues.apache.org/jira/browse/HBASE-23958
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.2
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
> Fix For: 2.3.0
>
>
> Before raising this issue: I am not sure whether this has already been fixed,
> directly or indirectly, in later versions of HBase.
> The steps are:
> 1) Create a cluster and create some tables. (Assume we have RS 1, 2, 3, 4 and
> 5.)
> 2) After the tables were created and some operations were done, the cluster
> was restarted. Because of this, some regions are in RIT. The RIT in progress
> was to be assigned to RS 3.
> 3) After the cluster comes back, RS 3 and 4 are stopped. (RS 3 will have a
> newer timestamp.)
> 4) The master that comes up sees that there are some RITs in place and tries
> to load the entries to process the procedures again. As part of this, the
> RegionStateStore is populated with the old RS 3 hostname (older timestamp).
> This adds a ServerStateNode, creating an RS 3 with the old timestamp as one
> server.
> 5) After the master restarts and all regions are assigned, the balancer
> keeps trying indefinitely to move the region to RS 3 (the old-timestamp
> server), thinking it is part of the cluster.
> 6) The other problem is that the MoveProcedure has RS 3 (with the old
> timestamp) as its target, but the AM realizes that it is a down server and
> moves the region to one of the active servers. This is not recorded anywhere.
> I will continue to check the latest code to see whether this case is valid.
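
The stale-server situation in steps 4 and 5 above can be illustrated with a minimal, self-contained sketch. It does not use HBase's actual classes: `ServerId`, `liveTargets`, the hostnames, the port, and the start codes are all hypothetical names and values chosen for illustration. The only assumption carried over from the description is that a region server is identified by its hostname together with a start timestamp, so the old and new incarnations of RS 3 are different servers. The sketch shows one possible guard, not necessarily the fix adopted for this issue: only servers that exactly match an online incarnation are kept as balancing targets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class StaleServerSketch {

    /** Hypothetical stand-in for a server identity: host, port and start timestamp. */
    public static final class ServerId {
        final String host;
        final int port;
        final long startCode;

        public ServerId(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        // Two incarnations of the same host differ only in startCode,
        // so startCode must take part in equality.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ServerId)) {
                return false;
            }
            ServerId other = (ServerId) o;
            return port == other.port
                && startCode == other.startCode
                && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startCode);
        }

        @Override
        public String toString() {
            return host + "," + port + "," + startCode;
        }
    }

    /** Keeps only the candidates that exactly match an online server, start code included. */
    public static List<ServerId> liveTargets(List<ServerId> candidates, Set<ServerId> online) {
        List<ServerId> live = new ArrayList<>();
        for (ServerId candidate : candidates) {
            if (online.contains(candidate)) {
                live.add(candidate);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        ServerId rs1 = new ServerId("rs1.example.com", 16020, 1500L);
        ServerId rs3Old = new ServerId("rs3.example.com", 16020, 1000L);

        // RS 3 is stopped, but its old incarnation is still known to the master.
        Set<ServerId> online = new HashSet<>(Arrays.asList(rs1));
        List<ServerId> knownToMaster = Arrays.asList(rs1, rs3Old);

        // The stale RS 3 entry is filtered out before it can be picked as a target.
        System.out.println(liveTargets(knownToMaster, online));
    }
}
```

Without such a filter, a planner that treats every known server entry as part of the cluster would keep proposing moves to the old RS 3 incarnation, which matches the repeated balancing described in step 5.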



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23958) Balancer keeps balancing indefinitely

2020-05-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104092#comment-17104092
 ] 

ramkrishna.s.vasudevan commented on HBASE-23958:


[~ndimiduk]
Sure.
But I also think it is indirectly related to 
https://issues.apache.org/jira/browse/HBASE-24189.




[jira] [Commented] (HBASE-23958) Balancer keeps balancing indefinitely

2020-03-23 Thread Nick Dimiduk (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065192#comment-17065192
 ] 

Nick Dimiduk commented on HBASE-23958:
--

Also, what's the intended {{affectsVersion}} here?



[jira] [Commented] (HBASE-23958) Balancer keeps balancing indefinitely

2020-03-23 Thread Nick Dimiduk (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065191#comment-17065191
 ] 

Nick Dimiduk commented on HBASE-23958:
--

[~ram_krish] give this a spin with the latest branch-2 or branch-2.3. 
HBASE-23984 fixes a minor accounting bug in RIT tracking in the master.
