Re: Assigning regions after restart

2017-03-14 Thread Stack
File a blocker please Lars. I'm pretty sure the boolean on whether we are doing a recovery or not has been there a long time so yeah, a single server recovery could throw us off, but you make a practical point, that one server should not destroy locality over the cluster. St.Ack On Tue, Mar 14,

Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-14 Thread Yu Li
Let me check with @Enis in JIRA and get back to you later (it may take a few days, due to schedule). Best Regards, Yu On 15 March 2017 at 05:57, jeff saremi wrote: > What's involved in getting this change merged into the main branch? These > 2 counters (fsReadLatency,

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Andrew Purtell
Great, and I changed my vote to -0 because Stack made a good argument that making more changes would invalidate review up to this point, and I trust this will be resolved before release. On Tue, Mar 14, 2017 at 4:29 PM, Josh Elser wrote: > Sorry Andrew, let me clarify as that

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Josh Elser
Sorry Andrew, let me clarify as that didn't come out right. I didn't mean that isn't a conversation worth having _now_, just that I was intentionally avoiding it in my previous email because I didn't understand the scope of those issues that Vlad had identified. I wanted to better understand

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Vladimir Rodionov
To be honest, Andrew, it is a blocker because I called it BLOCKER. By BLOCKER I meant that it MUST be resolved by 2.0 RC1. How far are we from 2.0 RC1, by the way? I am pretty sure that not only Phase 3 will be completed by that date, but also the more advanced Phase 4, with features like snapshot-less

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Andrew Purtell
> I'm going to intentionally avoid addressing the discussion of shipping partial features (related, but not relevant at the moment).
Then we are not having the same conversation, because it is precisely because this is a vote for this feature to go into 2.0, which is already overdue, so should be

Re: Need guidance on getting detailed elapsed times in every stage of processing a request

2017-03-14 Thread jeff saremi
What's involved in getting this change merged into the main branch? These 2 counters (fsReadLatency, fsWriteLatency) are super important to our understanding of what goes on behind every request. These are the minimum we need to have, especially in the absence of HTrace. I just checked the latest

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Vladimir Rodionov
>> How would a user recover from such a state
The user will need to run a full backup for every table. No, we have not encountered this issue during testing, but of course it is possible, especially on a large cluster.
-Vlad
On Tue, Mar 14, 2017 at 2:36 PM, Josh Elser wrote: >

[jira] [Created] (HBASE-17788) Procedure V2 performance improvements

2017-03-14 Thread stack (JIRA)
stack created HBASE-17788:
  Summary: Procedure V2 performance improvements
  Key: HBASE-17788
  URL: https://issues.apache.org/jira/browse/HBASE-17788
  Project: HBase
  Issue Type: Sub-task

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Josh Elser
Thanks for quick reply, Vlad! How would a user recover from such a state with the backup table in a broken state? Have you encountered such a scenario in your testing? The docs issue seems to be pretty minor too. I remember walking through the original patch (HBASE-16574) and it was pretty

[jira] [Resolved] (HBASE-17768) [C++] Makefile should recompile only the changed sources

2017-03-14 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-17768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar resolved HBASE-17768.
  Resolution: Fixed
  Hadoop Flags: Reviewed
  Fix Version/s: HBASE-14850
Thanks

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Vladimir Rodionov
Josh, on the doc side we have a very good manual: https://issues.apache.org/jira/secure/attachment/12829269/Backup-and-Restore-Apache_19Sep2016.pdf The only thing that has changed is the argument format of the command-line tools, but you can start from there: just type the command without args. In the current

Re: [VOTE] Backup/Restore feature for HBase 2.0, vote closing 3/11/2017

2017-03-14 Thread Josh Elser
I took a moment to read through the "blockers" as originally identified by Vlad, and (to echo Enis' take) I read the majority of them as being blockers not for the next release, but for a "full-fledged feature". I'm going to intentionally avoid addressing the discussion of shipping partial

[jira] [Created] (HBASE-17787) Drop rollback for all procedures; not useful afterall

2017-03-14 Thread stack (JIRA)
stack created HBASE-17787:
  Summary: Drop rollback for all procedures; not useful afterall
  Key: HBASE-17787
  URL: https://issues.apache.org/jira/browse/HBASE-17787
  Project: HBase
  Issue Type: Sub-task

[jira] [Created] (HBASE-17786) Create LoadBalancer perf-tests (test balancer algorithm decoupled from workload)

2017-03-14 Thread stack (JIRA)
stack created HBASE-17786:
  Summary: Create LoadBalancer perf-tests (test balancer algorithm decoupled from workload)
  Key: HBASE-17786
  URL: https://issues.apache.org/jira/browse/HBASE-17786
  Project: HBase

[jira] [Created] (HBASE-17785) RSGroupBasedLoadBalancer has no concept of default group

2017-03-14 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-17785:
  Summary: RSGroupBasedLoadBalancer has no concept of default group
  Key: HBASE-17785
  URL: https://issues.apache.org/jira/browse/HBASE-17785
  Project: HBase

[jira] [Resolved] (HBASE-17781) TestAcidGuarantees is broken

2017-03-14 Thread Anoop Sam John (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anoop Sam John resolved HBASE-17781.
  Resolution: Invalid
HBASE-17723 is not yet committed. So if locally applying that patch

Re: Assigning regions after restart

2017-03-14 Thread Lars George
Wait, HBASE-15251 is not enough methinks. The added checks help, but do not cover all the possible edge cases. In particular, say a node really fails, why not just reassign the few regions it did hold and leave all the others where they are? Seems insane as it is. On Tue, Mar 14, 2017 at 2:24

[jira] [Resolved] (HBASE-14129) If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost

2017-03-14 Thread Lars George (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars George resolved HBASE-14129.
  Resolution: Won't Fix
Closing as "won't fix" as the hardcoded flag is too intrusive. The cluster

Re: Assigning regions after restart

2017-03-14 Thread Lars George
Looking at the code more... it seems the issue is here, in AssignmentManager.processDeadServersAndRegionsInTransition():

    ...
    failoverCleanupDone();
    if (!failover) {
      // Fresh cluster startup.
      LOG.info("Clean cluster startup. Assigning user regions");
      assignAllUserRegions(allRegions);
    }
    ...
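
For illustration only, the alternative raised in this thread (when a server really fails, reassign just the regions it held and leave the rest in place) could look roughly like the sketch below. This is not HBase code: the class RetainAssignmentSketch, the method plan, and the use of plain strings for region and server names are all hypothetical, and round-robin placement of the orphaned regions is an assumed stand-in for whatever the balancer would actually do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; it does not use or mirror HBase's AssignmentManager API. */
final class RetainAssignmentSketch {

    /**
     * Builds a region -> server plan from the previous assignment.
     * Regions whose previous server is still live keep that server, so their
     * locality is preserved. Only regions from dead servers are moved, and
     * they are spread round-robin over the live servers (an assumption).
     */
    static Map<String, String> plan(Map<String, String> previous, List<String> liveServers) {
        if (liveServers.isEmpty()) {
            throw new IllegalArgumentException("no live servers to assign to");
        }
        Set<String> live = new HashSet<>(liveServers);
        Map<String, String> result = new TreeMap<>();
        int next = 0;
        // Iterate in sorted region order so the plan is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(previous).entrySet()) {
            if (live.contains(e.getValue())) {
                // Previous host is alive: retain the assignment.
                result.put(e.getKey(), e.getValue());
            } else {
                // Previous host is gone: pick the next live server.
                result.put(e.getKey(), liveServers.get(next % liveServers.size()));
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new TreeMap<>();
        previous.put("region-a", "rs1");
        previous.put("region-b", "rs2");
        previous.put("region-c", "rs3"); // rs3 is treated as dead below
        System.out.println(plan(previous, Arrays.asList("rs1", "rs2")));
    }
}
```

The point of the sketch is only that the failover decision can be made per region rather than as one cluster-wide boolean; how a real fix would integrate with the master's startup sequence is left open here.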

Assigning regions after restart

2017-03-14 Thread Lars George
Hi, I have had this happen on multiple clusters recently: after a restart, locality dropped from close to or exactly 100% down to single digits. The reason is that all regions were completely shuffled and reassigned to random servers. Upon reading the (yet again non-trivial) assignment

Re: Assigning regions after restart

2017-03-14 Thread Lars George
Doh, https://issues.apache.org/jira/browse/HBASE-15251 addresses this (though I am not sure exactly how, see below). This should be backported to all 1.x branches! As for the patch, I see this if (!failover) { // Fresh cluster startup. - LOG.info("Clean cluster startup.

[jira] [Created] (HBASE-17784) Check if the regions are deployed in the correct group, and move it to the correct group when starting master

2017-03-14 Thread Guangxu Cheng (JIRA)
Guangxu Cheng created HBASE-17784:
  Summary: Check if the regions are deployed in the correct group, and move it to the correct group when starting master
  Key: HBASE-17784
  URL:

[jira] [Created] (HBASE-17783) Fix bugs that is still inconsistent after synchronizing zk and htable data at master startup

2017-03-14 Thread Guangxu Cheng (JIRA)
Guangxu Cheng created HBASE-17783:
  Summary: Fix bugs that is still inconsistent after synchronizing zk and htable data at master startup
  Key: HBASE-17783
  URL: https://issues.apache.org/jira/browse/HBASE-17783

[jira] [Created] (HBASE-17782) Introduce new configuration to decide reference type used in IdReadWriteLock

2017-03-14 Thread Yu Li (JIRA)
Yu Li created HBASE-17782:
  Summary: Introduce new configuration to decide reference type used in IdReadWriteLock
  Key: HBASE-17782
  URL: https://issues.apache.org/jira/browse/HBASE-17782
  Project: HBase

[jira] [Created] (HBASE-17781) TestAcidGuarantees

2017-03-14 Thread Anup Halarnkar (JIRA)
Anup Halarnkar created HBASE-17781:
  Summary: TestAcidGuarantees
  Key: HBASE-17781
  URL: https://issues.apache.org/jira/browse/HBASE-17781
  Project: HBase
  Issue Type: Bug
  Components: