[jira] [Created] (HBASE-24529) hbase.rs.evictblocksonclose is not honored when removing compacted files and closing the storefiles
Toshihiro Suzuki created HBASE-24529:
-------------------------------------
             Summary: hbase.rs.evictblocksonclose is not honored when removing compacted files and closing the storefiles
                 Key: HBASE-24529
                 URL: https://issues.apache.org/jira/browse/HBASE-24529
             Project: HBase
          Issue Type: Bug
            Reporter: Toshihiro Suzuki
            Assignee: Toshihiro Suzuki

Currently, when removing compacted files and closing the storefiles, the regionserver always evicts the block cache entries for those store files. It should honor hbase.rs.evictblocksonclose:
https://github.com/apache/hbase/blob/7b396e9b8ca93361de6a6c4bc8a40442db77c4da/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L2744

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
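The intended behavior can be sketched in a few lines: read the configured flag and pass it down to the close path, instead of hard-coding eviction. This is a plain-Java sketch with a map standing in for HBase's Configuration; only the config key name is real, and the `closeStoreFile` helper is hypothetical, not the actual HStore API:

```java
import java.util.HashMap;
import java.util.Map;

public class EvictOnCloseSketch {
    // The real HBase config key; the rest of this class is illustrative.
    static final String EVICT_ON_CLOSE_KEY = "hbase.rs.evictblocksonclose";

    // Hypothetical close path: consult the flag rather than always evicting.
    static boolean closeStoreFile(Map<String, String> conf) {
        boolean evictOnClose =
            Boolean.parseBoolean(conf.getOrDefault(EVICT_ON_CLOSE_KEY, "false"));
        // In the code path the issue describes, this was effectively
        // hard-coded to true regardless of configuration.
        return evictOnClose; // whether cached blocks for the file get evicted
    }
}
```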
[jira] [Reopened] (HBASE-24517) AssignmentManager.start should add meta region to ServerStateNode
[ https://issues.apache.org/jira/browse/HBASE-24517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang reopened HBASE-24517:
-------------------------------

Reopen for applying addendum.

> AssignmentManager.start should add meta region to ServerStateNode
> -----------------------------------------------------------------
>
>                 Key: HBASE-24517
>                 URL: https://issues.apache.org/jira/browse/HBASE-24517
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
> In AssignmentManager.start, we load the meta region state and location
> from zk and create the RegionStateNode, but we forget to call
> regionStates.addRegionToServer to add the region to the region server.
> Found this when implementing HBASE-24390. As in HBASE-24390 we will remove
> RegionInfoBuilder.FIRST_META_REGIONINFO, in SCP we need to use
> getRegionsOnServer instead of RegionInfoBuilder.FIRST_META_REGIONINFO when
> assigning meta, so the bug becomes a real problem.
> Though it is not a big problem for SCP on the current 2.x and master branches,
> it is a high-risk bug. For example, in AssignmentManager.submitServerCrash,
> we now use the RegionStateNode of meta regions to determine whether the given
> region server carries meta regions. But it is also valid to test through the
> ServerStateNode's region list. If we later change this method to use
> ServerStateNode, it will cause a very serious data loss bug.
[jira] [Created] (HBASE-24528) Improve balancer decision observability
Andrew Kyle Purtell created HBASE-24528:
---------------------------------------
             Summary: Improve balancer decision observability
                 Key: HBASE-24528
                 URL: https://issues.apache.org/jira/browse/HBASE-24528
             Project: HBase
          Issue Type: New Feature
          Components: Admin, Balancer, shell, UI
            Reporter: Andrew Kyle Purtell

We provide detailed INFO- and DEBUG-level logging of balancer decision factors, outcomes, and reassignment planning, as well as similarly detailed logging of the resulting assignment manager activity. However, an operator may need to perform online and interactive observation, debugging, or performance analysis of current balancer activity. Scraping and correlating the many log lines resulting from a balancer execution is labor intensive and has high latency (on the order of minutes to acquire and index, and minutes more to correlate).

The balancer should maintain a rolling window of history, e.g. the last 100 region move plans, or the last 1000 region move plans submitted to the assignment manager. This history should include decision factor details, weights, and costs. The rsgroups balancer may be able to provide fairly simple decision factors, for example "this table was reassigned to that regionserver group". The underlying or vanilla stochastic balancer, on the other hand, after a walk over random assignment plans, will have considered a number of cost functions with various inputs (locality, load, etc.) and multipliers, including custom cost functions. We can devise an extensible class structure that represents explanations for balancer decisions, and for each region move plan actually submitted to the assignment manager, keep the explanations of all relevant decision factors alongside the other details of the assignment plan, like the region name and the source and destination regionservers.

This history should be available via API for use by new shell commands and admin UI widgets. The new shell commands and UI widgets can unpack the representation of balancer decision components into human-readable output.
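The proposed rolling window could be as simple as a bounded deque of decision records. A plain-Java sketch, where `BalancerDecision` and its fields are hypothetical stand-ins for the extensible explanation classes described above:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BalancerDecisionHistory {
    // Hypothetical record of one submitted region move plan and its rationale.
    public static final class BalancerDecision {
        final String regionName;
        final String sourceServer;
        final String targetServer;
        final String explanation; // e.g. cost function names, weights, costs

        BalancerDecision(String regionName, String sourceServer,
                         String targetServer, String explanation) {
            this.regionName = regionName;
            this.sourceServer = sourceServer;
            this.targetServer = targetServer;
            this.explanation = explanation;
        }
    }

    private final int capacity;
    private final Deque<BalancerDecision> window = new ArrayDeque<>();

    public BalancerDecisionHistory(int capacity) {
        this.capacity = capacity;
    }

    // Record a decision, discarding the oldest once the window is full.
    public synchronized void record(BalancerDecision d) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(d);
    }

    // Snapshot for shell commands / admin UI widgets, newest last.
    public synchronized List<BalancerDecision> snapshot() {
        return new ArrayList<>(window);
    }
}
```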
[jira] [Created] (HBASE-24527) Improve region housekeeping status observability
Andrew Kyle Purtell created HBASE-24527:
---------------------------------------
             Summary: Improve region housekeeping status observability
                 Key: HBASE-24527
                 URL: https://issues.apache.org/jira/browse/HBASE-24527
             Project: HBase
          Issue Type: New Feature
          Components: Admin, Compaction, shell, UI
            Reporter: Andrew Kyle Purtell

We provide a coarse-grained admin API and associated shell command for determining the compaction status of a table:

{noformat}
hbase(main):001:0> help "compaction_state"
Here is some help for this command:
Gets compaction status (MAJOR, MAJOR_AND_MINOR, MINOR, NONE) for a table:

  hbase> compaction_state 'ns1:t1'
  hbase> compaction_state 't1'
{noformat}

We also log compaction activity, including a compaction journal at completion, via log4j to whatever log aggregation solution is available in production. This is not sufficient for online and interactive observation, debugging, or performance analysis of current compaction activity, where an operator is attempting to observe and analyze compaction activity in real time. Log aggregation and presentation solutions have typical latencies (end-to-end visibility of log lines on the order of minutes) which make that not possible today. We don't offer any API or tools for directly interrogating split and merge activity in real time. Some indirect knowledge of split or merge activity can be inferred from RIT information via ClusterStatus.

We should have new APIs and shell commands, and perhaps also new admin UI views, for:

At regionserver scope:
* listing the current state of a regionserver's compaction, split, and merge tasks and threads
* counting (simple view) and listing (detailed view) a regionserver's compaction queues
* listing a region's currently compacting, splitting, or merging status

At master scope, aggregations of the above detailed information into:
* listing the active compaction tasks and threads for a given table, the extension of _compaction_state_ with a new detailed view
* listing the active split or merge tasks and threads for a given table's regions
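At master scope, the "simple view" aggregation could amount to merging per-regionserver task reports into per-table counts. A plain-Java sketch of that aggregation; `CompactionTask` and `countByTable` are illustrative names, not an existing HBase API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompactionStatusAggregator {
    // Hypothetical per-regionserver report of one queued/running compaction.
    public static final class CompactionTask {
        final String table;
        final String region;
        final boolean major;

        public CompactionTask(String table, String region, boolean major) {
            this.table = table;
            this.region = region;
            this.major = major;
        }
    }

    // Simple view: number of active/queued compaction tasks per table,
    // aggregated across all regionservers' reports.
    public static Map<String, Integer> countByTable(
            List<List<CompactionTask>> reports) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<CompactionTask> serverReport : reports) {
            for (CompactionTask t : serverReport) {
                counts.merge(t.table, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```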
Requesting doc changes review for branch-2 and branch-2.3
Heya,

I'm in the process of updating docs for branch-2.3. While I'm there, I figure branch-2 should get a refresher as well. I'm tracking this work on HBASE-24144. I'd appreciate help from contributors who have landed feature and docs patches over the last couple of years, particularly in teasing out the bits that are exclusive to master.

Thanks,
Nick

https://issues.apache.org/jira/browse/HBASE-24144
https://github.com/apache/hbase/pull/1880
[jira] [Resolved] (HBASE-24005) Document maven invocation with JDK11
[ https://issues.apache.org/jira/browse/HBASE-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk resolved HBASE-24005.
----------------------------------
    Fix Version/s: 3.0.0-alpha-1
       Resolution: Fixed

> Document maven invocation with JDK11
> ------------------------------------
>
>                 Key: HBASE-24005
>                 URL: https://issues.apache.org/jira/browse/HBASE-24005
>             Project: HBase
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Major
>             Fix For: 3.0.0-alpha-1
>
> This is not obvious at the moment. Add some docs to ease dev setup.
Re: How to delete feature branches from Gitbox
I tried deleting it from the GitHub UI this time.

On Mon, Jun 1, 2020 at 4:35 PM Sean Busbey wrote:

> That should have worked. If we want to chase this down we can check the
> commits list to see if the delete was recorded.
>
> I'd guess someone else recreated the branch with a bare push.
>
> On Mon, Jun 1, 2020, 17:59 Nick Dimiduk wrote:
>
> > git push origin :HBASE-24049-packaging-integration-hadoop-2.10.0
> >
> > where
> >
> > origin https://gitbox.apache.org/repos/asf/hbase.git (fetch)
> >
> > On Mon, Jun 1, 2020 at 3:40 PM Sean Busbey wrote:
> >
> > > how did you delete it?
> > >
> > > On Mon, Jun 1, 2020 at 4:13 PM Nick Dimiduk wrote:
> > >
> > > > Heya,
> > > >
> > > > I deleted an old feature branch of mine, only to find Gitbox restored it
> > > > on subsequent pull. Is there some procedure for deleting branches?
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > > 0 git … remote update --prune
> > > > Fetching origin
> > > > From https://gitbox.apache.org/repos/asf/hbase
> > > >  * [new branch]    HBASE-24049-packaging-integration-hadoop-2.10.0
> > > >                    -> origin/HBASE-24049-packaging-integration-hadoop-2.10.0
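For reference, the `:branch` push syntax used in the thread and the newer `--delete` flag are equivalent: both remove the ref on the remote, after which a prune drops the stale local tracking ref. A self-contained demo against a throwaway local "remote" (the paths and branch name are illustrative):

```shell
# Set up a throwaway bare "remote" and a working clone
tmp=$(mktemp -d)
git init --bare -q "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"
git -c user.email=dev@example.com -c user.name=dev commit --allow-empty -m init -q
git branch my-feature-branch
git push -q origin my-feature-branch

# Delete the remote branch; equivalent to `git push origin :my-feature-branch`
git push -q origin --delete my-feature-branch

# Prune the now-stale local tracking ref
git remote update --prune
```

If the branch reappears after this, someone (or something) pushed it again; the commits list can confirm whether the delete was recorded, as Sean suggests.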
[jira] [Created] (HBASE-24526) Deadlock executing assign meta procedure
Nick Dimiduk created HBASE-24526:
--------------------------------
             Summary: Deadlock executing assign meta procedure
                 Key: HBASE-24526
                 URL: https://issues.apache.org/jira/browse/HBASE-24526
             Project: HBase
          Issue Type: Bug
          Components: proc-v2, Region Assignment
    Affects Versions: 2.3.0
            Reporter: Nick Dimiduk

I have what appears to be a deadlock while assigning meta. During recovery, the master creates the assign procedure for meta, and immediately marks meta as assigned in ZooKeeper. It then creates the subprocedure to open meta on the target regionserver. However, the PEWorker pool is full of procedures that are stuck, I think because their calls to update meta are going nowhere. For what it's worth, the balancer is running concurrently, and has calculated a plan of size 41.

From the master log,

{noformat}
2020-06-06 00:34:07,314 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=17802, ppid=17801, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; state=OPEN, location=null; forceNewPlan=true, retain=false
2020-06-06 00:34:07,465 INFO org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in ZooKeeper as hbasedn139.example.com,16020,1591403576247
2020-06-06 00:34:07,466 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=17803, ppid=17802, state=RUNNABLE; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}

{{pid=17803}} is not mentioned again. hbasedn139 never receives an {{openRegion}} RPC. Meanwhile, additional procedures are scheduled and picked up by workers, each getting "stuck". I see log lines for all 16 PEWorker threads, saying that they are stuck.

{noformat}
2020-06-06 00:34:07,961 INFO org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: Took xlock for pid=17804, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=IntegrationTestBigLinkedList, region=54f4f6c0e921e6d25e6043cba79c09aa, REOPEN/MOVE
2020-06-06 00:34:07,961 INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=17804 updating hbase:meta row=54f4f6c0e921e6d25e6043cba79c09aa, regionState=CLOSING, regionLocation=hbasedn046.example.com,16020,1591402383956
...
2020-06-06 00:34:22,295 WARN org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker stuck PEWorker-16(pid=17804), run time 14.3340 sec
...
2020-06-06 00:34:27,295 WARN org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker stuck PEWorker-16(pid=17804), run time 19.3340 sec
...
{noformat}

The cluster stays in this state, with PEWorker threads stuck, for upwards of 15 minutes. Eventually the master starts logging

{noformat}
2020-06-06 00:50:18,033 INFO org.apache.hadoop.hbase.client.RpcRetryingCallerImpl: Call exception, tries=30, retries=31, started=970072 ms ago, cancelled=false, msg=Call queue is full on hbasedn139.example.com,16020,1591403576247, too many items queued ?, details=row 'IntegrationTestBigLinkedList,,1591398987965.54f4f6c0e921e6d25e6043cba79c09aa.' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbasedn139.example.com,16020,1591403576247, seqNum=-1, see https://s.apache.org/timeout
{noformat}

The master never recovers on its own. I'm not sure how common this condition might be. This popped up after about 20 total hours of running ITBLL with ServerKillingMonkey.
[jira] [Resolved] (HBASE-23927) hbck2 assigns command should accept a file containing a list of region names
[ https://issues.apache.org/jira/browse/HBASE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong resolved HBASE-23927.
---------------------------------
    Resolution: Fixed

> hbck2 assigns command should accept a file containing a list of region names
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-23927
>                 URL: https://issues.apache.org/jira/browse/HBASE-23927
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck2, Operability, Usability
>    Affects Versions: hbase-operator-tools-1.0.0
>            Reporter: Nick Dimiduk
>            Assignee: Clara Xiong
>            Priority: Major
>
> The interface is not very ergonomic. Currently the command accepts a list of
> region names on the command line. If you have 100's of regions to assign,
> this sucks. We should accept a path to a file that contains these encoded
> regions, one per line. That way, this command tails nicely into an operator's
> incantation using grep/sed over log files.
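Without file input, the same effect can be approximated in the shell by expanding a one-region-per-line file onto the command line. A sketch; `regions.txt` and the region names are illustrative, and the leading `echo` only previews the command rather than running hbck2:

```shell
# One encoded region name per line, e.g. produced by grep/sed over master logs
printf '1588230740\nde00010733901a05f5a2a3a200a22bb5\n' > regions.txt

# Expand the file into the argument list of a single assigns invocation.
# Drop the leading `echo` to actually execute the command.
echo hbck2 assigns $(tr '\n' ' ' < regions.txt)
```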
[jira] [Created] (HBASE-24525) [branch-1] Support ZooKeeper 3.6.0+
Andrew Kyle Purtell created HBASE-24525:
---------------------------------------
             Summary: [branch-1] Support ZooKeeper 3.6.0+
                 Key: HBASE-24525
                 URL: https://issues.apache.org/jira/browse/HBASE-24525
             Project: HBase
          Issue Type: Improvement
          Components: Zookeeper
            Reporter: Andrew Kyle Purtell
            Assignee: Andrew Kyle Purtell
             Fix For: 1.7.0

Fix compilation issues against ZooKeeper 3.6.0. The changes are backwards compatible with 3.4 and 3.5. Tested with:

{noformat}
mvn clean install
mvn clean install -Dzookeeper.version=3.5.8
mvn clean install -Dzookeeper.version=3.6.0
{noformat}
[jira] [Created] (HBASE-24524) SyncTable logging improvements
Wellington Chevreuil created HBASE-24524:
----------------------------------------
             Summary: SyncTable logging improvements
                 Key: HBASE-24524
                 URL: https://issues.apache.org/jira/browse/HBASE-24524
             Project: HBase
          Issue Type: Improvement
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil

While troubleshooting mismatches in a replication deployment, SyncTable logging can provide some insight into what is diverging between two clusters. One caveat, though, is that it logs diverging row keys as hexadecimal values, which is not very useful for operators trying to figure out which rows are mismatching. Ideally, this info should be human readable, so that operators have the exact value they could use to query the tables with other tools, such as the hbase shell.

Another issue is that while row mismatches are logged at info level, cell value mismatches are only logged at debug level. In general, any of the mismatches would already be quite verbose, so ideally both should be logged at debug level.
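HBase already ships an operator-friendly escaping routine, `Bytes.toStringBinary`, which keeps printable ASCII as-is and hex-escapes everything else; logging row keys through it rather than as raw hex would give operators values they can paste straight into the shell. A self-contained re-sketch of that escaping behavior, modeled on (but not the actual) HBase implementation:

```java
public class ReadableKeys {
    // Modeled on HBase's Bytes.toStringBinary: printable ASCII passes
    // through, everything else (including backslash) becomes \xNN.
    public static String toStringBinary(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte v : b) {
            int ch = v & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }
}
```

A key logged as `row-0001` instead of `726f772d30303031` can be fed directly back into an hbase shell `get`.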
Re: [DISCUSS]HBase2.1.0 is slower than HBase1.2.0
Thanks for the detailed analysis and update, zheng wang.

> The code line below in StoreScanner.next() cost about 100ms in v2.1, and it
> was added in v2.0, see HBASE-17647.

So there is still some additional cost in 2.1, right? Do you have any other observations? Are we doing more cell compares in 2.x?

Anoop

On Mon, Jun 8, 2020 at 1:50 AM zheng wang <18031...@qq.com> wrote:

> Hi guys:
>
> I did some tests on my PC to find the reason, as Jan Van Besien mentioned in
> the user channel.
>
> # test env
> OS: win10
> JDK: 1.8
> MEM: 8GB
>
> # test data
> 1 million rows with only one column family and one qualifier.
>
> rowkey: rowkey-#index#
> value: value-#index#
>
> # test method
> Just use the client API to scan with the default config several times; no PE,
> no YCSB.
>
> # test result (avg)
> v1.2.0: 800ms
> v2.1.0: 1050ms
>
> So it is confirmed that v2.1 is slower than v1.2. After this, I gathered some
> statistics on the regionserver, and found that part of the reason is related
> to size estimation.
>
> The code line below in StoreScanner.next() costs about 100ms in v2.1, and it
> was added in v2.0, see HBASE-17647:
> "int cellSize = PrivateCellUtil.estimatedSerializedSizeOf(cell);"
>
> Should we support disabling the MaxResultSize limit (2MB by default now) to
> be more efficient, if the user knows their data exactly and can limit results
> only by setBatch and setLimit?
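The tradeoff under discussion can be shown in miniature: a size-bounded read loop pays a per-cell size estimate (this models the shape of the loop after HBASE-17647; the estimate formula here is made up), while a count-bounded loop needs none. A plain-Java sketch, not HBase's scanner code:

```java
import java.util.ArrayList;
import java.util.List;

public class ScanLimitSketch {
    // Stand-in for estimating a cell's serialized size; this is the hot call
    // measured in the thread (PrivateCellUtil.estimatedSerializedSizeOf in
    // HBase). The +8 overhead is illustrative, not HBase's formula.
    static int estimatedSerializedSizeOf(byte[] cell) {
        return cell.length + 8;
    }

    // Size-bounded loop: one size estimate per cell, so the scan can return
    // a partial result once maxResultSize worth of data has accumulated.
    static List<byte[]> scanWithSizeLimit(List<byte[]> cells, long maxResultSize) {
        List<byte[]> out = new ArrayList<>();
        long accumulated = 0;
        for (byte[] cell : cells) {
            accumulated += estimatedSerializedSizeOf(cell);
            out.add(cell);
            if (accumulated >= maxResultSize) break; // return partial result
        }
        return out;
    }

    // Count-bounded loop: what a "limit only" mode could do, skipping the
    // per-cell size estimate entirely (the suggestion at the end of the mail).
    static List<byte[]> scanWithRowLimit(List<byte[]> cells, int limit) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] cell : cells) {
            out.add(cell);
            if (out.size() >= limit) break;
        }
        return out;
    }
}
```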
[jira] [Resolved] (HBASE-24510) Remove HBaseTestCase and GenericTestUtils
[ https://issues.apache.org/jira/browse/HBASE-24510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-24510.
-------------------------------
    Hadoop Flags: Incompatible change,Reviewed
    Release Note: HBaseTestCase and GenericTestUtils have been removed. None of these classes are IA.Public, but HBaseTestCase may be used by users, so we still mark this as an incompatible change.
      Resolution: Fixed

> Remove HBaseTestCase and GenericTestUtils
> -----------------------------------------
>
>                 Key: HBASE-24510
>                 URL: https://issues.apache.org/jira/browse/HBASE-24510
>             Project: HBase
>          Issue Type: Task
>          Components: test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0-alpha-1
>
> HBaseTestCase is still a junit3-style test base; let's remove it.
> GenericTestUtils is also useless; remove it.
[jira] [Resolved] (HBASE-24441) CacheConfig details logged at Store open is not really useful
[ https://issues.apache.org/jira/browse/HBASE-24441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lijin Bin resolved HBASE-24441.
-------------------------------
    Resolution: Fixed

> CacheConfig details logged at Store open is not really useful
> -------------------------------------------------------------
>
>                 Key: HBASE-24441
>                 URL: https://issues.apache.org/jira/browse/HBASE-24441
>             Project: HBase
>          Issue Type: Improvement
>          Components: logging, regionserver
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Anoop Sam John
>            Assignee: song XinCun
>            Priority: Minor
>             Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0
>
> The CacheConfig constructor logs 'this' object at INFO level. This log
> appears during Store open (as the CacheConfig instance for that store is
> created). As the log is emitted from CacheConfig only, we don't get to know
> which region:store it is for, so it is not a really useful log.
> {code}
> blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@7bc02941,
> cacheDataOnRead=true, cacheDataOnWrite=true, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> {code}
> This log also keeps appearing during every compaction, because during
> compaction we create a new CacheConfig based on the HStore-level CacheConfig
> object. We can avoid emitting this log with every compaction.
[jira] [Resolved] (HBASE-24468) Add region info when logging messages in HStore.
[ https://issues.apache.org/jira/browse/HBASE-24468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lijin Bin resolved HBASE-24468.
-------------------------------
    Resolution: Fixed

> Add region info when logging messages in HStore.
> ------------------------------------------------
>
>                 Key: HBASE-24468
>                 URL: https://issues.apache.org/jira/browse/HBASE-24468
>             Project: HBase
>          Issue Type: Improvement
>          Components: logging, regionserver
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: song XinCun
>            Assignee: song XinCun
>            Priority: Minor
>             Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0
>
> Some log messages in HStore do not include region info; we need to add it.
[jira] [Resolved] (HBASE-24340) PerformanceEvaluation options should not mandate any specific order
[ https://issues.apache.org/jira/browse/HBASE-24340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John resolved HBASE-24340.
------------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> PerformanceEvaluation options should not mandate any specific order
> -------------------------------------------------------------------
>
>                 Key: HBASE-24340
>                 URL: https://issues.apache.org/jira/browse/HBASE-24340
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Anoop Sam John
>            Assignee: Sambit Mohapatra
>            Priority: Minor
>             Fix For: 3.0.0-alpha-1, 2.4.0
>
> During parsing of options, there are some validations. One such check is
> whether autoFlush = false AND multiPut > 0. This validation code mandates an
> order: autoFlush=true must be specified before multiPut=x in the PE command.
> {code}
> final String multiPut = "--multiPut=";
> if (cmd.startsWith(multiPut)) {
>   opts.multiPut = Integer.parseInt(cmd.substring(multiPut.length()));
>   if (!opts.autoFlush && opts.multiPut > 0) {
>     throw new IllegalArgumentException("autoFlush must be true when multiPut is more than 0");
>   }
>   continue;
> }
> {code}
> 'autoFlush' defaults to false. If multiPut is specified before autoFlush in
> the PE command, we will end up throwing IllegalArgumentException. The other
> validations do not seem to have this issue, but it is still better to move
> all the validations together into a private method and call it once parsing
> is over.
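The suggested restructuring is: parse every option into the holder first, then run all cross-option validations once, so argument order no longer matters. A self-contained sketch; the `Opts` class and option names mirror PE's, but this is not the actual PerformanceEvaluation code:

```java
public class OptionOrderSketch {
    // Minimal stand-in for PerformanceEvaluation's options holder.
    static final class Opts {
        boolean autoFlush = false;
        int multiPut = 0;
    }

    static Opts parse(String[] args) {
        Opts opts = new Opts();
        // Phase 1: parse only; no cross-option checks here, so order is free.
        for (String cmd : args) {
            final String autoFlush = "--autoFlush=";
            final String multiPut = "--multiPut=";
            if (cmd.startsWith(autoFlush)) {
                opts.autoFlush = Boolean.parseBoolean(cmd.substring(autoFlush.length()));
            } else if (cmd.startsWith(multiPut)) {
                opts.multiPut = Integer.parseInt(cmd.substring(multiPut.length()));
            }
        }
        validate(opts); // Phase 2: all validations, once, after parsing.
        return opts;
    }

    private static void validate(Opts opts) {
        if (!opts.autoFlush && opts.multiPut > 0) {
            throw new IllegalArgumentException(
                "autoFlush must be true when multiPut is more than 0");
        }
    }
}
```

With this shape, `--multiPut=10 --autoFlush=true` and `--autoFlush=true --multiPut=10` behave identically.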