[jira] [Created] (HBASE-24529) hbase.rs.evictblocksonclose is not honored when removing compacted files and closing the storefiles

2020-06-09 Thread Toshihiro Suzuki (Jira)
Toshihiro Suzuki created HBASE-24529:


 Summary: hbase.rs.evictblocksonclose is not honored when removing 
compacted files and closing the storefiles
 Key: HBASE-24529
 URL: https://issues.apache.org/jira/browse/HBASE-24529
 Project: HBase
  Issue Type: Bug
Reporter: Toshihiro Suzuki
Assignee: Toshihiro Suzuki


Currently, when removing compacted files and closing the storefiles, the RS 
always evicts the cached blocks for those store files. It should honor 
hbase.rs.evictblocksonclose:
https://github.com/apache/hbase/blob/7b396e9b8ca93361de6a6c4bc8a40442db77c4da/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L2744
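A minimal sketch of the proposed behavior, gating eviction on the configured flag instead of evicting unconditionally. The class and method names below are hypothetical stand-ins, not the actual HStore internals:

```java
// Sketch only: honor an evict-on-close flag when closing store files,
// rather than always evicting. Names are illustrative, not HStore's code.
import java.util.List;

public class EvictOnCloseSketch {

  /** Returns the number of files whose cached blocks were evicted. */
  static int closeStoreFiles(List<String> files, boolean evictOnClose) {
    int evicted = 0;
    for (String f : files) {
      // Before the fix: blocks were always evicted here.
      // After the fix: eviction is gated on the configured flag.
      if (evictOnClose) {
        evicted++; // stand-in for blockCache.evictBlocksByHfileName(f)
      }
    }
    return evicted;
  }

  public static void main(String[] args) {
    List<String> files = List.of("hfile-a", "hfile-b");
    System.out.println(closeStoreFiles(files, false)); // 0: evictblocksonclose=false honored
    System.out.println(closeStoreFiles(files, true));  // 2
  }
}
```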




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-24517) AssignmentManager.start should add meta region to ServerStateNode

2020-06-09 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HBASE-24517:
---

Reopen for applying addendum.

> AssignmentManager.start should add meta region to ServerStateNode
> -
>
> Key: HBASE-24517
> URL: https://issues.apache.org/jira/browse/HBASE-24517
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
>
> In AssignmentManager.start, we will load the meta region state and location 
> from zk and create the RegionStateNode, but we forget to call 
> regionStates.addRegionToServer to add the region to the region server.
> Found this when implementing HBASE-24390. As in HBASE-24390, we will remove 
> RegionInfoBuilder.FIRST_META_REGIONINFO so in SCP, we need to use the 
> getRegionsOnServer instead of RegionInfoBuilder.FIRST_META_REGIONINFO when 
> assigning meta, so the bug becomes a real problem.
> Though it is not a big problem for SCP for current 2.x and master branches, 
> it is a high risky bug. For example, in AssignmentManager.submitServerCrash, 
> now we use the RegionStateNode of meta regions to determine whether the given 
> region server carries meta regions. But it is also valid to test through the 
> ServerStateNode's region list. If later we change this method to use 
> ServerStateNode, it will cause very serious data loss bug.





[jira] [Created] (HBASE-24528) Improve balancer decision observability

2020-06-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24528:
---

 Summary: Improve balancer decision observability
 Key: HBASE-24528
 URL: https://issues.apache.org/jira/browse/HBASE-24528
 Project: HBase
  Issue Type: New Feature
  Components: Admin, Balancer, shell, UI
Reporter: Andrew Kyle Purtell


We provide detailed INFO and DEBUG level logging of balancer decision factors, 
outcome, and reassignment planning, as well as similarly detailed logging of 
the resulting assignment manager activity. However, an operator may need to 
perform online and interactive observation, debugging, or performance analysis 
of current balancer activity. Scraping and correlating the many log lines 
resulting from a balancer execution is labor intensive and has a lot of latency 
(order of ~minutes to acquire and index, order of ~minutes to correlate). 

The balancer should maintain a rolling window of history, e.g. the last 100 
region move plans, or last 1000 region move plans submitted to the assignment 
manager. This history should include decision factor details and weights and 
costs. The rsgroups balancer may be able to provide fairly simple decision 
factors, for example "this table was reassigned to that regionserver 
group". The underlying or vanilla stochastic balancer on the other hand, after 
a walk over random assignment plans, will have considered a number of cost 
functions with various inputs (locality, load, etc.) and multipliers, including 
custom cost functions. We can devise an extensible class structure that 
represents explanations for balancer decisions, and for each region move plan 
that is actually submitted to the assignment manager, we can keep the 
explanations of all relevant decision factors alongside the other details of 
the assignment plan like the region name, and the source and destination 
regionservers. 

This history should be available via API for use by new shell commands and 
admin UI widgets.

The new shell commands and UI widgets can unpack the representation of balancer 
decision components into human readable output. 
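The rolling window of history proposed above can be sketched as a small bounded buffer of decision records. {{BalancerDecision}} and its fields here are hypothetical illustrations, not an existing HBase class:

```java
// Sketch: a bounded rolling window of balancer decision records, e.g.
// the last N region move plans. BalancerDecision is a hypothetical type.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class BalancerDecisionHistory {
  record BalancerDecision(String regionName, String source, String destination,
                          String explanation, double cost) {}

  private final int capacity;
  private final Deque<BalancerDecision> window = new ArrayDeque<>();

  BalancerDecisionHistory(int capacity) { this.capacity = capacity; }

  synchronized void record(BalancerDecision d) {
    if (window.size() == capacity) {
      window.removeFirst(); // drop the oldest entry once the window is full
    }
    window.addLast(d);
  }

  /** Immutable view for shell commands or UI widgets to render. */
  synchronized List<BalancerDecision> snapshot() {
    return List.copyOf(window);
  }

  public static void main(String[] args) {
    BalancerDecisionHistory h = new BalancerDecisionHistory(2);
    h.record(new BalancerDecision("r1", "rs1", "rs2", "locality cost dropped", 0.4));
    h.record(new BalancerDecision("r2", "rs2", "rs3", "load cost dropped", 0.3));
    h.record(new BalancerDecision("r3", "rs1", "rs3", "table moved to rsgroup", 0.1));
    System.out.println(h.snapshot().size());              // 2: oldest evicted
    System.out.println(h.snapshot().get(0).regionName()); // r2
  }
}
```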





[jira] [Created] (HBASE-24527) Improve region housekeeping status observability

2020-06-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24527:
---

 Summary: Improve region housekeeping status observability
 Key: HBASE-24527
 URL: https://issues.apache.org/jira/browse/HBASE-24527
 Project: HBase
  Issue Type: New Feature
  Components: Admin, Compaction, shell, UI
Reporter: Andrew Kyle Purtell


We provide a coarse grained admin API and associated shell command for 
determining the compaction status of a table:

{noformat}
hbase(main):001:0> help "compaction_state"
Here is some help for this command:
 Gets compaction status (MAJOR, MAJOR_AND_MINOR, MINOR, NONE) for a table:
 hbase> compaction_state 'ns1:t1'
 hbase> compaction_state 't1'
{noformat}

We also log compaction activity, including a compaction journal at completion, 
via log4j to whatever log aggregation solution is available in production.

This is not sufficient for online and interactive observation, debugging, or 
performance analysis of current compaction activity. In this kind of activity 
an operator is attempting to observe and analyze compaction activity in real 
time. Log aggregation and presentation solutions have typical latencies (end to 
end visibility of log lines on the order of ~minutes) which make that not 
possible today.

We don't offer any API or tools for directly interrogating split and merge 
activity in real time. Some indirect knowledge of split or merge activity can 
be inferred from RIT information via ClusterStatus. 

We should have new APIs and shell commands, and perhaps also new admin UI 
views, for

at regionserver scope:
* listing the current state of a regionserver's compaction, split, and merge 
tasks and threads
* counting (simple view) and listing (detailed view) a regionserver's 
compaction queues
* listing a region's currently compacting, splitting, or merging status

at master scope, aggregations of the above detailed information into:
* listing the active compaction tasks and threads for a given table, the 
extension of _compaction_state_ with a new detailed view
* listing the active split or merge tasks and threads for a given table's 
regions





Requesting doc changes review for branch-2 and branch-2.3

2020-06-09 Thread Nick Dimiduk
Heya,

I'm in the process of updating docs for branch-2.3. While I'm there, I
figure branch-2 should get a refresher as well. I'm tracking this work on
HBASE-24144. I'd appreciate the help of contributors who have landed
features and docs patches over the last couple of years in teasing out the
bits that are master-exclusive.

Thanks,
Nick

https://issues.apache.org/jira/browse/HBASE-24144
https://github.com/apache/hbase/pull/1880


[jira] [Resolved] (HBASE-24005) Document maven invocation with JDK11

2020-06-09 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24005.
--
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> Document maven invocation with JDK11
> 
>
> Key: HBASE-24005
> URL: https://issues.apache.org/jira/browse/HBASE-24005
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0-alpha-1
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> This is not obvious at the moment. Add some docs to ease dev setup.





Re: How to delete feature branches from Gitbox

2020-06-09 Thread Nick Dimiduk
I tried deleting it from the GitHub UI this time.

On Mon, Jun 1, 2020 at 4:35 PM Sean Busbey  wrote:

> That should have worked. If we want to chase this down we can check the
> commits list to see if the delete was recorded.
>
> I'd guess someone else recreated the branch with a bare push.
>
> On Mon, Jun 1, 2020, 17:59 Nick Dimiduk  wrote:
>
> > git push origin :HBASE-24049-packaging-integration-hadoop-2.10.0
> >
> > where
> >
> > origin  https://gitbox.apache.org/repos/asf/hbase.git (fetch)
> >
> > On Mon, Jun 1, 2020 at 3:40 PM Sean Busbey  wrote:
> >
> > > how did you delete it?
> > >
> > > On Mon, Jun 1, 2020 at 4:13 PM Nick Dimiduk 
> wrote:
> > >
> > > > Heya,
> > > >
> > > > I deleted an old feature branch of mine, only to find Gitbox restored
> > it
> > > on
> > > > subsequent pull. Is there some procedure for deleting branches?
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > >   0 git … remote update --prune
> > > > Fetching origin
> > > > From https://gitbox.apache.org/repos/asf/hbase
> > > >  * [new branch]
> > > HBASE-24049-packaging-integration-hadoop-2.10.0
> > > > -> origin/HBASE-24049-packaging-integration-hadoop-2.10.0
> > > >
> > >
> >
>


[jira] [Created] (HBASE-24526) Deadlock executing assign meta procedure

2020-06-09 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24526:


 Summary: Deadlock executing assign meta procedure
 Key: HBASE-24526
 URL: https://issues.apache.org/jira/browse/HBASE-24526
 Project: HBase
  Issue Type: Bug
  Components: proc-v2, Region Assignment
Affects Versions: 2.3.0
Reporter: Nick Dimiduk


I have what appears to be a deadlock while assigning meta. During recovery, 
master creates the assign procedure for meta, and immediately marks meta as 
assigned in zookeeper. It then creates the subprocedure to open meta on the 
target region server. However, the PEWorker pool is full of procedures that are stuck, 
I think because their calls to update meta are going nowhere. For what it's 
worth, the balancer is running concurrently, and has calculated a plan size of 
41.

From the master log,

{noformat}
2020-06-06 00:34:07,314 INFO 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting 
pid=17802, ppid=17801, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; 
state=OPEN, location=null; forceNewPlan=true, retain=false
2020-06-06 00:34:07,465 INFO 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Setting hbase:meta 
(replicaId=0) location in ZooKeeper as 
hbasedn139.example.com,16020,1591403576247
2020-06-06 00:34:07,466 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized 
subprocedures=[{pid=17803, ppid=17802, state=RUNNABLE; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}

{{pid=17803}} is not mentioned again. hbasedn139 never receives an 
{{openRegion}} RPC.

Meanwhile, additional procedures are scheduled and picked up by workers, each 
getting "stuck". I see log lines for all 16 PEWorker threads, saying that they 
are stuck.
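The starvation pattern described above can be reproduced generically: a fixed worker pool where every worker blocks on work that can only complete if another task gets a slot, while that task sits queued forever. This is a simplified illustration of the mechanism, not HBase's ProcedureExecutor:

```java
// Sketch of pool starvation: all workers block waiting on a latch that
// only a queued task can release, so nothing ever makes progress.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvation {

  /** Returns true if the "open meta" task got to run before the timeout. */
  static boolean openTaskRan(int poolSize, long timeoutMillis) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    CountDownLatch openMeta = new CountDownLatch(1);
    // Two blocked "update meta" workers: they cannot finish until the
    // open-meta task runs, but they occupy the pool slots it would need.
    for (int i = 0; i < 2; i++) {
      pool.submit(() -> { openMeta.await(); return null; });
    }
    Future<?> openTask = pool.submit(() -> openMeta.countDown());
    try {
      openTask.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(openTaskRan(2, 500)); // false: every worker is stuck
    System.out.println(openTaskRan(3, 500)); // true: a free worker runs the open task
  }
}
```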

{noformat}
2020-06-06 00:34:07,961 INFO 
org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: Took xlock 
for pid=17804, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; 
TransitRegionStateProcedure table=IntegrationTestBigLinkedList, 
region=54f4f6c0e921e6d25e6043cba79c09aa, REOPEN/MOVE
2020-06-06 00:34:07,961 INFO 
org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=17804 updating 
hbase:meta row=54f4f6c0e921e6d25e6043cba79c09aa, regionState=CLOSING, 
regionLocation=hbasedn046.example.com,16020,1591402383956
...
2020-06-06 00:34:22,295 WARN 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker stuck 
PEWorker-16(pid=17804), run time 14.3340 sec
...
2020-06-06 00:34:27,295 WARN 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker stuck 
PEWorker-16(pid=17804), run time 19.3340 sec
...
{noformat}

The cluster stays in this state, with PEWorker threads stuck, for upwards of 15 
minutes. Eventually master starts logging

{noformat}
2020-06-06 00:50:18,033 INFO 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl: Call exception, tries=30, 
retries=31, started=970072 ms ago, cancelled=false, msg=Call queue is full on 
hbasedn139.example.com,16020,1591403576247, too many items queued ?, 
details=row 
'IntegrationTestBigLinkedList,,1591398987965.54f4f6c0e921e6d25e6043cba79c09aa.' 
on table 'hbase:meta' at region=hbase:meta,,1.
1588230740, hostname=hbasedn139.example.com,16020,1591403576247, seqNum=-1, see 
https://s.apache.org/timeout
{noformat}

The master never recovers on its own.

I'm not sure how common this condition might be. This popped after about 20 
total hours of running ITBLL with ServerKillingMonkey.





[jira] [Resolved] (HBASE-23927) hbck2 assigns command should accept a file containing a list of region names

2020-06-09 Thread Clara Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clara Xiong resolved HBASE-23927.
-
Resolution: Fixed

> hbck2 assigns command should accept a file containing a list of region names
> 
>
> Key: HBASE-23927
> URL: https://issues.apache.org/jira/browse/HBASE-23927
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck2, Operability, Usability
>Affects Versions: hbase-operator-tools-1.0.0
>Reporter: Nick Dimiduk
>Assignee: Clara Xiong
>Priority: Major
>
> The interface is not very ergonomic. Currently the command accepts a list of 
> region names on the command line. If you have hundreds of regions to assign, 
> this is painful. We should accept a path to a file that contains these encoded 
> region names, one per line. That way, this command tails nicely into an operator's 
> incantation using grep/sed over log files.
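A sketch of the file-reading side of such an input mode, assuming a simple one-region-per-line format with blank lines and `#` comments ignored. The format and helper names are assumptions for illustration, not hbck2's actual code:

```java
// Sketch: read encoded region names, one per line, from a file produced
// e.g. by grep/sed over master logs. File format is an assumption.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class RegionListReader {

  static List<String> readRegionNames(Path file) throws IOException {
    try (var lines = Files.lines(file)) {
      return lines.map(String::trim)
                  .filter(l -> !l.isEmpty() && !l.startsWith("#")) // skip blanks/comments
                  .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("regions", ".txt");
    Files.writeString(tmp, "# regions to assign\n1588230740\n\nde00010733901a05f5a2a3a382e27dd4\n");
    System.out.println(readRegionNames(tmp)); // [1588230740, de00010733901a05f5a2a3a382e27dd4]
  }
}
```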





[jira] [Created] (HBASE-24525) [branch-1] Support ZooKeeper 3.6.0+

2020-06-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24525:
---

 Summary: [branch-1] Support ZooKeeper 3.6.0+
 Key: HBASE-24525
 URL: https://issues.apache.org/jira/browse/HBASE-24525
 Project: HBase
  Issue Type: Improvement
  Components: Zookeeper
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.7.0


Fix compilation issues against ZooKeeper 3.6.0. Backwards compatible changes 
with 3.4 and 3.5. Tested with:

{{  mvn clean install}}

{{  mvn clean install -Dzookeeper.version=3.5.8}}

{{  mvn clean install -Dzookeeper.version=3.6.0}}





[jira] [Created] (HBASE-24524) SyncTable logging improvements

2020-06-09 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-24524:


 Summary: SyncTable logging improvements
 Key: HBASE-24524
 URL: https://issues.apache.org/jira/browse/HBASE-24524
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


While troubleshooting mismatches in a replication deployment, SyncTable logging 
can provide some insight into what is diverging between two clusters. One 
caveat, though, is that it logs the diverging row key as hexadecimal values, which 
is not very useful for operators trying to figure out which rows are mismatching. 
Ideally, this info should be human readable, so that operators have the 
exact value they can use for querying the tables with other tools, such as the 
hbase shell.

Another issue is that while row mismatches are logged at info level, cell value 
mismatches are only logged at debug. In general, either kind of mismatch can 
already be quite verbose, so ideally both should be logged at debug level.
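To illustrate the readability difference, compare a raw hex dump with a printable escape form similar to HBase's {{Bytes.toStringBinary}}. The helper below is a simplified stand-in for illustration, not HBase's implementation:

```java
// Sketch: hex dump vs. a printable escape form of a row key. toStringBinary
// here is a simplified stand-in for HBase's Bytes.toStringBinary.
import java.nio.charset.StandardCharsets;

public class RowKeyFormat {

  static String toHex(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte x : b) sb.append(String.format("%02x", x));
    return sb.toString();
  }

  /** Printable ASCII passes through; everything else becomes \xNN. */
  static String toStringBinary(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte x : b) {
      int v = x & 0xff;
      if (v >= 0x20 && v < 0x7f) sb.append((char) v);
      else sb.append(String.format("\\x%02X", v));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] row = "user-42\u0001".getBytes(StandardCharsets.UTF_8);
    System.out.println(toHex(row));          // 757365722d343201 -- opaque to an operator
    System.out.println(toStringBinary(row)); // user-42\x01 -- usable in the shell
  }
}
```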





Re: [DISCUSS]HBase2.1.0 is slower than HBase1.2.0

2020-06-09 Thread Anoop John
Thanks for the detailed analysis and update, zheng wang.
>The code line below in StoreScanner.next() cost about 100ms in v2.1, and
it added from v2.0, see HBASE-17647.
So there is still some additional cost in 2.1, right? Do you have any other
observations? Are we doing more cell compares in 2.x?

Anoop


On Mon, Jun 8, 2020 at 1:50 AM zheng wang <18031...@qq.com> wrote:

> Hi guys:
>
>
> I did some test on my pc to find the reason as Jan Van Besien mentioned in
> user channel.
>
>
> #test env
> OS : win10
> JDK: 1.8
> MEM: 8GB
>
>
> #test data:
> 1 million rows with only one columnfamily and one qualifier.
>
>
> rowkey: rowkey-#index#
> value: value-#index#
>
>
> #test method:
> just use client api to scan with default config several times, no pe, no
> ycsb
>
>
> #test result(avg):
> v1.2.0: 800ms
> v2.1.0: 1050ms
>
>
> So it is clear that v2.1 is slower than v1.2. After this, I gathered some
> statistics on the regionserver.
> Then I found that part of the reason is related to the size estimation.
>
>
> The code line below in StoreScanner.next() cost about 100ms in v2.1, and
> it added from v2.0, see HBASE-17647.
> "int cellSize = PrivateCellUtil.estimatedSerializedSizeOf(cell);"
>
>
> Should we support disabling the MaxResultSize limit (2MB by default now)
> so scans can be more efficient when users know their data exactly and can
> limit results only by setBatch and setLimit?


[jira] [Resolved] (HBASE-24510) Remove HBaseTestCase and GenericTestUtils

2020-06-09 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-24510.
---
Hadoop Flags: Incompatible change,Reviewed
Release Note: 
HBaseTestCase and GenericTestUtils have been removed.

None of these classes are IA.Public, but HBaseTestCase may be used by users, so 
we still mark this as an incompatible change.
  Resolution: Fixed

> Remove HBaseTestCase and GenericTestUtils
> -
>
> Key: HBASE-24510
> URL: https://issues.apache.org/jira/browse/HBASE-24510
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> It is still a junit3 style test base, let's remove it.
> GenericTestUtils is also useless, remove it.





[jira] [Resolved] (HBASE-24441) CacheConfig details logged at Store open is not really useful

2020-06-09 Thread Lijin Bin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lijin Bin resolved HBASE-24441.
---
Resolution: Fixed

> CacheConfig details logged at Store open is not really useful
> -
>
> Key: HBASE-24441
> URL: https://issues.apache.org/jira/browse/HBASE-24441
> Project: HBase
>  Issue Type: Improvement
>  Components: logging, regionserver
>Affects Versions: 3.0.0-alpha-1
>Reporter: Anoop Sam John
>Assignee: song XinCun
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0
>
>
> The CacheConfig constructor logs 'this' object at INFO level. This log 
> appears during Store open (as the CacheConfig instance for that store is created). 
> As the log is at CacheConfig level only, we don't get to know which 
> region:store it is for, so the log is not really useful.
> {code}
> blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@7bc02941, 
> cacheDataOnRead=true, cacheDataOnWrite=true, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> {code}
> This log also keeps appearing during every compaction, because 
> during compaction we create a new CacheConfig based on the HStore level 
> CacheConfig object.  We can avoid emitting this log with every compaction.





[jira] [Resolved] (HBASE-24468) Add region info when logging messages in HStore.

2020-06-09 Thread Lijin Bin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lijin Bin resolved HBASE-24468.
---
Resolution: Fixed

> Add region info when logging messages in HStore.
> -
>
> Key: HBASE-24468
> URL: https://issues.apache.org/jira/browse/HBASE-24468
> Project: HBase
>  Issue Type: Improvement
>  Components: logging, regionserver
>Affects Versions: 3.0.0-alpha-1
>Reporter: song XinCun
>Assignee: song XinCun
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0
>
>
> Some log messages do not include region info when logged; we need to add it.





[jira] [Resolved] (HBASE-24340) PerformanceEvaluation options should not mandate any specific order

2020-06-09 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24340.

Hadoop Flags: Reviewed
  Resolution: Fixed

> PerformanceEvaluation options should not mandate any specific order
> ---
>
> Key: HBASE-24340
> URL: https://issues.apache.org/jira/browse/HBASE-24340
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Anoop Sam John
>Assignee: Sambit Mohapatra
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> During parsing of options, there are some validations.  One such is checking 
> whether autoFlush = false AND multiPut > 0.  This validation code mandates an 
> order: autoFlush=true must be specified before multiPut=x in the 
> PE command.
> {code}
> final String multiPut = "--multiPut=";
>   if (cmd.startsWith(multiPut)) {
> opts.multiPut = Integer.parseInt(cmd.substring(multiPut.length()));
> if (!opts.autoFlush && opts.multiPut > 0) {
>   throw new IllegalArgumentException("autoFlush must be true when 
> multiPut is more than 0");
> }
> continue;
>   }
> {code}
> 'autoFlush' defaults to false. If multiPut is specified prior to 
> autoFlush in the PE command, we will end up throwing an IllegalArgumentException.
> The other validations do not seem to have this issue.  Still, it is better to 
> move all the validations together into a private method and call it once 
> parsing is over.
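The deferred-validation fix described above can be sketched as follows: collect all options first, then validate the combination once parsing completes, so option order no longer matters. The option names mirror the PE snippet quoted above, but this is an illustration, not the PerformanceEvaluation code:

```java
// Sketch: parse every option first, validate combinations once at the end,
// so --multiPut may appear before --autoFlush without failing.
public class DeferredValidation {

  static class Opts {
    boolean autoFlush = false;
    int multiPut = 0;
  }

  static Opts parse(String... args) {
    Opts opts = new Opts();
    for (String cmd : args) {
      if (cmd.startsWith("--autoFlush=")) {
        opts.autoFlush = Boolean.parseBoolean(cmd.substring("--autoFlush=".length()));
      } else if (cmd.startsWith("--multiPut=")) {
        opts.multiPut = Integer.parseInt(cmd.substring("--multiPut=".length()));
      }
    }
    validate(opts); // single validation pass after all options are read
    return opts;
  }

  static void validate(Opts opts) {
    if (!opts.autoFlush && opts.multiPut > 0) {
      throw new IllegalArgumentException(
          "autoFlush must be true when multiPut is more than 0");
    }
  }

  public static void main(String[] args) {
    // Order no longer matters: multiPut before autoFlush parses fine.
    Opts opts = parse("--multiPut=10", "--autoFlush=true");
    System.out.println(opts.multiPut); // 10
  }
}
```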


