[jira] [Commented] (HBASE-20700) Move meta region when server crash can cause the procedure to be stuck
[ https://issues.apache.org/jira/browse/HBASE-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507731#comment-16507731 ] Duo Zhang commented on HBASE-20700:
---
Thanks sir. Let me commit to master and branch-2 first. 2.0 needs a separate patch as it does not have a peer queue.

> Move meta region when server crash can cause the procedure to be stuck
> --
>
> Key: HBASE-20700
> URL: https://issues.apache.org/jira/browse/HBASE-20700
> Project: HBase
> Issue Type: Sub-task
> Components: master, proc-v2, Region Assignment
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20700-UT.patch, HBASE-20700-v1.patch,
> HBASE-20700-v2.patch, HBASE-20700.patch
>
>
> As said in HBASE-20682.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20331) clean up shaded packaging for 2.1
[ https://issues.apache.org/jira/browse/HBASE-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507709#comment-16507709 ] Hudson commented on HBASE-20331:
Results for branch HBASE-20331 [build #39 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/39/]: (x) *{color:red}-1 overall{color}*
details (if available):
(/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/39//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/39//console].
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/39//console].
(/) {color:green}+1 source release artifact{color} -- See build output for details.
(x) {color:red}-1 client integration test{color} -- Failed when running client tests on top of Hadoop 3 using Hadoop's shaded client. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/39//artifact/output-integration/hadoop-3-shaded.log].

> clean up shaded packaging for 2.1
> -
>
> Key: HBASE-20331
> URL: https://issues.apache.org/jira/browse/HBASE-20331
> Project: HBase
> Issue Type: Umbrella
> Components: Client, mapreduce, shading
> Affects Versions: 2.0.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Critical
> Fix For: 3.0.0, 2.1.0
>
>
> polishing pass on shaded modules for 2.0 based on trying to use them in more
> contexts.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507700#comment-16507700 ] Guanghao Zhang commented on HBASE-20697:
bq. call RegionLocations.size() and it return 1 and seems only hold the last regionLocation of the list. It confused me.
The default replica count is 1, so this works. But the right way would be to use a map whose value is a list of the region replicas of the same region.

> Can't cache All region locations of the specify table by calling
> table.getRegionLocator().getAllRegionLocations()
> -
>
> Key: HBASE-20697
> URL: https://issues.apache.org/jira/browse/HBASE-20697
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1, 1.2.6
> Reporter: zhaoyuan
> Assignee: zhaoyuan
> Priority: Minor
> Fix For: 1.2.7, 1.3.3
>
> Attachments: HBASE-20697-branch-1.2.patch,
> HBASE-20697-branch-1.2.patch
>
>
> When we upgrade and restart a new version of an application which reads and
> writes to HBase, we get some operation timeouts. The timeouts are expected
> because when the application restarts, it does not hold any region location
> cache and has to communicate with zk and the meta regionserver to get region
> locations.
> We want to avoid these timeouts, so we do warmup work and, as far as I am
> concerned, the method table.getRegionLocator().getAllRegionLocations() should
> fetch all region locations and cache them. However, it didn't work well.
> There are still a lot of timeouts, so it confused me.
> I dug into the source code and found the following:
> {code:java}
> public List<HRegionLocation> getAllRegionLocations() throws IOException {
>   TableName tableName = getName();
>   NavigableMap<HRegionInfo, ServerName> locations =
>       MetaScanner.allTableRegions(this.connection, tableName);
>   ArrayList<HRegionLocation> regions = new ArrayList<>(locations.size());
>   for (Entry<HRegionInfo, ServerName> entry : locations.entrySet()) {
>     regions.add(new HRegionLocation(entry.getKey(), entry.getValue()));
>   }
>   if (regions.size() > 0) {
>     connection.cacheLocation(tableName, new RegionLocations(regions));
>   }
>   return regions;
> }
> {code}
> In MetaCache:
> {code:java}
> public void cacheLocation(final TableName tableName, final RegionLocations locations) {
>   byte[] startKey = locations.getRegionLocation().getRegionInfo().getStartKey();
>   ConcurrentMap<byte[], RegionLocations> tableLocations = getTableLocations(tableName);
>   RegionLocations oldLocation = tableLocations.putIfAbsent(startKey, locations);
>   boolean isNewCacheEntry = (oldLocation == null);
>   if (isNewCacheEntry) {
>     if (LOG.isTraceEnabled()) {
>       LOG.trace("Cached location: " + locations);
>     }
>     addToCachedServers(locations);
>     return;
>   }
> {code}
> It collects all regions into one RegionLocations object and only caches the
> first non-null region location. Then, when we put or get to hbase, we call
> getCachedLocation():
> {code:java}
> public RegionLocations getCachedLocation(final TableName tableName, final byte[] row) {
>   ConcurrentNavigableMap<byte[], RegionLocations> tableLocations = getTableLocations(tableName);
>   Entry<byte[], RegionLocations> e = tableLocations.floorEntry(row);
>   if (e == null) {
>     if (metrics != null) metrics.incrMetaCacheMiss();
>     return null;
>   }
>   RegionLocations possibleRegion = e.getValue();
>   // make sure that the end key is greater than the row we're looking
>   // for, otherwise the row actually belongs in the next region, not
>   // this one. the exception case is when the endkey is
>   // HConstants.EMPTY_END_ROW, signifying that the region we're
>   // checking is actually the last region in the table.
>   byte[] endKey = possibleRegion.getRegionLocation().getRegionInfo().getEndKey();
>   if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
>       getRowComparator(tableName).compareRows(
>           endKey, 0, endKey.length, row, 0, row.length) > 0) {
>     if (metrics != null) metrics.incrMetaCacheHit();
>     return possibleRegion;
>   }
>   // Passed all the way through, so we got nothing - complete cache miss
>   if (metrics != null) metrics.incrMetaCacheMiss();
>   return null;
> }
> {code}
> It chooses the first location as possibleRegion, and possibly it will
> mismatch.
> So did I forget something, or am I wrong somewhere? If this is indeed a bug,
> I think it can be fixed without much difficulty.
> Hope committers and the PMC review this!

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
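A toy model illustrates why the caching above leads to misses (class and method names here are illustrative, not the HBase API): all regions get stored under a single start key, so floorEntry() returns that one entry for every row, but its end-key check only covers the first region, and rows in later regions always miss.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the MetaCache behavior described above (illustrative only,
// not HBase code). Regions are modeled as {startKey, endKey} pairs; a
// lookup hits only if the found entry's end key lies past the row.
public class MetaCacheSketch {
    static TreeMap<String, String[]> cache = new TreeMap<>(); // startKey -> region

    static void cacheAllUnderFirstStartKey(String[][] regions) {
        // Mimics cacheLocation(): the whole region list is stored under the
        // FIRST region's start key, carrying only that region's end key.
        cache.put(regions[0][0], regions[0]);
    }

    static boolean cachedLookup(String row) {
        Map.Entry<String, String[]> e = cache.floorEntry(row);
        if (e == null) return false;
        String endKey = e.getValue()[1];
        return endKey.isEmpty() || endKey.compareTo(row) > 0; // "" = last region
    }

    public static void main(String[] args) {
        String[][] regions = { {"", "b"}, {"b", "c"}, {"c", ""} };
        cacheAllUnderFirstStartKey(regions);
        System.out.println(cachedLookup("a")); // hit: row falls in the first region
        System.out.println(cachedLookup("x")); // miss: later regions were never cached
    }
}
```

The second lookup always falls back to a meta scan, which matches the timeouts described in the issue.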
[jira] [Commented] (HBASE-18999) Put in hbase shell cannot do multiple columns
[ https://issues.apache.org/jira/browse/HBASE-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507696#comment-16507696 ] Nihal Jain commented on HBASE-18999:
Thanks for the review [~mdrob]. I will try to attach a new patch addressing your comments later this weekend.
{quote}you have it switched in one of the examples in the help messages{quote}
Yeah, you are right. I will fix that along with the following already checked-in line:
{code:java}
hbase> deleteall 't1', {ROWPREFIXFILTER => 'prefix'}, 'c1'  // delete certain column family in the row ranges
{code}

> Put in hbase shell cannot do multiple columns
> -
>
> Key: HBASE-18999
> URL: https://issues.apache.org/jira/browse/HBASE-18999
> Project: HBase
> Issue Type: Improvement
> Components: shell
> Affects Versions: 1.0.0, 3.0.0, 2.0.0
> Reporter: Mike Drob
> Assignee: Nihal Jain
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-18999.master.001.patch
>
>
> A {{Put}} can carry multiple cells, but doing so in the shell is very
> difficult to construct. We should make this easier.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20662) Increasing space quota on a violated table does not remove SpaceViolationPolicy.DISABLE enforcement
[ https://issues.apache.org/jira/browse/HBASE-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507686#comment-16507686 ] Nihal Jain commented on HBASE-20662:
Compile and javac are red because I have 3 tests marked with @Ignore. I think I will have to remove them in a new patch and add them back later with the Jira that fixes these scenarios. Otherwise, all newly added tests have passed. Ping [~elserj], [~yuzhih...@gmail.com], [~gsbiju]

> Increasing space quota on a violated table does not remove
> SpaceViolationPolicy.DISABLE enforcement
> ---
>
> Key: HBASE-20662
> URL: https://issues.apache.org/jira/browse/HBASE-20662
> Project: HBase
> Issue Type: Bug
> Reporter: Nihal Jain
> Assignee: Nihal Jain
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20662.master.001.patch,
> HBASE-20662.master.002.patch
>
>
> *Steps to reproduce*
> * Create a table and set a quota with {{SpaceViolationPolicy.DISABLE}} having a limit of, say, 2MB
> * Now put rows until the space quota is violated and the table gets disabled
> * Next, increase the space quota limit to, say, 4MB on the table
> * Now try putting a row into the table
> {code:java}
> private void testSetQuotaThenViolateAndFinallyIncreaseQuota() throws Exception {
>   SpaceViolationPolicy policy = SpaceViolationPolicy.DISABLE;
>   Put put = new Put(Bytes.toBytes("to_reject"));
>   put.addColumn(Bytes.toBytes(SpaceQuotaHelperForTests.F1), Bytes.toBytes("to"),
>       Bytes.toBytes("reject"));
>   // Do puts until we violate space policy
>   final TableName tn = writeUntilViolationAndVerifyViolation(policy, put);
>   // Now, increase limit
>   setQuotaLimit(tn, policy, 4L);
>   // Put some row now: should not violate as quota limit increased
>   verifyNoViolation(policy, tn, put);
> }
> {code}
> *Expected*
> We should be able to put data as long as the newly set quota limit is not reached.
> *Actual*
> We fail to put any new row even after increasing the limit.
> *Root cause*
> Increasing the quota on a violated table triggers the table to be enabled, but
> since the table is already in violation, the system does not allow it to be
> enabled (perhaps assuming that a user is trying to enable it).
> *Relevant exception trace*
> {noformat}
> 2018-05-31 00:34:27,563 INFO [regionserver/root1-ThinkPad-T440p:0.Chore.1] client.HBaseAdmin$14(844): Started enable of testSetQuotaAndThenIncreaseQuotaWithDisable0
> 2018-05-31 00:34:27,571 DEBUG [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=42525] ipc.CallRunner(142): callId: 11 service: MasterService methodName: EnableTable size: 104 connection: 127.0.0.1:38030 deadline: 1527707127568, exception=org.apache.hadoop.hbase.security.AccessDeniedException: Enabling the table 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to a violated space quota.
> 2018-05-31 00:34:27,571 ERROR [regionserver/root1-ThinkPad-T440p:0.Chore.1] quotas.RegionServerSpaceQuotaManager(210): Failed to disable space violation policy for testSetQuotaAndThenIncreaseQuotaWithDisable0. This table will remain in violation.
> org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Enabling the table 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to a violated space quota.
> at org.apache.hadoop.hbase.master.HMaster$6.run(HMaster.java:2275)
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.enableTable(HMaster.java:2258)
> at org.apache.hadoop.hbase.master.MasterRpcServices.enableTable(MasterRpcServices.java:725)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
> at org.apache.hadoop.hbase.ipc.
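The root cause can be sketched abstractly (a toy model with illustrative names, not the HBase API): the quota machinery's own enable request passes through the same guard that rejects user-initiated enables on a violated table, so the table stays disabled.

```java
// Abstract sketch of the failure mode described above (illustrative names,
// not HBase code): the quota chore's enable request is rejected by the same
// guard that blocks user-initiated enables on a violated table.
public class QuotaEnableSketch {
    static boolean inViolation = true;

    static String enableTable(boolean requestIsFromQuotaChore) {
        if (inViolation && !requestIsFromQuotaChore) {
            // Same rejection the stack trace above shows.
            return "AccessDeniedException: enable disallowed due to violated space quota";
        }
        inViolation = false;
        return "ENABLED";
    }

    public static void main(String[] args) {
        // Current behavior: the chore is not distinguished from a user,
        // so this models the bug -- the table remains disabled.
        System.out.println(enableTable(false));
        // A fix direction would let the quota machinery bypass the guard:
        System.out.println(enableTable(true));
    }
}
```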
[jira] [Comment Edited] (HBASE-20704) Sometimes some compacted storefiles are not archived on region close
[ https://issues.apache.org/jira/browse/HBASE-20704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507680#comment-16507680 ] Francis Liu edited comment on HBASE-20704 at 6/11/18 5:03 AM:
--
{quote}Are we ensuring this always after this patch? What if the RS going down in between a close so not all the compacted files are archived? This issue is there with old impl also right (Archive immediately and not by the Discharger thread){quote}
In both cases, if the RS is not gracefully shut down then the WAL will get replayed and the compaction marker gets replayed, thus making sure the compacted files are never accessed. Or so I'd like to confidently say. But it seems that even that part has a bug, wherein the WAL containing the compaction marker that needs to be replayed can get archived, as sequence id tracking for the WAL is only tied to memstore flushes, ignoring whether compaction archival for a given compaction has even completed. The same can be said for when edits are replayed on region open.
I can think of a few reasons why this was not observed (or not as much) in pre-discharger versions.
1. Since we archive soon after compacting, the window for exposure is pretty small.
2. At least for the delete case, assuming the common case that the user does not mess with the timestamps: since the compacted storefiles are sorted by seq id and removed in sequence, the storefiles containing rows that were deleted are removed before the storefiles containing the corresponding tombstones for those rows. With the discharger we skip storefiles if they still have references.
So to sum things up, with this other bug, when a server aborts there's a possibility that some compacted storefiles (the ones not removed) can be reopened by the failover RS.
Should we address this issue here? Or create another jira? If another jira, then in this one we can probably add a partial fix wherein the discharger only removes contiguous storefiles?

was (Author: toffer):
{quote}Are we ensuring this always after this patch? What if the RS going down in between a close so not all the compacted files are archived? This issue is there with old impl also right (Archive immediately and not by the Discharger thread){quote}
In both cases, if the RS is not gracefully shut down then the WAL will get replayed and the compaction marker gets replayed, thus making sure the compacted files are never accessed. Or so I'd like to confidently say. But it seems that even that part has a bug, wherein the WAL containing the compaction marker that needs to be replayed can get archived, as sequence id tracking for the WAL is only tied to memstore flushes, ignoring whether compaction archival for a given compaction has even completed. The same can be said for when edits are replayed on region open.
I can think of a few reasons why this was not observed (or not as much) in pre-discharger versions.
1. Since we archive soon after compacting, the window for exposure is pretty small.
2. At least for the delete case, assuming the common case that the user does not mess with the timestamps: since the compacted storefiles are sorted by seq id and removed in sequence, the storefiles containing rows that were deleted are removed before the storefiles containing the corresponding tombstones for those rows. With the discharger we skip storefiles if they still have references.
Should we address this issue here? Or create another jira? If another jira, then in this one we can probably add a partial fix wherein the discharger only removes contiguous storefiles?
[jira] [Commented] (HBASE-20704) Sometimes some compacted storefiles are not archived on region close
[ https://issues.apache.org/jira/browse/HBASE-20704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507680#comment-16507680 ] Francis Liu commented on HBASE-20704:
-
{quote}Are we ensuring this always after this patch? What if the RS going down in between a close so not all the compacted files are archived? This issue is there with old impl also right (Archive immediately and not by the Discharger thread){quote}
In both cases, if the RS is not gracefully shut down then the WAL will get replayed and the compaction marker gets replayed, thus making sure the compacted files are never accessed. Or so I'd like to confidently say. But it seems that even that part has a bug, wherein the WAL containing the compaction marker that needs to be replayed can get archived, as sequence id tracking for the WAL is only tied to memstore flushes, ignoring whether compaction archival for a given compaction has even completed. The same can be said for when edits are replayed on region open.
I can think of a few reasons why this was not observed (or not as much) in pre-discharger versions.
1. Since we archive soon after compacting, the window for exposure is pretty small.
2. At least for the delete case, assuming the common case that the user does not mess with the timestamps: since the compacted storefiles are sorted by seq id and removed in sequence, the storefiles containing rows that were deleted are removed before the storefiles containing the corresponding tombstones for those rows. With the discharger we skip storefiles if they still have references.
Should we address this issue here? Or create another jira? If another jira, then in this one we can probably add a partial fix wherein the discharger only removes contiguous storefiles?

> Sometimes some compacted storefiles are not archived on region close
>
>
> Key: HBASE-20704
> URL: https://issues.apache.org/jira/browse/HBASE-20704
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0
> Reporter: Francis Liu
> Assignee: Francis Liu
> Priority: Critical
> Attachments: HBASE-20704.001.patch, HBASE-20704.002.patch
>
>
> During region close, compacted files which have not yet been archived by the
> discharger are archived as part of the region closing process. It is
> important that these files are wholly archived to ensure data consistency, i.e.
> a storefile containing delete tombstones can be archived while older
> storefiles containing cells that were supposed to be deleted are left
> unarchived, thereby undeleting those cells.
> On region close, a compacted storefile is skipped from archiving if it has
> read references (i.e. open scanners). This behavior is correct for when the
> discharger chore runs, but on region close consistency is of course more
> important, so we should add a special case to ignore any references on the
> storefile and go ahead and archive it.
> The attached patch contains a unit test that reproduces the problem and the
> proposed fix.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
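The "contiguous storefiles" partial fix floated in the comment can be sketched (an illustrative model, not HBase code): compacted storefiles sorted by seq id may only be archived as a prefix of the list, stopping at the first file that still has read references, so a newer file (possibly holding tombstones) is never archived ahead of an older one.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the contiguous-removal idea (illustrative, not HBase code):
// archive compacted storefiles in seq-id order and stop at the first file
// that still has references, keeping the remainder contiguous.
public class DischargerSketch {
    static List<Integer> archiveContiguous(int[] seqIds, boolean[] hasRefs) {
        List<Integer> archived = new ArrayList<>();
        for (int i = 0; i < seqIds.length; i++) {
            if (hasRefs[i]) break;   // stop: do not skip ahead past a referenced file
            archived.add(seqIds[i]); // safe: every older file is already gone
        }
        return archived;
    }

    public static void main(String[] args) {
        // Files 1..4 by seq id; file 2 still has an open scanner, so only
        // file 1 is archived now; 3 and 4 wait even though unreferenced.
        System.out.println(archiveContiguous(new int[]{1, 2, 3, 4},
                new boolean[]{false, true, false, false}));
    }
}
```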
[jira] [Commented] (HBASE-20708) Make sure there is no race between the RMP scheduled when start up and the SCP
[ https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507656#comment-16507656 ] stack commented on HBASE-20708: --- Should this be linked to another issue [~Apache9] that fills in context sir? > Make sure there is no race between the RMP scheduled when start up and the SCP > -- > > Key: HBASE-20708 > URL: https://issues.apache.org/jira/browse/HBASE-20708 > Project: HBase > Issue Type: Sub-task > Components: proc-v2, Region Assignment >Reporter: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.1.0, 2.0.1 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20672) Create new HBase metrics ReadRequestRate and WriteRequestRate that reset at every monitoring interval
[ https://issues.apache.org/jira/browse/HBASE-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507655#comment-16507655 ] stack commented on HBASE-20672:
---
[~jain.ankit] Can you say why we need these extra counters (I'm wary of adding counters because we already spend a bunch of our processing time counting)? What is the "monitoring interval" (from the RN)? How is it set? Is it an hbase thing? Or is it a monitoring system thing? Thanks.

> Create new HBase metrics ReadRequestRate and WriteRequestRate that reset at
> every monitoring interval
> -
>
> Key: HBASE-20672
> URL: https://issues.apache.org/jira/browse/HBASE-20672
> Project: HBase
> Issue Type: Improvement
> Components: metrics
> Reporter: Ankit Jain
> Assignee: Ankit Jain
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-20672.branch-1.001.patch,
> HBASE-20672.master.001.patch, HBASE-20672.master.002.patch,
> HBASE-20672.master.003.patch
>
>
> HBase currently provides counters for read/write requests (ReadRequestCount,
> WriteRequestCount). However, counters that reset only after a restart of the
> service are not easy to use, so we would like to expose 2 new metrics in
> HBase that provide ReadRequestRate and WriteRequestRate at the region server
> level.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
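For context, a rate metric of the kind proposed can be derived from the existing monotonic counters; this is a minimal sketch under the assumption that sampling happens once per monitoring interval, not the actual patch:

```java
// Minimal sketch of deriving a per-interval rate from a monotonically
// increasing request counter (illustrative; HBase's actual metrics code
// differs). The rate is the counter delta divided by the elapsed interval,
// and the baseline resets each time the interval is sampled.
public class RequestRateSketch {
    private long lastCount;
    private long lastTimeMs;

    RequestRateSketch(long startCount, long startTimeMs) {
        this.lastCount = startCount;
        this.lastTimeMs = startTimeMs;
    }

    /** Returns requests/second over the interval since the last call. */
    double sample(long count, long nowMs) {
        double rate = (count - lastCount) * 1000.0 / (nowMs - lastTimeMs);
        lastCount = count;   // reset baseline for the next interval
        lastTimeMs = nowMs;
        return rate;
    }

    public static void main(String[] args) {
        RequestRateSketch r = new RequestRateSketch(0, 0);
        System.out.println(r.sample(500, 10_000)); // 50.0 req/s over first 10s
        System.out.println(r.sample(600, 20_000)); // 10.0 req/s over next 10s
    }
}
```

The point of the sketch is that the rate can be computed monitoring-side from the existing counters, which is one way to read stack's question about whether new server-side metrics are needed at all.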
[jira] [Commented] (HBASE-20698) Master don't record right server version until new started region server call regionServerReport method
[ https://issues.apache.org/jira/browse/HBASE-20698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507653#comment-16507653 ] stack commented on HBASE-20698:
---
+1

> Master don't record right server version until new started region server call
> regionServerReport method
> ---
>
> Key: HBASE-20698
> URL: https://issues.apache.org/jira/browse/HBASE-20698
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Affects Versions: 2.0.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 2.0.1
>
> Attachments: HBASE-20698.master.001.patch,
> HBASE-20698.master.002.patch, HBASE-20698.master.003.patch,
> HBASE-20698.master.addendum.patch
>
>
> When a new region server starts, it calls regionServerStartup first. The
> master records this server as a new online server and may dispatch a
> RemoteProcedure to the new server. But the master only records the server
> version when the new region server calls the regionServerReport method.
> Dispatching a new RemoteProcedure to this new regionserver will fail if the
> version is not right.
> {code:java}
> @Override
> protected void remoteDispatch(final ServerName serverName,
>     final Set<RemoteProcedure> remoteProcedures) {
>   final int rsVersion = master.getAssignmentManager().getServerVersion(serverName);
>   if (rsVersion >= RS_VERSION_WITH_EXEC_PROCS) {
>     LOG.trace("Using procedure batch rpc execution for serverName={} version={}",
>         serverName, rsVersion);
>     submitTask(new ExecuteProceduresRemoteCall(serverName, remoteProcedures));
>   } else {
>     LOG.info(String.format(
>         "Fallback to compat rpc execution for serverName=%s version=%s",
>         serverName, rsVersion));
>     submitTask(new CompatRemoteProcedureResolver(serverName, remoteProcedures));
>   }
> }
> {code}
> The above code uses the version to resolve a compatibility problem, so
> dispatch works right for old-version region servers. But RefreshPeerProcedure
> is new since hbase 2.0, so it doesn't need this. Yet because the new region
> server's version is not yet recorded, the master uses
> CompatRemoteProcedureResolver for RefreshPeerProcedure too, so the
> RefreshPeerProcedure can't be executed correctly.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
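The dispatch decision in the quoted snippet reduces to a version threshold; the sketch below uses an illustrative constant value (the real RS_VERSION_WITH_EXEC_PROCS differs) to show why a freshly started RS, whose version defaults to 0 until its first regionServerReport, takes the wrong path:

```java
// Sketch of the version-gated dispatch described above (constant value and
// names are illustrative, not the exact HBase code). A 2.0 RS whose version
// has not yet been reported looks like an old server to the master.
public class DispatchSketch {
    static final int RS_VERSION_WITH_EXEC_PROCS = 2_000_000; // illustrative

    static String dispatch(int rsVersion) {
        return rsVersion >= RS_VERSION_WITH_EXEC_PROCS
            ? "ExecuteProceduresRemoteCall"     // batch rpc path, 2.0+
            : "CompatRemoteProcedureResolver";  // fallback for old servers
    }

    public static void main(String[] args) {
        // Version is unknown (0) until the first regionServerReport arrives,
        // so a brand-new 2.0 RS is routed down the compat path:
        System.out.println(dispatch(0));
        // After the report the correct path is chosen:
        System.out.println(dispatch(2_000_001));
    }
}
```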
[jira] [Commented] (HBASE-20700) Move meta region when server crash can cause the procedure to be stuck
[ https://issues.apache.org/jira/browse/HBASE-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507650#comment-16507650 ] stack commented on HBASE-20700:
---
None. You answered my concerns. Skimmed patch +1 (+1 for branch-2.0 too. Thanks).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20711) Save on a Cell iteration when writing
[ https://issues.apache.org/jira/browse/HBASE-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507648#comment-16507648 ] stack commented on HBASE-20711:
---
Good point [~chia7712]. On the bugfix, makes sense to you sir?

> Save on a Cell iteration when writing
> -
>
> Key: HBASE-20711
> URL: https://issues.apache.org/jira/browse/HBASE-20711
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: stack
> Assignee: stack
> Priority: Minor
> Attachments: HBASE-20711.branch-2.0.001.patch
>
>
> This is a minor savings. We were doing a spin through all Cells on receipt
> just to check their size when subsequently, we were doing an iteration of all
> Cells to insert. It manifested as a little spike in perf output. This change
> removes the dedicated spin through Cells and just does the size check as part
> of the general Cell insert (the perf spike no longer shows but the cost of the
> size check still remains).
> There is also a minor bug fix where on receipt we were using the Put's row
> rather than the Cell's row; a client may have succeeded in submitting a Cell
> that disagreed with the hosting Mutation and it would have been written as
> something else altogether -- with the Put's row -- rather than being rejected.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
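The change described, folding the size check into the insert iteration, can be sketched (illustrative, not the HBase code): instead of one pass to sum cell sizes and a second pass to insert, a single loop accumulates the size while inserting.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of merging a size-check pass into the insert pass (illustrative,
// not HBase code). Cells are modeled as int arrays; "size" is their length.
public class SinglePassSketch {
    static long insertAndSize(List<int[]> dst, List<int[]> cells) {
        long total = 0;
        for (int[] cell : cells) { // one iteration does both jobs
            total += cell.length;  // size accounting, formerly a separate pre-pass
            dst.add(cell);         // the actual insert
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> store = new ArrayList<>();
        List<int[]> batch = List.of(new int[]{1, 2}, new int[]{3, 4, 5});
        System.out.println(insertAndSize(store, batch)); // total size: 5
        System.out.println(store.size());                // 2 cells inserted
    }
}
```

The cost of the size check itself is unchanged, as the issue notes; only the extra traversal is saved.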
[jira] [Commented] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507647#comment-16507647 ] zhaoyuan commented on HBASE-20697:
--
I am not familiar with region replica either, and during debugging, when I put all regions of one table into the RegionLocations constructor and call RegionLocations.size(), it returns 1 and seems to only hold the last regionLocation of the list. It confused me. So for each region location I choose to call connection.cacheLocation() and it works. FYI [~zghaobac]
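The per-region caching zhaoyuan describes can be modeled the same way as the bug (an illustrative toy, not the HBase API): cache one entry per region, keyed by that region's own start key, so floorEntry() resolves every row to the right region.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the fix direction discussed above (illustrative only, not
// HBase code): one cache entry PER region, keyed by its start key, makes
// the floorEntry lookup work for rows in every region.
public class PerRegionCacheSketch {
    static TreeMap<String, String[]> cache = new TreeMap<>(); // startKey -> region

    static void cachePerRegion(String[][] regions) {
        for (String[] r : regions) cache.put(r[0], r); // one entry per region
    }

    static boolean cachedLookup(String row) {
        Map.Entry<String, String[]> e = cache.floorEntry(row);
        if (e == null) return false;
        String endKey = e.getValue()[1];
        return endKey.isEmpty() || endKey.compareTo(row) > 0; // "" = last region
    }

    public static void main(String[] args) {
        cachePerRegion(new String[][]{{"", "b"}, {"b", "c"}, {"c", ""}});
        System.out.println(cachedLookup("a")); // hit: first region
        System.out.println(cachedLookup("x")); // hit: last region covers it
    }
}
```

Note this sidesteps, rather than answers, the region-replica question Guanghao raises: with replicas, each start key would need to map to all replicas of that region.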
[jira] [Commented] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507637#comment-16507637 ] Guanghao Zhang commented on HBASE-20697: bq. connection.cacheLocation(tableName, new RegionLocations(regionLocation)); Should this be a map whose key is the region name and whose value is the list of all region replicas of the same region? I am not familiar with region replicas... Do all replicas of a region have the same region name? > Can't cache All region locations of the specify table by calling > table.getRegionLocator().getAllRegionLocations() > - > > Key: HBASE-20697 > URL: https://issues.apache.org/jira/browse/HBASE-20697 > Project: HBase > Issue Type: Bug > Affects Versions: 1.3.1, 1.2.6 > Reporter: zhaoyuan > Assignee: zhaoyuan > Priority: Minor > Fix For: 1.2.7, 1.3.3 > > Attachments: HBASE-20697-branch-1.2.patch, > HBASE-20697-branch-1.2.patch > > > When we upgrade and restart a new version of an application that reads and writes to HBase, we get some operation timeouts. The timeouts are expected: when the application restarts, it does not hold any region location cache and has to talk to ZooKeeper and the meta region server to get region locations. > We want to avoid these timeouts, so we do warmup work; as far as I am concerned, the method table.getRegionLocator().getAllRegionLocations() should fetch all region locations and cache them. However, it didn't work well. There are still a lot of timeouts, which confused me.
[jira] [Commented] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507634#comment-16507634 ] zhaoyuan commented on HBASE-20697: -- [~zghaobac] Hi, what's wrong with Docker? Should I resubmit the patch to trigger QA again, or do something else to solve this problem?
[jira] [Commented] (HBASE-20569) NPE in RecoverStandbyProcedure.execute
[ https://issues.apache.org/jira/browse/HBASE-20569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507628#comment-16507628 ] Hadoop QA commented on HBASE-20569: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} HBASE-19064 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 20s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 50s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} HBASE-19064 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s{color} | {color:red} hbase-server: The patch generated 1 new + 212 unchanged - 0 fixed = 213 total (was 212) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 54s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 3s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 19s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}166m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.master.TestRecoverStandbyProcedure | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-20569 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927236/HBASE-20569.HBASE-19064.013.patch | | Optional Tests | asflicense cc unit hbaseprotoc javac javadoc findbugs
[jira] [Commented] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507625#comment-16507625 ] Hadoop QA commented on HBASE-20697: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s{color} | {color:red} Docker failed to build yetus/hbase:e77c578. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-20697 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927243/HBASE-20697-branch-1.2.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/13182/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyuan updated HBASE-20697: - Fix Version/s: 1.3.3 1.2.7 Attachment: HBASE-20697-branch-1.2.patch Status: Patch Available (was: In Progress)
[jira] [Commented] (HBASE-20331) clean up shaded packaging for 2.1
[ https://issues.apache.org/jira/browse/HBASE-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507610#comment-16507610 ] Hudson commented on HBASE-20331: Results for branch HBASE-20331 [build #38 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/38/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/38//console]. (x) {color:red}-1 jdk8 hadoop2 checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/38//console]. (x) {color:red}-1 jdk8 hadoop3 checks{color} -- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/38//console]. (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} -- Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/38//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3)
[jira] [Updated] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyuan updated HBASE-20697: - Attachment: HBASE-20697-branch-1.2.patch
[jira] [Commented] (HBASE-20711) Save on a Cell iteration when writing
[ https://issues.apache.org/jira/browse/HBASE-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507562#comment-16507562 ] Chia-Ping Tsai commented on HBASE-20711: If we do the size check while converting the proto to a mutation, we have to record the index into the CellScanner in order to skip the correct number of cells when encountering the exception.
{code:java}
int processedMutationIndex = 0;
for (Action mutation : mutations) {
  // A non-null mArray[processedMutationIndex] means the cell scanner has already been read.
  if (mArray[processedMutationIndex++] == null) {
    skipCellsForMutation(mutation, cells);
  }
}
{code}
> Save on a Cell iteration when writing
> -
>
> Key: HBASE-20711
> URL: https://issues.apache.org/jira/browse/HBASE-20711
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: stack
> Assignee: stack
> Priority: Minor
> Attachments: HBASE-20711.branch-2.0.001.patch
>
> This is a minor savings. We were doing a spin through all Cells on receipt just to check their size when, subsequently, we were doing an iteration of all Cells to insert. It manifested as a little spike in perf output. This change removes the dedicated spin through the Cells and does the size check as part of the general Cell insert (the perf spike no longer shows, but the cost of the size check itself remains).
> There is also a minor bug fix: on receipt we were using the Put's row rather than the Cell's row; a client may have succeeded in submitting a Cell that disagreed with the hosting Mutation, and it would have been written as something else altogether -- with the Put's row -- rather than being rejected.
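The "save an iteration" idea above can be sketched in miniature. This is a hypothetical stand-in (plain byte arrays for Cells, a List for the store), not the HBase write path: the point is only that the size check moves into the same loop that performs the insert, instead of a separate validation pass.

```java
import java.util.*;

// Single-pass insert: validate each element's size while inserting it,
// rather than looping once to validate and a second time to insert.
public class SinglePassInsert {
    static final long MAX_CELL_SIZE = 10L;

    public static long insertAll(List<byte[]> cells, List<byte[]> store) {
        long total = 0;
        for (byte[] cell : cells) {
            // Size check folded into the insert loop (the former first pass).
            if (cell.length > MAX_CELL_SIZE) {
                throw new IllegalArgumentException("cell too large: " + cell.length);
            }
            store.add(cell);      // the insert itself
            total += cell.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> store = new ArrayList<>();
        long n = insertAll(Arrays.asList(new byte[3], new byte[7]), store);
        System.out.println(n);    // prints 10
    }
}
```

One trade-off, as the issue notes: a failure can now surface mid-insert rather than before any work is done, which is why the comment thread discusses tracking how many cells were consumed when an exception is hit.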
[jira] [Commented] (HBASE-20569) NPE in RecoverStandbyProcedure.execute
[ https://issues.apache.org/jira/browse/HBASE-20569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507541#comment-16507541 ] Guanghao Zhang commented on HBASE-20569: Retry for Hadoop QA. > NPE in RecoverStandbyProcedure.execute > -- > > Key: HBASE-20569 > URL: https://issues.apache.org/jira/browse/HBASE-20569 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-20569.HBASE-19064.001.patch, > HBASE-20569.HBASE-19064.002.patch, HBASE-20569.HBASE-19064.003.patch, > HBASE-20569.HBASE-19064.004.patch, HBASE-20569.HBASE-19064.005.patch, > HBASE-20569.HBASE-19064.006.patch, HBASE-20569.HBASE-19064.007.patch, > HBASE-20569.HBASE-19064.008.patch, HBASE-20569.HBASE-19064.009.patch, > HBASE-20569.HBASE-19064.010.patch, HBASE-20569.HBASE-19064.011.patch, > HBASE-20569.HBASE-19064.012.patch, HBASE-20569.HBASE-19064.013.patch, > HBASE-20569.HBASE-19064.013.patch > > > We call ReplaySyncReplicationWALManager.initPeerWorkers in INIT_WORKERS state > and then use it in DISPATCH_TASKS. But if we restart the master and the > procedure is restarted from state DISPATCH_TASKS, no one will call the > initPeerWorkers method and we will get NPE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
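The restart hazard described in this issue can be sketched as follows. All names are hypothetical (this is not the actual RecoverStandbyProcedure code), and the lazy re-initialization guard shown is just one possible remedy; the actual patch may fix it differently.

```java
import java.util.*;

// A field populated in the INIT_WORKERS step is consumed in DISPATCH_TASKS.
// A procedure resumed at DISPATCH_TASKS after a master restart never runs
// INIT_WORKERS again, so the in-memory field is null -> NPE.
public class RecoverStandbySketch {
    enum State { INIT_WORKERS, DISPATCH_TASKS }

    static List<String> peerWorkers;  // in-memory only: lost on master restart

    static void initPeerWorkers() {
        peerWorkers = new ArrayList<>(Arrays.asList("worker-1", "worker-2"));
    }

    // Executes from 'resumeAt', the state persisted in the procedure store.
    static int execute(State resumeAt) {
        if (resumeAt == State.INIT_WORKERS) {
            initPeerWorkers();
        }
        // DISPATCH_TASKS: without this guard, a procedure resumed here after
        // a restart would dereference a null peerWorkers (the reported NPE).
        if (peerWorkers == null) {
            initPeerWorkers();
        }
        return peerWorkers.size();  // dispatch a task to each worker
    }

    public static void main(String[] args) {
        peerWorkers = null;                                 // simulate a fresh master
        System.out.println(execute(State.DISPATCH_TASKS)); // prints 2
    }
}
```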
[jira] [Updated] (HBASE-20569) NPE in RecoverStandbyProcedure.execute
[ https://issues.apache.org/jira/browse/HBASE-20569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-20569: --- Attachment: HBASE-20569.HBASE-19064.013.patch
[jira] [Commented] (HBASE-20331) clean up shaded packaging for 2.1
[ https://issues.apache.org/jira/browse/HBASE-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507525#comment-16507525 ] Hudson commented on HBASE-20331: Results for branch HBASE-20331 [build #37 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/37/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/37//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/37//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/37//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/37//artifacts/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > clean up shaded packaging for 2.1 > - > > Key: HBASE-20331 > URL: https://issues.apache.org/jira/browse/HBASE-20331 > Project: HBase > Issue Type: Umbrella > Components: Client, mapreduce, shading >Affects Versions: 2.0.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Fix For: 3.0.0, 2.1.0 > > > polishing pass on shaded modules for 2.0 based on trying to use them in more > contexts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20569) NPE in RecoverStandbyProcedure.execute
[ https://issues.apache.org/jira/browse/HBASE-20569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507472#comment-16507472 ] Hadoop QA commented on HBASE-20569: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} HBASE-19064 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 24s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 23s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s{color} | {color:green} HBASE-19064 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} HBASE-19064 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 59s{color} | {color:red} hbase-server: The patch generated 1 new + 212 unchanged - 0 fixed = 213 total (was 212) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}164m 14s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}210m 57s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.master.TestRecoverStandbyProcedure | | | hadoop.hbase.replication.TestSyncReplicationMoreLogsInLocalCopyToRemote | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-20569 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927216/HBASE-20569.HBASE-19064.013.patch |
[jira] [Commented] (HBASE-20709) CompatRemoteProcedureResolver should call remoteCallFailed method instead of throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-20709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507446#comment-16507446 ] Hadoop QA commented on HBASE-20709: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 50s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 59s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 48s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 16s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}109m 29s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}151m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-20709 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927214/HBASE-20709.master.002.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 5650b84baa99 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / cc7aefe0bb | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/13179/testReport/ | | Max. process+thread count | 4339 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/13179/console |
[jira] [Updated] (HBASE-20569) NPE in RecoverStandbyProcedure.execute
[ https://issues.apache.org/jira/browse/HBASE-20569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-20569: --- Attachment: HBASE-20569.HBASE-19064.013.patch > NPE in RecoverStandbyProcedure.execute > -- > > Key: HBASE-20569 > URL: https://issues.apache.org/jira/browse/HBASE-20569 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-20569.HBASE-19064.001.patch, > HBASE-20569.HBASE-19064.002.patch, HBASE-20569.HBASE-19064.003.patch, > HBASE-20569.HBASE-19064.004.patch, HBASE-20569.HBASE-19064.005.patch, > HBASE-20569.HBASE-19064.006.patch, HBASE-20569.HBASE-19064.007.patch, > HBASE-20569.HBASE-19064.008.patch, HBASE-20569.HBASE-19064.009.patch, > HBASE-20569.HBASE-19064.010.patch, HBASE-20569.HBASE-19064.011.patch, > HBASE-20569.HBASE-19064.012.patch, HBASE-20569.HBASE-19064.013.patch > > > We call ReplaySyncReplicationWALManager.initPeerWorkers in the INIT_WORKERS state > and then use it in DISPATCH_TASKS. But if we restart the master and the > procedure is restarted from state DISPATCH_TASKS, nothing will call the > initPeerWorkers method and we will get an NPE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20709) CompatRemoteProcedureResolver should call remoteCallFailed method instead of throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-20709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-20709: --- Attachment: HBASE-20709.master.002.patch > CompatRemoteProcedureResolver should call remoteCallFailed method instead of > throw UnsupportedOperationException > > > Key: HBASE-20709 > URL: https://issues.apache.org/jira/browse/HBASE-20709 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 2.0.0 > Environment: # >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-20709.master.001.patch, > HBASE-20709.master.002.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java > {code:java} > @Override > public void dispatchServerOperations(MasterProcedureEnv env, > List<RemoteOperation> operations) { > throw new UnsupportedOperationException(); > } > {code} > As the procedure request is not sent to the remote server, remoteOperationFailed and > remoteOperationCompleted can't be called, so the procedure will get stuck with no > log to show it. The new patch will call the remoteCallFailed method and make > the procedure fail; the new exception message makes the problem easy to find. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
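The fix described in the report above can be sketched as follows (a minimal sketch with hypothetical, simplified interfaces; the real RSProcedureDispatcher types carry more context): instead of throwing UnsupportedOperationException, which leaves the procedure waiting on a completion callback that never fires, the dispatcher invokes the failure callback so the procedure terminates with a visible cause.

```java
import java.util.List;

// Sketch (hypothetical interfaces, not the real HBase types) of the change:
// report the failure through the callback rather than throwing and leaving
// the procedure stuck with no notification at all.
public class DispatchSketch {
  interface RemoteOperation {
    void remoteCallFailed(Exception cause); // simplified failure callback
  }

  static String lastFailure;

  // Before: throw new UnsupportedOperationException(); -> no callback runs
  // and the owning procedure waits forever.
  // After: invoke the failure callback on every pending operation.
  static void dispatchServerOperations(List<RemoteOperation> operations) {
    for (RemoteOperation op : operations) {
      op.remoteCallFailed(new UnsupportedOperationException(
          "unexpected batch of server operations; this should not happen"));
    }
  }

  public static void main(String[] args) {
    // A stub operation that records the failure it was handed.
    dispatchServerOperations(List.of(
        cause -> lastFailure = cause.getMessage()));
    System.out.println(lastFailure);
  }
}
```

The point of the design is that a procedure framework only makes progress through its callbacks, so any dead-end code path must still drive one of them, even if only to fail loudly.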
[jira] [Commented] (HBASE-20709) CompatRemoteProcedureResolver should call remoteCallFailed method instead of throw UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-20709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507408#comment-16507408 ] Duo Zhang commented on HBASE-20709: --- Then let's add something like 'This should not happen, there must be bugs in your code, please check!'? > CompatRemoteProcedureResolver should call remoteCallFailed method instead of > throw UnsupportedOperationException > > > Key: HBASE-20709 > URL: https://issues.apache.org/jira/browse/HBASE-20709 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 2.0.0 > Environment: # >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-20709.master.001.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java > {code:java} > @Override > public void dispatchServerOperations(MasterProcedureEnv env, > List<RemoteOperation> operations) { > throw new UnsupportedOperationException(); > } > {code} > As the procedure request is not sent to the remote server, remoteOperationFailed and > remoteOperationCompleted can't be called, so the procedure will get stuck with no > log to show it. The new patch will call the remoteCallFailed method and make > the procedure fail; the new exception message makes the problem easy to find. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20698) Master don't record right server version until new started region server call regionServerReport method
[ https://issues.apache.org/jira/browse/HBASE-20698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507393#comment-16507393 ] Hudson commented on HBASE-20698: Results for branch master [build #361 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/361/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/361//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/361//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/361//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Master don't record right server version until new started region server call > regionServerReport method > --- > > Key: HBASE-20698 > URL: https://issues.apache.org/jira/browse/HBASE-20698 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.0.1 > > Attachments: HBASE-20698.master.001.patch, > HBASE-20698.master.002.patch, HBASE-20698.master.003.patch, > HBASE-20698.master.addendum.patch > > > When a new region server starts, it calls regionServerStartup first. > The master records this server as a new online server and may dispatch a > RemoteProcedure to the new server. But the master only records the server version > when the new region server calls the regionServerReport method. Dispatching a new > RemoteProcedure to this new region server will fail if the version is not right. 
> {code:java} > @Override > protected void remoteDispatch(final ServerName serverName, > final Set<RemoteProcedure> remoteProcedures) { > final int rsVersion = > master.getAssignmentManager().getServerVersion(serverName); > if (rsVersion >= RS_VERSION_WITH_EXEC_PROCS) { > LOG.trace("Using procedure batch rpc execution for serverName={} > version={}", > serverName, rsVersion); > submitTask(new ExecuteProceduresRemoteCall(serverName, > remoteProcedures)); > } else { > LOG.info(String.format( > "Fallback to compat rpc execution for serverName=%s version=%s", > serverName, rsVersion)); > submitTask(new CompatRemoteProcedureResolver(serverName, > remoteProcedures)); > } > } > {code} > The above code uses the version to resolve a compatibility problem, so dispatch > will work correctly for old-version region servers. But RefreshPeerProcedure is > new since HBase 2.0, so it doesn't need this fallback. However, since the new > region server's version is not recorded yet, the CompatRemoteProcedureResolver will > be used for RefreshPeerProcedure, too, so the RefreshPeerProcedure can't be executed > correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
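The dispatch decision quoted above reduces to a single version comparison. The sketch below (stand-in constant value and return strings, not the real HBase code) shows why a server whose version has not yet been reported, and is therefore recorded as a default of 0, is routed to the compat path even if it actually supports batched procedure execution:

```java
// Sketch of the version gate (hypothetical stand-ins for the real
// RSProcedureDispatcher): until the region server calls regionServerReport,
// its recorded version is the default 0, so dispatch falls back to the
// compat path regardless of the server's true capabilities.
public class VersionGateSketch {
  // Stand-in threshold; the real RS_VERSION_WITH_EXEC_PROCS encodes an
  // actual HBase version number.
  static final int RS_VERSION_WITH_EXEC_PROCS = 0x0200000;

  static String chooseResolver(int rsVersion) {
    return rsVersion >= RS_VERSION_WITH_EXEC_PROCS
        ? "ExecuteProceduresRemoteCall"    // batched path, new servers
        : "CompatRemoteProcedureResolver"; // fallback for old (or unknown) servers
  }

  public static void main(String[] args) {
    // Server that has not yet reported: recorded version defaults to 0.
    System.out.println(chooseResolver(0));
    // Same server after reporting a version above the threshold.
    System.out.println(chooseResolver(RS_VERSION_WITH_EXEC_PROCS + 1));
  }
}
```

The gate is correct for genuinely old servers; the bug is only that "unknown" and "old" are indistinguishable in the recorded version until the first regionServerReport arrives.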
[jira] [Commented] (HBASE-20331) clean up shaded packaging for 2.1
[ https://issues.apache.org/jira/browse/HBASE-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507358#comment-16507358 ] Hudson commented on HBASE-20331: Results for branch HBASE-20331 [build #36 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/36/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/36//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/36//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/36//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20331/36//artifacts/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > clean up shaded packaging for 2.1 > - > > Key: HBASE-20331 > URL: https://issues.apache.org/jira/browse/HBASE-20331 > Project: HBase > Issue Type: Umbrella > Components: Client, mapreduce, shading >Affects Versions: 2.0.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Fix For: 3.0.0, 2.1.0 > > > polishing pass on shaded modules for 2.0 based on trying to use them in more > contexts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20698) Master don't record right server version until new started region server call regionServerReport method
[ https://issues.apache.org/jira/browse/HBASE-20698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507326#comment-16507326 ] Hudson commented on HBASE-20698: Results for branch branch-2.0 [build #410 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/410/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/410//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/410//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/410//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Master don't record right server version until new started region server call > regionServerReport method > --- > > Key: HBASE-20698 > URL: https://issues.apache.org/jira/browse/HBASE-20698 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.0.1 > > Attachments: HBASE-20698.master.001.patch, > HBASE-20698.master.002.patch, HBASE-20698.master.003.patch, > HBASE-20698.master.addendum.patch > > > When a new region server starts, it calls regionServerStartup first. > The master records this server as a new online server and may dispatch a > RemoteProcedure to the new server. But the master only records the server version > when the new region server calls the regionServerReport method. Dispatching a new > RemoteProcedure to this new region server will fail if the version is not right. 
> {code:java} > @Override > protected void remoteDispatch(final ServerName serverName, > final Set<RemoteProcedure> remoteProcedures) { > final int rsVersion = > master.getAssignmentManager().getServerVersion(serverName); > if (rsVersion >= RS_VERSION_WITH_EXEC_PROCS) { > LOG.trace("Using procedure batch rpc execution for serverName={} > version={}", > serverName, rsVersion); > submitTask(new ExecuteProceduresRemoteCall(serverName, > remoteProcedures)); > } else { > LOG.info(String.format( > "Fallback to compat rpc execution for serverName=%s version=%s", > serverName, rsVersion)); > submitTask(new CompatRemoteProcedureResolver(serverName, > remoteProcedures)); > } > } > {code} > The above code uses the version to resolve a compatibility problem, so dispatch > will work correctly for old-version region servers. But RefreshPeerProcedure is > new since HBase 2.0, so it doesn't need this fallback. However, since the new > region server's version is not recorded yet, the CompatRemoteProcedureResolver will > be used for RefreshPeerProcedure, too, so the RefreshPeerProcedure can't be executed > correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)