[jira] [Commented] (HBASE-17269) Intermittent TestMasterProcedureSchedulerConcurrency failure
[ https://issues.apache.org/jira/browse/HBASE-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726386#comment-15726386 ]

Matteo Bertozzi commented on HBASE-17269:
-----------------------------------------

this is known to be a bit flaky since it relies on timing, but it should be solved by HBASE-17067. that one will take a week or two to get in, since we are still trying to get in what it depends on.

> Intermittent TestMasterProcedureSchedulerConcurrency failure
> ------------------------------------------------------------
>
> Key: HBASE-17269
> URL: https://issues.apache.org/jira/browse/HBASE-17269
> Project: HBase
> Issue Type: Test
> Reporter: Ted Yu
> Assignee: Matteo Bertozzi
> Priority: Minor
>
> TestMasterProcedureSchedulerConcurrency sometimes shows up as a timed-out test in QA runs.
> In
> https://builds.apache.org/job/HBase-TRUNK_matrix/2083/jdk=JDK%201.8%20(latest),label=Hadoop/testReport/org.apache.hadoop.hbase.master.procedure/TestMasterProcedureSchedulerConcurrency/testMasterProcedureSchedulerPerformanceEvaluation/
> I saw:
> {code}
> 2016-12-06 14:22:23,888 ERROR [Time-limited test] util.AbstractHBaseTool(151): Error running command-line tool
> java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1249)
>   at java.lang.Thread.join(Thread.java:1323)
>   at org.apache.hadoop.hbase.master.procedure.MasterProcedureSchedulerPerformanceEvaluation.runThreads(MasterProcedureSchedulerPerformanceEvaluation.java:239)
>   at org.apache.hadoop.hbase.master.procedure.MasterProcedureSchedulerPerformanceEvaluation.doWork(MasterProcedureSchedulerPerformanceEvaluation.java:261)
>   at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:149)
>   at org.apache.hadoop.hbase.master.procedure.MasterProcedureSchedulerPerformanceEvaluation.main(MasterProcedureSchedulerPerformanceEvaluation.java:294)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (HBASE-17269) Intermittent TestMasterProcedureSchedulerConcurrency failure
[ https://issues.apache.org/jira/browse/HBASE-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-17269: --- Assignee: Matteo Bertozzi > Intermittent TestMasterProcedureSchedulerConcurrency failure > > > Key: HBASE-17269 > URL: https://issues.apache.org/jira/browse/HBASE-17269 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Matteo Bertozzi >Priority: Minor > > TestMasterProcedureSchedulerConcurrency sometimes appeared as timed out test > in QA runs. > In > https://builds.apache.org/job/HBase-TRUNK_matrix/2083/jdk=JDK%201.8%20(latest),label=Hadoop/testReport/org.apache.hadoop.hbase.master.procedure/TestMasterProcedureSchedulerConcurrency/testMasterProcedureSchedulerPerformanceEvaluation/ > : > I saw: > {code} > 2016-12-06 14:22:23,888 ERROR [Time-limited test] > util.AbstractHBaseTool(151): Error running command-line tool > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1249) > at java.lang.Thread.join(Thread.java:1323) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureSchedulerPerformanceEvaluation.runThreads(MasterProcedureSchedulerPerformanceEvaluation.java:239) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureSchedulerPerformanceEvaluation.doWork(MasterProcedureSchedulerPerformanceEvaluation.java:261) > at > org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:149) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureSchedulerPerformanceEvaluation.main(MasterProcedureSchedulerPerformanceEvaluation.java:294) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17260) Procedure v2 - Add setOwner() overload taking a User instance
[ https://issues.apache.org/jira/browse/HBASE-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17260: Resolution: Fixed Status: Resolved (was: Patch Available) > Procedure v2 - Add setOwner() overload taking a User instance > - > > Key: HBASE-17260 > URL: https://issues.apache.org/jira/browse/HBASE-17260 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17260-v0.patch > > > since we should have a User instance in most of the cases, we should just be > able to pass it, rather than converting it to getShortName() every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-16786) Procedure V2 - Move ZK-lock's uses to Procedure framework locks (LockProcedure)
[ https://issues.apache.org/jira/browse/HBASE-16786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-16786: --- Assignee: Matteo Bertozzi (was: Appy) > Procedure V2 - Move ZK-lock's uses to Procedure framework locks > (LockProcedure) > --- > > Key: HBASE-16786 > URL: https://issues.apache.org/jira/browse/HBASE-16786 > Project: HBase > Issue Type: Sub-task >Reporter: Appy >Assignee: Matteo Bertozzi > Attachments: HBASE-16786.master.001.patch, > HBASE-16786.master.002.patch, HBASE-16786.master.003.patch, > HBASE-16786.master.004.patch, HBASE-16786.master.005.patch, > HBASE-16786.master.006.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-16744) Procedure V2 - Lock procedures to allow clients to acquire locks on tables/namespaces/regions
[ https://issues.apache.org/jira/browse/HBASE-16744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-16744: --- Assignee: Matteo Bertozzi (was: Appy) > Procedure V2 - Lock procedures to allow clients to acquire locks on > tables/namespaces/regions > - > > Key: HBASE-16744 > URL: https://issues.apache.org/jira/browse/HBASE-16744 > Project: HBase > Issue Type: Sub-task >Reporter: Appy >Assignee: Matteo Bertozzi > Attachments: HBASE-16744.master.001.patch, > HBASE-16744.master.002.patch, HBASE-16744.master.003.patch, > HBASE-16744.master.004.patch, HBASE-16744.master.005.patch, > HBASE-16744.master.006.patch, HBASE-16744.master.007.patch, > HBASE-16744.master.008.patch, HBASE-16744.master.009.patch, > HBASE-16744.master.010.patch, HBASE-16744.master.011.patch, > HBASE-16744.master.012.patch, HBASE-16744.master.013.patch > > > Will help us get rid of ZK locks. > Will be useful for external tools like hbck, future backup manager, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17250) For Get and scan in one case, checkFamily can be skipped in Region#getScanner
[ https://issues.apache.org/jira/browse/HBASE-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723826#comment-15723826 ]

Matteo Bertozzi commented on HBASE-17250:
-----------------------------------------

[~tedyu] I don't think scan.setAttribute() is the right place for it. from the patch it looks like "skipCheckFamily" is specific to how we implemented the get() code: we use the same getScanner() in both scan() and get(), but in get() we have already checked the families before calling getScanner().

maybe an alternative to the flag is to check the families before doing anything in both cases, since in both cases we call the coprocessors with the scan or get object, and in theory we want to make sure the families are correct. in that case we check early and getScanner() ends up without any check. but this means that coprocessors that call region.getScanner() directly would have to do the validation themselves, so maybe the skipCheckFamily flag is the safer option for compatibility and clarity.

> For Get and scan in one case, checkFamily can be skipped in Region#getScanner
> -----------------------------------------------------------------------------
>
> Key: HBASE-17250
> URL: https://issues.apache.org/jira/browse/HBASE-17250
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Minor
> Attachments: HBASE-17250-master-001.patch
>
> For get(), checkFamily is done in prepareGet(), so checkFamily can be skipped in Region#getScanner().
> For scan(), if there is no family configured in the scan, the families come from the table descriptor, so checkFamily in Region#getScanner() can be skipped in this case as well.
[jira] [Updated] (HBASE-17260) Procedure v2 - Add setOwner() overload taking a User instance
[ https://issues.apache.org/jira/browse/HBASE-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17260: Attachment: HBASE-17260-v0.patch > Procedure v2 - Add setOwner() overload taking a User instance > - > > Key: HBASE-17260 > URL: https://issues.apache.org/jira/browse/HBASE-17260 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17260-v0.patch > > > since we should have a User instance in most of the cases, we should just be > able to pass it, rather than converting it to getShortName() every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17260) Procedure v2 - Add setOwner() overload taking a User instance
[ https://issues.apache.org/jira/browse/HBASE-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17260: Status: Patch Available (was: Open) > Procedure v2 - Add setOwner() overload taking a User instance > - > > Key: HBASE-17260 > URL: https://issues.apache.org/jira/browse/HBASE-17260 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17260-v0.patch > > > since we should have a User instance in most of the cases, we should just be > able to pass it, rather than converting it to getShortName() every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17260) Procedure v2 - Add setOwner() overload taking a User instance
Matteo Bertozzi created HBASE-17260:
---------------------------------------

Summary: Procedure v2 - Add setOwner() overload taking a User instance
Key: HBASE-17260
URL: https://issues.apache.org/jira/browse/HBASE-17260
Project: HBase
Issue Type: Sub-task
Components: proc-v2
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
Fix For: 2.0.0

since we should have a User instance in most of the cases, we should just be able to pass it, rather than converting it with getShortName() every time.
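The overload requested above can be sketched roughly as follows. This is an illustration only, not the attached patch: User and OwnedProcedure are hypothetical stand-ins written for this example, and the only name taken from the issue text besides setOwner() is getShortName().

```java
final class SetOwnerSketch {
  /** Hypothetical stand-in for HBase's User; only getShortName() is modelled. */
  static final class User {
    private final String shortName;

    User(String shortName) {
      this.shortName = shortName;
    }

    String getShortName() {
      return shortName;
    }
  }

  /** Hypothetical stand-in for a procedure that stores its owner by name. */
  static final class OwnedProcedure {
    private String owner;

    // Existing style: the caller has to convert the User to a name first.
    void setOwner(String owner) {
      this.owner = owner;
    }

    // The overload: callers pass the User and the conversion happens once, here.
    void setOwner(User user) {
      setOwner(user.getShortName());
    }

    String getOwner() {
      return owner;
    }
  }

  public static void main(String[] args) {
    OwnedProcedure proc = new OwnedProcedure();
    proc.setOwner(new User("alice"));
    System.out.println(proc.getOwner());
  }
}
```

With the overload in place, call sites shrink from proc.setOwner(user.getShortName()) to proc.setOwner(user).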
[jira] [Commented] (HBASE-17243) Reuse CompactionPartitionId and avoid creating MobFileName in PartitionedMobCompactor to avoid unnecessary new objects
[ https://issues.apache.org/jira/browse/HBASE-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15718270#comment-15718270 ] Matteo Bertozzi commented on HBASE-17243: - thanks for double checking. addendum is good, and committed. I did a run with the addendum and it seems to pass. but I guess we should add a test to verify the select(). open a new jira for that. > Reuse CompactionPartitionId and avoid creating MobFileName in > PartitionedMobCompactor to avoid unnecessary new objects > -- > > Key: HBASE-17243 > URL: https://issues.apache.org/jira/browse/HBASE-17243 > Project: HBase > Issue Type: Improvement > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17243-master-001.patch, > HBASE-17243-master-002.patch, HBASE-17243-master-addendum.patch > > > In today's select() implementation, when it is an existing id, the new > allocated object is discarded. It should be reused. fileName is created to > getStartKey and getDate(), utility APIs can be created to directly get these > fields from the string. > {code} > } else if (allFiles || linkedFile.getLen() < mergeableSize) { > // add all files if allFiles is true, > // otherwise add the small files to the merge pool > MobFileName fileName = > MobFileName.create(linkedFile.getPath().getName()); > CompactionPartitionId id = new > CompactionPartitionId(fileName.getStartKey(), > fileName.getDate()); > CompactionPartition compactionPartition = filesToCompact.get(id); > if (compactionPartition == null) { > compactionPartition = new CompactionPartition(id); > compactionPartition.addFile(file); > filesToCompact.put(id, compactionPartition); > } else { > compactionPartition.addFile(file); > } > selectedFileCount++; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17243) Reuse CompactionPartitionId and avoid creating MobFileName in PartitionedMobCompactor to avoid unnecessary new objects
[ https://issues.apache.org/jira/browse/HBASE-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17243: Resolution: Fixed Fix Version/s: 2.0.0 Status: Resolved (was: Patch Available) > Reuse CompactionPartitionId and avoid creating MobFileName in > PartitionedMobCompactor to avoid unnecessary new objects > -- > > Key: HBASE-17243 > URL: https://issues.apache.org/jira/browse/HBASE-17243 > Project: HBase > Issue Type: Improvement > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17243-master-001.patch, > HBASE-17243-master-002.patch > > > In today's select() implementation, when it is an existing id, the new > allocated object is discarded. It should be reused. fileName is created to > getStartKey and getDate(), utility APIs can be created to directly get these > fields from the string. > {code} > } else if (allFiles || linkedFile.getLen() < mergeableSize) { > // add all files if allFiles is true, > // otherwise add the small files to the merge pool > MobFileName fileName = > MobFileName.create(linkedFile.getPath().getName()); > CompactionPartitionId id = new > CompactionPartitionId(fileName.getStartKey(), > fileName.getDate()); > CompactionPartition compactionPartition = filesToCompact.get(id); > if (compactionPartition == null) { > compactionPartition = new CompactionPartition(id); > compactionPartition.addFile(file); > filesToCompact.put(id, compactionPartition); > } else { > compactionPartition.addFile(file); > } > selectedFileCount++; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17243) Reuse CompactionPartitionId and avoid creating MobFileName in PartitionedMobCompactor to avoid unnecessary new objects
[ https://issues.apache.org/jira/browse/HBASE-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716653#comment-15716653 ] Matteo Bertozzi commented on HBASE-17243: - +1 test failures seems unrelated > Reuse CompactionPartitionId and avoid creating MobFileName in > PartitionedMobCompactor to avoid unnecessary new objects > -- > > Key: HBASE-17243 > URL: https://issues.apache.org/jira/browse/HBASE-17243 > Project: HBase > Issue Type: Improvement > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > Attachments: HBASE-17243-master-001.patch, > HBASE-17243-master-002.patch > > > In today's select() implementation, when it is an existing id, the new > allocated object is discarded. It should be reused. fileName is created to > getStartKey and getDate(), utility APIs can be created to directly get these > fields from the string. > {code} > } else if (allFiles || linkedFile.getLen() < mergeableSize) { > // add all files if allFiles is true, > // otherwise add the small files to the merge pool > MobFileName fileName = > MobFileName.create(linkedFile.getPath().getName()); > CompactionPartitionId id = new > CompactionPartitionId(fileName.getStartKey(), > fileName.getDate()); > CompactionPartition compactionPartition = filesToCompact.get(id); > if (compactionPartition == null) { > compactionPartition = new CompactionPartition(id); > compactionPartition.addFile(file); > filesToCompact.put(id, compactionPartition); > } else { > compactionPartition.addFile(file); > } > selectedFileCount++; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17243) Reuse CompactionPartitionId and avoid creating MobFileName in PartitionedMobCompactor to avoid unnecessary new objects
[ https://issues.apache.org/jira/browse/HBASE-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716028#comment-15716028 ]

Matteo Bertozzi commented on HBASE-17243:
-----------------------------------------

do the _END_INDEX constants need to be public? it seems like something that only MobFileName should know about.
the constructor call new CompactionPartitionId("", "") with the two empty strings looks a bit weird; maybe add an empty constructor, since the object is now mutable.

> Reuse CompactionPartitionId and avoid creating MobFileName in PartitionedMobCompactor to avoid unnecessary new objects
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-17243
> URL: https://issues.apache.org/jira/browse/HBASE-17243
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Minor
> Attachments: HBASE-17243-master-001.patch
>
> In today's select() implementation, when the id already exists, the newly allocated object is discarded. It should be reused.
> fileName is created only to call getStartKey() and getDate(); utility APIs can be created to get these fields directly from the string.
> {code}
>   } else if (allFiles || linkedFile.getLen() < mergeableSize) {
>     // add all files if allFiles is true,
>     // otherwise add the small files to the merge pool
>     MobFileName fileName = MobFileName.create(linkedFile.getPath().getName());
>     CompactionPartitionId id = new CompactionPartitionId(fileName.getStartKey(),
>         fileName.getDate());
>     CompactionPartition compactionPartition = filesToCompact.get(id);
>     if (compactionPartition == null) {
>       compactionPartition = new CompactionPartition(id);
>       compactionPartition.addFile(file);
>       filesToCompact.put(id, compactionPartition);
>     } else {
>       compactionPartition.addFile(file);
>     }
>     selectedFileCount++;
>   }
> }
> {code}
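The reuse idea discussed in this thread can be sketched as below. This is a simplified illustration under stated assumptions, not the code from the attached patches: PartitionId is a hypothetical mutable key loosely modelled on CompactionPartitionId, and files are reduced to {startKey, date, fileName} string triples. One probe key is reused for every lookup, and a new key is allocated only when a partition is seen for the first time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ReusedKeySketch {
  /** Hypothetical mutable key, loosely modelled on CompactionPartitionId. */
  static final class PartitionId {
    private String startKey;
    private String date;

    // Empty constructor, as suggested in the review comment above.
    PartitionId() {
      this("", "");
    }

    PartitionId(String startKey, String date) {
      this.startKey = startKey;
      this.date = date;
    }

    void set(String startKey, String date) {
      this.startKey = startKey;
      this.date = date;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof PartitionId)) {
        return false;
      }
      PartitionId other = (PartitionId) o;
      return startKey.equals(other.startKey) && date.equals(other.date);
    }

    @Override
    public int hashCode() {
      return Objects.hash(startKey, date);
    }
  }

  /** Groups file names by (startKey, date); each entry of files is {startKey, date, fileName}. */
  static Map<PartitionId, List<String>> select(List<String[]> files) {
    Map<PartitionId, List<String>> partitions = new HashMap<>();
    PartitionId probe = new PartitionId(); // reused for every lookup
    for (String[] f : files) {
      probe.set(f[0], f[1]);
      List<String> partition = partitions.get(probe);
      if (partition == null) {
        partition = new ArrayList<>();
        // Store an independent copy: the probe is mutated on the next iteration,
        // and a key must never change while it sits inside a HashMap.
        partitions.put(new PartitionId(f[0], f[1]), partition);
      }
      partition.add(f[2]);
    }
    return partitions;
  }

  public static void main(String[] args) {
    List<String[]> files = new ArrayList<>();
    files.add(new String[] {"a", "20161201", "f1"});
    files.add(new String[] {"a", "20161201", "f2"});
    files.add(new String[] {"b", "20161201", "f3"});
    System.out.println(select(files).size() + " partitions");
  }
}
```

The design point is that a mutable key is only safe as a throwaway probe; anything actually stored in the map has to stay unchanged afterwards.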
[jira] [Commented] (HBASE-17238) Wrong in-memory hbase:meta location causing SSH failure
[ https://issues.apache.org/jira/browse/HBASE-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715407#comment-15715407 ]

Matteo Bertozzi commented on HBASE-17238:
-----------------------------------------

sounds good to me

> Wrong in-memory hbase:meta location causing SSH failure
> --------------------------------------------------------
>
> Key: HBASE-17238
> URL: https://issues.apache.org/jira/browse/HBASE-17238
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 1.1.0
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Priority: Critical
>
> In HBase 1.x, if HMaster#assignMeta() assigns a non-DEFAULT_REPLICA_ID hbase:meta region, it wrongly updates the in-memory state of the DEFAULT_REPLICA_ID hbase:meta region. As a result, the in-memory region state has the wrong RS information for the default replica of hbase:meta. One of the problems we saw is that the wrong type of SSH could be chosen, causing problems.
> {code}
>   void assignMeta(MonitoredTask status, Set<ServerName> previouslyFailedMetaRSs, int replicaId)
>       throws InterruptedException, IOException, KeeperException {
>     // Work on meta region
>     ...
>     if (replicaId == HRegionInfo.DEFAULT_REPLICA_ID) {
>       status.setStatus("Assigning hbase:meta region");
>     } else {
>       status.setStatus("Assigning hbase:meta region, replicaId " + replicaId);
>     }
>     // Get current meta state from zk.
>     RegionStates regionStates = assignmentManager.getRegionStates();
>     RegionState metaState = MetaTableLocator.getMetaRegionState(getZooKeeper(), replicaId);
>     HRegionInfo hri = RegionReplicaUtil.getRegionInfoForReplica(HRegionInfo.FIRST_META_REGIONINFO,
>         replicaId);
>     ServerName currentMetaServer = metaState.getServerName();
>     ...
>     boolean rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(hri);
>     boolean metaRegionLocation = metaTableLocator.verifyMetaRegionLocation(
>       this.getConnection(), this.getZooKeeper(), timeout, replicaId);
>     ...
>     } else {
>       // Region already assigned. We didn't assign it. Add to in-memory state.
>       regionStates.updateRegionState(
>         HRegionInfo.FIRST_META_REGIONINFO, State.OPEN, currentMetaServer); <<--- Wrong region to update -->>
>       this.assignmentManager.regionOnline(
>         HRegionInfo.FIRST_META_REGIONINFO, currentMetaServer); <<--- Wrong region to update -->>
>     }
>     ...
> {code}
> Here is the problem scenario:
> Step 1: the master fails over (a fresh start could have the same issue) and finds the default replica of hbase:meta on rs1.
> {noformat}
> 2016-11-26 00:06:36,590 INFO org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
> 2016-11-26 00:06:36,591 INFO org.apache.hadoop.hbase.master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=rs1,60200,1480103147220
> {noformat}
> Step 2: the master finds that replica 1 of hbase:meta is unassigned, so HMaster#assignMeta() is called and assigns the replica 1 region to rs2, but at the end it wrongly updates the in-memory state of the default replica to rs2.
> {noformat}
> 2016-11-26 00:08:21,741 DEBUG org.apache.hadoop.hbase.master.RegionStates: Onlined 1588230740 on rs2,60200,1480102993815 {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
> 2016-11-26 00:08:21,741 INFO org.apache.hadoop.hbase.master.RegionStates: Offlined 1588230740 from rs1,60200,1480103147220
> 2016-11-26 00:08:21,741 INFO org.apache.hadoop.hbase.master.HMaster: hbase:meta with replicaId 1 assigned=0, rit=false, location=rs2,60200,1480102993815
> {noformat}
> Step 3: now rs1 is down and the master needs to choose which SSH to call (MetaServerShutdownHandler or the normal ServerShutdownHandler). In this case MetaServerShutdownHandler should be chosen; however, due to the wrong in-memory location, the normal ServerShutdownHandler was chosen:
> {noformat}
> 2016-11-26 00:08:33,995 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [rs1,60200,1480103147220]
> 2016-11-26 00:08:33,998 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current region=hbase:meta,,1.1588230740 is on server=rs2,60200,1480102993815 server being checked: rs1,60200,1480103147220
> 2016-11-26 00:08:34,001 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=rs1,60200,1480103147220 to dead servers, submitted shutdown handler to be executed meta=false
> {noformat}
> Step 4: The wrong SSH was chosen. Because accessing hbase:meta failed, the SSH failed after retries. The dead server was then never processed, and the regions on that server remain unusable (We have a solution that resolve this
[jira] [Comment Edited] (HBASE-17228) precommit grep -c ERROR may grab non errors
[ https://issues.apache.org/jira/browse/HBASE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715392#comment-15715392 ] Matteo Bertozzi edited comment on HBASE-17228 at 12/2/16 3:21 PM: -- yeah, I think the patch it's ok. +1 I did a quick check and it looks like the errors are around \[ERROR\]. the only thing it seems that we end up with too many errors, but that it was also true before. for example a simple typo in one protobuf line gives me 36 error, all marked as \[ERROR\]. {noformat} [ERROR] PROTOC FAILED: [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: Admin.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.) ... [ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.0:compile (compile-protoc) on project hbase-protocol-shaded: protoc did not exit cleanly. Review output for more information. -> [Help 1] {noformat} was (Author: mbertozzi): yeah, I think the patch it's ok. I did a quick check and it looks like the errors are around \[ERROR\]. the only thing it seems that we end up with too many errors, but that it was also true before. for example a simple typo in one protobuf line gives me 36 error, all marked as \[ERROR\]. {noformat} [ERROR] PROTOC FAILED: [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: Admin.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.) ... [ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.0:compile (compile-protoc) on project hbase-protocol-shaded: protoc did not exit cleanly. Review output for more information. 
-> [Help 1] {noformat} > precommit grep -c ERROR may grab non errors > --- > > Key: HBASE-17228 > URL: https://issues.apache.org/jira/browse/HBASE-17228 > Project: HBase > Issue Type: Bug > Components: scripts >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-17228.master.001.patch > > > it looks like that we do a simple "grep -c ERROR" to count the errors that we > have from the build. > https://github.com/apache/hbase/blob/master/dev-support/hbase-personality.sh#L305 > but in this way we ended up with a count=1 just because we have one enum > called ERROR_CODE in hbase. and the enum shows up as debug message > {noformat} > $ grep ERROR patch-hbaseprotoc-hbase-server.txt > [DEBUG] adding entry > org/apache/hadoop/hbase/util/HBaseFsck$ErrorReporter$ERROR_CODE.class > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17228) precommit grep -c ERROR may grab non errors
[ https://issues.apache.org/jira/browse/HBASE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715392#comment-15715392 ] Matteo Bertozzi commented on HBASE-17228: - yeah, I think the patch it's ok. I did a quick check and it looks like the errors are around \[ERROR\]. the only thing it seems that we end up with too many errors, but that it was also true before. for example a simple typo in one protobuf line gives me 36 error, all marked as \[ERROR\]. {noformat} [ERROR] PROTOC FAILED: [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: Admin.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.) ... [ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.0:compile (compile-protoc) on project hbase-protocol-shaded: protoc did not exit cleanly. Review output for more information. -> [Help 1] {noformat} > precommit grep -c ERROR may grab non errors > --- > > Key: HBASE-17228 > URL: https://issues.apache.org/jira/browse/HBASE-17228 > Project: HBase > Issue Type: Bug > Components: scripts >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-17228.master.001.patch > > > it looks like that we do a simple "grep -c ERROR" to count the errors that we > have from the build. > https://github.com/apache/hbase/blob/master/dev-support/hbase-personality.sh#L305 > but in this way we ended up with a count=1 just because we have one enum > called ERROR_CODE in hbase. and the enum shows up as debug message > {noformat} > $ grep ERROR patch-hbaseprotoc-hbase-server.txt > [DEBUG] adding entry > org/apache/hadoop/hbase/util/HBaseFsck$ErrorReporter$ERROR_CODE.class > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16841) Data loss in MOB files after cloning a snapshot and deleting that snapshot
[ https://issues.apache.org/jira/browse/HBASE-16841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715370#comment-15715370 ] Matteo Bertozzi commented on HBASE-16841: - +1 > Data loss in MOB files after cloning a snapshot and deleting that snapshot > -- > > Key: HBASE-16841 > URL: https://issues.apache.org/jira/browse/HBASE-16841 > Project: HBase > Issue Type: Bug > Components: mob, snapshots >Reporter: Jingcheng Du >Assignee: Jingcheng Du > Attachments: HBASE-16841-V2.patch, HBASE-16841-V3.patch, > HBASE-16841-V4.patch, HBASE-16841-V5.patch, HBASE-16841-V6.patch, > HBASE-16841.patch > > > Running the following steps will probably lose MOB data when working with > snapshots. > 1. Create a mob-enabled table by running create 't1', {NAME => 'f1', IS_MOB > => true, MOB_THRESHOLD => 0}. > 2. Put millions of data. > 3. Run {{snapshot 't1','t1_snapshot'}} to take a snapshot for this table t1. > 4. Run {{clone_snapshot 't1_snapshot','t1_cloned'}} to clone this snapshot. > 5. Run {{delete_snapshot 't1_snapshot'}} to delete this snapshot. > 6. Run {{disable 't1'}} and {{delete 't1'}} to delete the table. > 7. Now go to the archive directory of t1, the number of .link directories is > different from the number of hfiles which means some data will be lost after > the hfile cleaner runs. > This is because, when taking a snapshot on a enabled mob table, each region > flushes itself and takes a snapshot, and the mob snapshot is taken only if > the current region is first region of the table. At that time, the flushing > of some regions might not be finished, and some mob files are not flushed to > disk yet. Eventually some mob files are not recorded in the snapshot manifest. > To solve this, we need to take the mob snapshot at last after the snapshots > on all the online and offline regions are finished in > {{EnabledTableSnapshotHandler}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17228) precommit grep -c ERROR may grab non errors
Matteo Bertozzi created HBASE-17228:
---------------------------------------

Summary: precommit grep -c ERROR may grab non errors
Key: HBASE-17228
URL: https://issues.apache.org/jira/browse/HBASE-17228
Project: HBase
Issue Type: Bug
Components: scripts
Reporter: Matteo Bertozzi
Priority: Minor

it looks like we do a simple "grep -c ERROR" to count the errors that we have from the build.
https://github.com/apache/hbase/blob/master/dev-support/hbase-personality.sh#L305
but this way we ended up with a count=1 just because we have one enum called ERROR_CODE in hbase, and the enum shows up in a debug message
{noformat}
$ grep ERROR patch-hbaseprotoc-hbase-server.txt
[DEBUG] adding entry org/apache/hadoop/hbase/util/HBaseFsck$ErrorReporter$ERROR_CODE.class
{noformat}
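The miscount described above can be reproduced in a few lines. The sketch below is only an illustration in Java, not the precommit script or the attached patch: countSubstring imitates what "grep -c ERROR" does, while countErrorLevel is one possible stricter rule (an assumption made for this example) that counts only lines starting with the "[ERROR]" level tag. The "[INFO] BUILD SUCCESS" line is made up for the demo; the "[DEBUG]" line is the one quoted in the issue.

```java
import java.util.Arrays;
import java.util.List;

final class ErrorCountSketch {
  // Imitates `grep -c ERROR`: counts every line that contains the substring.
  static long countSubstring(List<String> lines) {
    return lines.stream().filter(line -> line.contains("ERROR")).count();
  }

  // Stricter rule (assumed for this sketch): count only lines tagged with the [ERROR] level.
  static long countErrorLevel(List<String> lines) {
    return lines.stream().filter(line -> line.startsWith("[ERROR]")).count();
  }

  public static void main(String[] args) {
    List<String> log = Arrays.asList(
        "[DEBUG] adding entry org/apache/hadoop/hbase/util/HBaseFsck$ErrorReporter$ERROR_CODE.class",
        "[INFO] BUILD SUCCESS"); // made-up line, for the demo only
    System.out.println("substring match: " + countSubstring(log));
    System.out.println("level-tag match: " + countErrorLevel(log));
  }
}
```

The substring rule reports one error for a log that contains none, because the class name ERROR_CODE matches; the level-tag rule reports zero for the same input.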
[jira] [Commented] (HBASE-16841) Data loss in MOB files after cloning a snapshot and deleting that snapshot
[ https://issues.apache.org/jira/browse/HBASE-16841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713396#comment-15713396 ]

Matteo Bertozzi commented on HBASE-16841:
-----------------------------------------

sorry, missed the ping.
the snapshot code moved, so the patch does not apply to current master and needs a little change.
patch looks good. the only thing is that we can replace the two for loops that check hcd.isMobEnabled() with MobUtil.hasMobColumns(htd).

> Data loss in MOB files after cloning a snapshot and deleting that snapshot
> --------------------------------------------------------------------------
>
> Key: HBASE-16841
> URL: https://issues.apache.org/jira/browse/HBASE-16841
> Project: HBase
> Issue Type: Bug
> Components: mob, snapshots
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBASE-16841-V2.patch, HBASE-16841-V3.patch, HBASE-16841-V4.patch, HBASE-16841-V5.patch, HBASE-16841.patch
>
> Running the following steps will probably lose MOB data when working with snapshots.
> 1. Create a mob-enabled table by running create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 0}.
> 2. Put millions of rows of data.
> 3. Run {{snapshot 't1','t1_snapshot'}} to take a snapshot of table t1.
> 4. Run {{clone_snapshot 't1_snapshot','t1_cloned'}} to clone this snapshot.
> 5. Run {{delete_snapshot 't1_snapshot'}} to delete this snapshot.
> 6. Run {{disable 't1'}} and {{delete 't1'}} to delete the table.
> 7. Now go to the archive directory of t1: the number of .link directories is different from the number of hfiles, which means some data will be lost after the hfile cleaner runs.
> This is because, when taking a snapshot of an enabled mob table, each region flushes itself and takes a snapshot, and the mob snapshot is taken only if the current region is the first region of the table. At that time, the flushing of some regions might not be finished, and some mob files are not flushed to disk yet. Eventually some mob files are not recorded in the snapshot manifest.
> To solve this, we need to take the mob snapshot last, after the snapshots of all the online and offline regions are finished in {{EnabledTableSnapshotHandler}}.
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-17205:
------------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0, 2.0.0
Status: Resolved (was: Patch Available)

> Add a metric for the duration of region in transition
> -----------------------------------------------------
>
> Key: HBASE-17205
> URL: https://issues.apache.org/jira/browse/HBASE-17205
> Project: HBase
> Issue Type: Improvement
> Components: Region Assignment
> Affects Versions: 2.0.0, 1.4.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Fix For: 2.0.0, 1.4.0
> Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, HBASE-17205-v1.patch, HBASE-17205.patch
>
> While working on HBASE-17178, I found that there is no metric for the overall duration of a region in transition. When a region moves from A to B, the region state goes through PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition from the old region state to the new one updates the time stamp to the current time, so we can't get the overall duration of the region in transition. Add a rit duration to RegionState to accumulate this metric.
[jira] [Commented] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712212#comment-15712212 ]

Matteo Bertozzi commented on HBASE-17205:
-----------------------------------------

+1

> Add a metric for the duration of region in transition
> -----------------------------------------------------
>
> Key: HBASE-17205
> URL: https://issues.apache.org/jira/browse/HBASE-17205
> Project: HBase
> Issue Type: Improvement
> Components: Region Assignment
> Affects Versions: 2.0.0, 1.4.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, HBASE-17205-v1.patch, HBASE-17205.patch
>
> While working on HBASE-17178, I found that there is no metric for the overall duration of a region in transition. When a region moves from A to B, the region state goes through PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition from the old region state to the new one updates the time stamp to the current time, so we can't get the overall duration of the region in transition. Add a rit duration to RegionState to accumulate this metric.
[jira] [Commented] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709271#comment-15709271 ]

Matteo Bertozzi commented on HBASE-17205:
-----------------------------------------

looks ok to me. the only strange thing is that RegionState is now no longer an immutable object. maybe just avoid making updateRitDuration() public, since it is used only by RegionStates, and mark it as @InterfaceAudience.Private. in any case, this fix is ok for me.

with the new AM we have the actual time of the assign and unassign operations for each region, and the time the region spends in failed open or similar states. so for 2.0, when the new AM is ready, we are probably going back to the immutable region state.

> Add a metric for the duration of region in transition
> -----------------------------------------------------
>
> Key: HBASE-17205
> URL: https://issues.apache.org/jira/browse/HBASE-17205
> Project: HBase
> Issue Type: Improvement
> Components: Region Assignment
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-17205.patch
>
> While working on HBASE-17178, I found that there is no metric for the overall duration of a region in transition. When a region moves from A to B, the region state goes through PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition from the old region state to the new one updates the time stamp to the current time, so we can't get the overall duration of the region in transition. Add a rit duration to RegionState to accumulate this metric.
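The accumulation described in HBASE-17205 can also be sketched without giving up immutability, which is the concern raised in the comment. This is only an illustration under stated assumptions: SimpleRegionState and transitionTo() are hypothetical names, not the real RegionState API, and the patch itself uses a mutable updateRitDuration() instead.

```java
/** Hypothetical, simplified model of a region state; NOT the real RegionState class. */
final class SimpleRegionState {
  enum State { PENDING_CLOSE, CLOSING, CLOSED, PENDING_OPEN, OPENING, OPENED }

  private final State state;
  private final long stamp;        // time of the latest state change, in ms
  private final long ritDuration;  // total ms spent in transition so far

  SimpleRegionState(State state, long stamp, long ritDuration) {
    this.state = state;
    this.stamp = stamp;
    this.ritDuration = ritDuration;
  }

  /**
   * Returns a new object for the next state. The time spent in the current
   * state (now - stamp) is added to the running total, so the overall
   * duration survives even though the stamp is reset on every transition.
   */
  SimpleRegionState transitionTo(State next, long now) {
    return new SimpleRegionState(next, now, ritDuration + (now - stamp));
  }

  State getState() { return state; }
  long getStamp() { return stamp; }
  long getRitDuration() { return ritDuration; }
}
```

With this shape, walking a region from PENDING_CLOSE to OPENED leaves getRitDuration() equal to the last time stamp minus the first one, while each individual object stays immutable.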
[jira] [Commented] (HBASE-17212) Should add null checker on table name in HTable and RegionServerCallable constructor
[ https://issues.apache.org/jira/browse/HBASE-17212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709229#comment-15709229 ]

Matteo Bertozzi commented on HBASE-17212:
-----------------------------------------

+1

> Should add null checker on table name in HTable and RegionServerCallable constructor
> ------------------------------------------------------------------------------------
>
> Key: HBASE-17212
> URL: https://issues.apache.org/jira/browse/HBASE-17212
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Yu Li
> Assignee: Yu Li
> Attachments: HBASE-17212.patch, HBASE-17212.v2.patch
>
> If we run the code below:
> {code}
> Table table = connection.getTable(null);
> {code}
> we will see the exception below:
> {noformat}
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:221)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:182)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.getTable(ConnectionImplementation.java:298)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.getTable(ConnectionImplementation.java:293)
> {noformat}
> In this JIRA we will add a null check and throw a more graceful {{IllegalArgumentException}}.
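The kind of check proposed in HBASE-17212 can be sketched as follows. This is a minimal illustration, not the attached patch: TableHandle is a hypothetical stand-in for HTable, and the exception message is an assumption.

```java
/** Hypothetical stand-in for a client-side table handle; NOT the real HTable. */
final class TableHandle {
  private final String tableName;

  TableHandle(String tableName) {
    // Fail fast with a descriptive exception at construction time,
    // instead of hitting a NullPointerException deeper in the setup code.
    if (tableName == null) {
      throw new IllegalArgumentException("Given table name is null");
    }
    this.tableName = tableName;
  }

  String getName() {
    return tableName;
  }
}
```

The caller then sees an IllegalArgumentException that names the bad argument, rather than a stack trace pointing into internal setup code.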
[jira] [Commented] (HBASE-17189) TestMasterObserver#wasModifyTableActionCalled uses wrong variables
[ https://issues.apache.org/jira/browse/HBASE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703712#comment-15703712 ] Matteo Bertozzi commented on HBASE-17189: - +1 > TestMasterObserver#wasModifyTableActionCalled uses wrong variables > -- > > Key: HBASE-17189 > URL: https://issues.apache.org/jira/browse/HBASE-17189 > Project: HBase > Issue Type: Test > Components: test >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17189.v1-master.patch > > > TestMasterObserver#wasModifyTableActionCalled() and > TestMasterObserver#wasModifyTableActionCalledOnly() uses > {{preModifyColumnFamilyActionCalled}} and > {{postCompletedModifyColumnFamilyActionCalled}} members, which are wrong. > Instead it should use {{preModifyTableActionCalled}} and > {{postCompletedModifyTableActionCalled}}. This probably was caused by > copy-and-paste mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17186) MasterProcedureTestingUtility#testRecoveryAndDoubleExecution displays stale procedure state info
[ https://issues.apache.org/jira/browse/HBASE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703569#comment-15703569 ] Matteo Bertozzi commented on HBASE-17186: - +1 on v2 > MasterProcedureTestingUtility#testRecoveryAndDoubleExecution displays stale > procedure state info > > > Key: HBASE-17186 > URL: https://issues.apache.org/jira/browse/HBASE-17186 > Project: HBase > Issue Type: Bug > Components: proc-v2, test >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17186.v1-master.patch, HBASE-17186.v2-master.patch > > > MasterProcedureTestingUtility#testRecoveryAndDoubleExecution get the > procedure information at the beginning of the function, but never updates the > information. As procedure executes and moves to new state, it still log the > stale state information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17186) MasterProcedureTestingUtility#testRecoveryAndDoubleExecution displays stale procedure state info
[ https://issues.apache.org/jira/browse/HBASE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703363#comment-15703363 ]

Matteo Bertozzi commented on HBASE-17186:
-----------------------------------------

+1 on the fix, but we have that pattern in other methods. grep "LOG.info("Restart " + i + " to find the others. but you can just do a find/replace with the fix you have there

> MasterProcedureTestingUtility#testRecoveryAndDoubleExecution displays stale procedure state info
> ------------------------------------------------------------------------------------------------
>
> Key: HBASE-17186
> URL: https://issues.apache.org/jira/browse/HBASE-17186
> Project: HBase
> Issue Type: Bug
> Components: proc-v2, test
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Priority: Minor
> Attachments: HBASE-17186.v1-master.patch
>
> MasterProcedureTestingUtility#testRecoveryAndDoubleExecution gets the procedure information at the beginning of the function, but never updates it. As the procedure executes and moves to a new state, it still logs the stale state information.
[jira] [Commented] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table
[ https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702781#comment-15702781 ]

Matteo Bertozzi commented on HBASE-17140:
-----------------------------------------

can you expand more on why we should do this optimization? do you have a specific use case? my guess is that disabling a table is rare enough and extra round trips are not that bad...

then if we are going with this patch, why do we even need to store the enabled/disabled table state? if we know that all the regions are offline we know that the table is disabled. and the master can simply compute the state when rebuilding meta on startup. so there is really no need to have that flag stored.

also i'm not sure we have enough coverage for the behavior of setting the offline state for disable. at the moment we only use the offline flag for split/merge. and the regions may be back if we rollback. so maybe this is not a trivial change. our code has lots of assumptions and changing things from TableNotEnabledException to RegionOfflineException feels scary to me.

> Throw RegionOfflineException directly when request for a disabled table
> -----------------------------------------------------------------------
>
> Key: HBASE-17140
> URL: https://issues.apache.org/jira/browse/HBASE-17140
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Attachments: HBASE-17140-v1.patch, HBASE-17140-v2.patch, HBASE-17140-v3.patch, HBASE-17140-v4.patch, HBASE-17140-v5.patch
>
> Now when a request is made for a disabled table, it needs 3 rpc calls before it fails.
> 1. get region location
> 2. send call to rs and get NotServingRegionException
> 3. retry and check the table state, then throw TableNotEnabledException
> The table state check was added for disabled tables. But the prepare method in RegionServerCallable shows that every retry request checks the table state first.
> {code}
>   public void prepare(final boolean reload) throws IOException {
>     // check table state if this is a retry
>     if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
>         getConnection().isTableDisabled(tableName)) {
>       throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
>     }
>     try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
>       this.location = regionLocator.getRegionLocation(row);
>     }
>     if (this.location == null) {
>       throw new IOException("Failed to find location, tableName=" + tableName +
>           ", row=" + Bytes.toString(row) + ", reload=" + reload);
>     }
>     setStubByServiceName(this.location.getServerName());
>   }
> {code}
> An improvement is to set the region offline in HRegionInfo, and then throw RegionOfflineException when getting the region location.
> Review board: https://reviews.apache.org/r/54071/
[jira] [Commented] (HBASE-16561) Add metrics about read/write/scan queue length and active read/write/scan handler count
[ https://issues.apache.org/jira/browse/HBASE-16561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702708#comment-15702708 ] Matteo Bertozzi commented on HBASE-16561: - +1 patch looks good to me > Add metrics about read/write/scan queue length and active read/write/scan > handler count > --- > > Key: HBASE-16561 > URL: https://issues.apache.org/jira/browse/HBASE-16561 > Project: HBase > Issue Type: Improvement > Components: IPC/RPC, metrics >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16561-v1.patch, HBASE-16561.patch > > > Now there are only metrics about total queue length and active rpc handler > count. But in the RWQueueRpcExecutor, there are different queues and handlers > for read/write/scan request. I thought it is necessary to add more metrics > for RWQueueRpcExecutor. When use it in production cluster, we can adjust the > config of queues and handlers according to the metrics. > Review url: https://reviews.apache.org/r/54072/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-16524:
------------------------------------
Attachment: HBASE-16524-v6.patch

> Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
> -----------------------------------------------------------------------------
>
> Key: HBASE-16524
> URL: https://issues.apache.org/jira/browse/HBASE-16524
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Affects Versions: 2.0.0
> Reporter: Appy
> Assignee: Matteo Bertozzi
> Priority: Minor
> Fix For: 2.0.0
> Attachments: HBASE-16524-v2.patch, HBASE-16524-v3.patch, HBASE-16524-v4.patch, HBASE-16524-v5.patch, HBASE-16524-v6.patch, HBASE-16524.master.001.patch, flame1.svg
>
> Fix performance regression introduced by HBASE-16094.
> Instead of scanning all the wals every time, we can rely on the insert/update/delete events we have.
> and since we want to delete the wals in order we can keep track of what is "holding" that wal, and take a hit on scanning all the trackers only when we remove the first log in the queue.
> e.g.
> WAL-1 [1, 2]
> WAL-2 [1] -> "[2] is holding WAL-1"
> WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2"
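The bookkeeping in the WAL-1/WAL-2/WAL-3 example of HBASE-16524 can be sketched roughly as follows. This is a simplified illustration, not the code in the attached patches: WalCleanupSketch, roll() and update() are hypothetical names, procedure deletes are not modeled, and update() assumes roll() has been called at least once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of "what is holding the oldest WAL"; NOT the real WALProcedureStore. */
final class WalCleanupSketch {
  /** One log file and the ids of the procedures written to it. */
  private static final class Wal {
    final String name;
    final Set<Long> procIds = new HashSet<>();
    Wal(String name) { this.name = name; }
  }

  private final Deque<Wal> wals = new ArrayDeque<>();  // oldest first
  private final Set<Long> holding = new HashSet<>();   // procs pinning the oldest WAL

  /** Starts a new WAL; later updates are recorded in it. */
  void roll(String name) {
    wals.addLast(new Wal(name));
  }

  /** Records an update of procId in the newest WAL and drops WALs nobody holds anymore. */
  void update(long procId) {
    wals.getLast().procIds.add(procId);
    if (wals.size() == 1) {
      holding.add(procId);  // the newest WAL is also the oldest one
      return;
    }
    // Cheap path: the proc now has newer state, so it no longer holds the oldest WAL.
    holding.remove(procId);
    // Expensive path: scan all the trackers only when the first log is removed.
    while (wals.size() > 1 && holding.isEmpty()) {
      wals.removeFirst();
      recomputeHolding();
    }
  }

  /** Holding set = procs in the oldest WAL that do not appear in any newer WAL. */
  private void recomputeHolding() {
    holding.clear();
    Iterator<Wal> it = wals.iterator();
    holding.addAll(it.next().procIds);
    while (it.hasNext()) {
      holding.removeAll(it.next().procIds);
    }
  }

  List<String> walNames() {
    List<String> names = new ArrayList<>();
    for (Wal wal : wals) {
      names.add(wal.name);
    }
    return names;
  }

  Set<Long> holding() {
    return new HashSet<>(holding);
  }
}
```

Replaying the example: after WAL-1 [1, 2] and WAL-2 [1], only proc 2 holds WAL-1; the update of proc 2 in WAL-3 empties the holding set, WAL-1 is dropped, and the scan finds that proc 1 now holds WAL-2.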
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Attachment: HBASE-16524-v5.patch > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524-v3.patch, > HBASE-16524-v4.patch, HBASE-16524-v5.patch, HBASE-16524.master.001.patch, > flame1.svg > > > Fix performance regression introduced by HBASE-16094. > Instead of scanning all the wals every time, we can rely on the > insert/update/delete events we have. > and since we want to delete the wals in order we can keep track of what is > "holding" that wal, and take a hit on scanning all the trackers only when we > remove the first log in the queue. > e.g. > WAL-1 [1, 2] > WAL-2 [1] -> "[2] is holding WAL-1" > WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Attachment: HBASE-16524-v4.patch > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524-v3.patch, > HBASE-16524-v4.patch, HBASE-16524.master.001.patch, flame1.svg > > > Fix performance regression introduced by HBASE-16094. > Instead of scanning all the wals every time, we can rely on the > insert/update/delete events we have. > and since we want to delete the wals in order we can keep track of what is > "holding" that wal, and take a hit on scanning all the trackers only when we > remove the first log in the queue. > e.g. > WAL-1 [1, 2] > WAL-2 [1] -> "[2] is holding WAL-1" > WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Attachment: (was: HBASE-16524-v4.patch) > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524-v3.patch, > HBASE-16524.master.001.patch, flame1.svg > > > Fix performance regression introduced by HBASE-16094. > Instead of scanning all the wals every time, we can rely on the > insert/update/delete events we have. > and since we want to delete the wals in order we can keep track of what is > "holding" that wal, and take a hit on scanning all the trackers only when we > remove the first log in the queue. > e.g. > WAL-1 [1, 2] > WAL-2 [1] -> "[2] is holding WAL-1" > WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Attachment: HBASE-16524-v4.patch > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524-v3.patch, > HBASE-16524-v4.patch, HBASE-16524.master.001.patch, flame1.svg > > > Fix performance regression introduced by HBASE-16094. > Instead of scanning all the wals every time, we can rely on the > insert/update/delete events we have. > and since we want to delete the wals in order we can keep track of what is > "holding" that wal, and take a hit on scanning all the trackers only when we > remove the first log in the queue. > e.g. > WAL-1 [1, 2] > WAL-2 [1] -> "[2] is holding WAL-1" > WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17149) Procedure v2 - Fix nonce submission
Matteo Bertozzi created HBASE-17149:
---------------------------------------

Summary: Procedure v2 - Fix nonce submission
Key: HBASE-17149
URL: https://issues.apache.org/jira/browse/HBASE-17149
Project: HBase
Issue Type: Sub-task
Components: proc-v2
Affects Versions: 1.2.4, 1.1.7, 2.0.0, 1.3.0, 1.4.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi

instead of having all the logic in submitProcedure(), split it into registerNonce() + submitProcedure(). this way we can avoid calling the coprocessor twice and have a clean submit logic, knowing that there will be only one submission.
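A rough sketch of the split proposed in HBASE-17149, under loud assumptions: NonceSketch, its String nonce keys, and the hookCalls counter standing in for the coprocessor call are all hypothetical, and the real ProcedureExecutor signatures may differ. The point is only that the caller registers the nonce first, returns early on a retry, and otherwise runs the hook and submits exactly once.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, single-threaded sketch; NOT the real ProcedureExecutor. */
final class NonceSketch {
  private final Map<String, Long> nonceToProcId = new HashMap<>();
  private long nextProcId = 1;
  int hookCalls = 0;  // stand-in for "the coprocessor was called"

  /**
   * Returns the procedure id already bound to this nonce, or -1 if the nonce
   * is new. In the new case an id is reserved for the upcoming submission.
   */
  long registerNonce(String nonceKey) {
    Long existing = nonceToProcId.get(nonceKey);
    if (existing != null) {
      return existing;
    }
    nonceToProcId.put(nonceKey, nextProcId++);
    return -1;
  }

  /** Submits the procedure under the id reserved by registerNonce(). */
  long submitProcedure(String nonceKey, Runnable procedure) {
    Long procId = nonceToProcId.get(nonceKey);
    if (procId == null) {
      throw new IllegalStateException("nonce was not registered: " + nonceKey);
    }
    procedure.run();  // stand-in for queueing the procedure for execution
    return procId;
  }

  /** Caller-side flow: the hook and the submission happen once per nonce. */
  long submitOnce(String nonceKey, Runnable procedure) {
    long procId = registerNonce(nonceKey);
    if (procId >= 0) {
      return procId;  // retry of a known request: no hook, no second submit
    }
    hookCalls++;
    return submitProcedure(nonceKey, procedure);
  }
}
```

Because the duplicate check happens before anything else, submitProcedure() itself never has to decide whether it is looking at a first attempt or a retry.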
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Attachment: HBASE-16524-v3.patch > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524-v3.patch, > HBASE-16524.master.001.patch, flame1.svg > > > Fix performance regression introduced by HBASE-16094. > Instead of scanning all the wals every time, we can rely on the > insert/update/delete events we have. > and since we want to delete the wals in order we can keep track of what is > "holding" that wal, and take a hit on scanning all the trackers only when we > remove the first log in the queue. > e.g. > WAL-1 [1, 2] > WAL-2 [1] -> "[2] is holding WAL-1" > WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17148) Procedure v2 - add bulk proc submit
[ https://issues.apache.org/jira/browse/HBASE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-17148:
------------------------------------
Attachment: HBASE-17148-v0.patch

> Procedure v2 - add bulk proc submit
> -----------------------------------
>
> Key: HBASE-17148
> URL: https://issues.apache.org/jira/browse/HBASE-17148
> Project: HBase
> Issue Type: Sub-task
> Components: master, proc-v2
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Priority: Minor
> Fix For: 2.0.0
> Attachments: HBASE-17148-v0.patch
>
> Add the ability to submit multiple procedures as a single operation. useful for the AM to reduce some lock/unlock/wait times
[jira] [Created] (HBASE-17148) Procedure v2 - add bulk proc submit
Matteo Bertozzi created HBASE-17148:
---------------------------------------

Summary: Procedure v2 - add bulk proc submit
Key: HBASE-17148
URL: https://issues.apache.org/jira/browse/HBASE-17148
Project: HBase
Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
Fix For: 2.0.0

Add the ability to submit multiple procedures as a single operation. useful for the AM to reduce some lock/unlock/wait times
[jira] [Commented] (HBASE-16935) deleteColumn/modifyTable don't delete all family's StoreFile from file system
[ https://issues.apache.org/jira/browse/HBASE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15683788#comment-15683788 ]

Matteo Bertozzi commented on HBASE-16935:
-----------------------------------------

I think we should fix it. if no one has time to work on it, I'll take it as soon as I have a bit of time.

> deleteColumn/modifyTable don't delete all family's StoreFile from file system
> -----------------------------------------------------------------------------
>
> Key: HBASE-16935
> URL: https://issues.apache.org/jira/browse/HBASE-16935
> Project: HBase
> Issue Type: New Feature
> Components: Admin
> Affects Versions: 1.2.3
> Reporter: Mikhail Zvagelsky
> Priority: Minor
> Attachments: Selection_008.png
>
> The method deleteColumn(TableName tableName, byte[] columnName) of the class org.apache.hadoop.hbase.client.Admin should delete the specified column family from the specified table. (Despite its name, the method removes the family, not a column; see the [issue|https://issues.apache.org/jira/browse/HBASE-1989].)
> This method changes the table's schema, but it doesn't delete the column family's store file from the file system. To be precise - I run this code:
> {code:|borderStyle=solid}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.util.Bytes;
> public class ToHBaseIssueTracker {
>   public static void main(String[] args) throws IOException {
>     TableName tableName = TableName.valueOf("test_table");
>     HTableDescriptor desc = new HTableDescriptor(tableName);
>     desc.addFamily(new HColumnDescriptor("cf1"));
>     desc.addFamily(new HColumnDescriptor("cf2"));
>     Configuration conf = HBaseConfiguration.create();
>     Connection connection = ConnectionFactory.createConnection(conf);
>     Admin admin = connection.getAdmin();
>     admin.createTable(desc);
>     HTable table = new HTable(conf, "test_table");
>     for (int i = 0; i < 4; i++) {
>       Put put = new Put(Bytes.toBytes(i)); // Use i as row key.
>       put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>       put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>       table.put(put);
>     }
>     admin.deleteColumn(tableName, Bytes.toBytes("cf2"));
>     admin.majorCompact(tableName);
>     admin.close();
>   }
> }
> {code}
> Then I see that the store file for the "cf2" family persists in the file system.
> I observe this effect in a standalone hbase installation and in pseudo-distributed mode.
[jira] [Updated] (HBASE-16935) deleteColumn/modifyTable don't delete all family's StoreFile from file system
[ https://issues.apache.org/jira/browse/HBASE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-16935:
------------------------------------
Summary: deleteColumn/modifyTable don't delete all family's StoreFile from file system (was: Java API method Admin.deleteColumn(table, columnFamily) doesn't delete family's StoreFile from file system.)

> deleteColumn/modifyTable don't delete all family's StoreFile from file system
> -----------------------------------------------------------------------------
>
> Key: HBASE-16935
> URL: https://issues.apache.org/jira/browse/HBASE-16935
> Project: HBase
> Issue Type: New Feature
> Components: Admin
> Affects Versions: 1.2.3
> Reporter: Mikhail Zvagelsky
> Attachments: Selection_008.png
>
> The method deleteColumn(TableName tableName, byte[] columnName) of the class org.apache.hadoop.hbase.client.Admin should delete the specified column family from the specified table. (Despite its name, the method removes the family, not a column; see the [issue|https://issues.apache.org/jira/browse/HBASE-1989].)
> This method changes the table's schema, but it doesn't delete the column family's store file from the file system. To be precise - I run this code:
> {code:|borderStyle=solid}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.util.Bytes;
> public class ToHBaseIssueTracker {
>   public static void main(String[] args) throws IOException {
>     TableName tableName = TableName.valueOf("test_table");
>     HTableDescriptor desc = new HTableDescriptor(tableName);
>     desc.addFamily(new HColumnDescriptor("cf1"));
>     desc.addFamily(new HColumnDescriptor("cf2"));
>     Configuration conf = HBaseConfiguration.create();
>     Connection connection = ConnectionFactory.createConnection(conf);
>     Admin admin = connection.getAdmin();
>     admin.createTable(desc);
>     HTable table = new HTable(conf, "test_table");
>     for (int i = 0; i < 4; i++) {
>       Put put = new Put(Bytes.toBytes(i)); // Use i as row key.
>       put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>       put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>       table.put(put);
>     }
>     admin.deleteColumn(tableName, Bytes.toBytes("cf2"));
>     admin.majorCompact(tableName);
>     admin.close();
>   }
> }
> {code}
> Then I see that the store file for the "cf2" family persists in the file system.
> I observe this effect in a standalone hbase installation and in pseudo-distributed mode.
[jira] [Commented] (HBASE-16935) Java API method Admin.deleteColumn(table, columnFamily) doesn't delete family's StoreFile from file system.
[ https://issues.apache.org/jira/browse/HBASE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677463#comment-15677463 ]

Matteo Bertozzi commented on HBASE-16935:
-----------------------------------------

from the code we actually remove the family dirs in both cases (table enabled or disabled). but since we reopen the regions after deleting the family folders we end up with the flushed on-close file in the deleted family. so, all the files in the family except the flush on-close one will be removed.

basically the simplified order of operations we have (since 0.94 or even before) is:
- change the htd
- drop the family folder
- re-open regions (closing a region will trigger a flush of the family)

we should probably flip the order, so we reopen with the new htd and then we remove the dirs.

> Java API method Admin.deleteColumn(table, columnFamily) doesn't delete family's StoreFile from file system.
> ------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-16935
> URL: https://issues.apache.org/jira/browse/HBASE-16935
> Project: HBase
> Issue Type: New Feature
> Components: Admin
> Affects Versions: 1.2.3
> Reporter: Mikhail Zvagelsky
> Attachments: Selection_008.png
>
> The method deleteColumn(TableName tableName, byte[] columnName) of the class org.apache.hadoop.hbase.client.Admin should delete the specified column family from the specified table. (Despite its name, the method removes the family, not a column; see the [issue|https://issues.apache.org/jira/browse/HBASE-1989].)
> This method changes the table's schema, but it doesn't delete the column family's store file from the file system. To be precise - I run this code:
> {code:|borderStyle=solid}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.util.Bytes;
> public class ToHBaseIssueTracker {
>   public static void main(String[] args) throws IOException {
>     TableName tableName = TableName.valueOf("test_table");
>     HTableDescriptor desc = new HTableDescriptor(tableName);
>     desc.addFamily(new HColumnDescriptor("cf1"));
>     desc.addFamily(new HColumnDescriptor("cf2"));
>     Configuration conf = HBaseConfiguration.create();
>     Connection connection = ConnectionFactory.createConnection(conf);
>     Admin admin = connection.getAdmin();
>     admin.createTable(desc);
>     HTable table = new HTable(conf, "test_table");
>     for (int i = 0; i < 4; i++) {
>       Put put = new Put(Bytes.toBytes(i)); // Use i as row key.
>       put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>       put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>       table.put(put);
>     }
>     admin.deleteColumn(tableName, Bytes.toBytes("cf2"));
>     admin.majorCompact(tableName);
>     admin.close();
>   }
> }
> {code}
> Then I see that the store file for the "cf2" family persists in the file system.
> I observe this effect in a standalone hbase installation and in pseudo-distributed mode.
[jira] [Commented] (HBASE-17088) Refactor RWQueueRpcExecutor/BalancedQueueRpcExecutor/RpcExecutor
[ https://issues.apache.org/jira/browse/HBASE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674322#comment-15674322 ] Matteo Bertozzi commented on HBASE-17088: - +1 > Refactor RWQueueRpcExecutor/BalancedQueueRpcExecutor/RpcExecutor > > > Key: HBASE-17088 > URL: https://issues.apache.org/jira/browse/HBASE-17088 > Project: HBase > Issue Type: Improvement > Components: rpc >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17088-v1.patch, HBASE-17088-v2.patch, > HBASE-17088-v3.patch, HBASE-17088-v3.patch, HBASE-17088-v4.patch, > HBASE-17088-v4.patch > > > 1. The RWQueueRpcExecutor has eight constructors and the longest one > has ten parameters. But it is only used in SimpleRpcScheduler, and it is easy to > get confused when reading the code. > 2. There are duplicate method implementations in RWQueueRpcExecutor and > BalancedQueueRpcExecutor. They can be implemented in their parent class > RpcExecutor. > 3. SimpleRpcScheduler reads many configs to create a new RpcExecutor. But the > CALL_QUEUE_SCAN_SHARE_CONF_KEY is only needed by RWQueueRpcExecutor. And > CALL_QUEUE_CODEL_TARGET_DELAY, CALL_QUEUE_CODEL_INTERVAL and > CALL_QUEUE_CODEL_LIFO_THRESHOLD are only needed by AdaptiveLifoCoDelCallQueue. > So I think we can refactor it. Suggestions are welcome. > Review board: https://reviews.apache.org/r/53726/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions
[ https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5583: --- Resolution: Not A Problem Fix Version/s: (was: 2.0.0) Status: Resolved (was: Patch Available) This got fixed by proc-v2 HBASE-13203, where now create table and the other DDLs are using a state machine and on restart they resume from where they left off with all the information they need to complete the operation > Master restart on create table with splitkeys does not recreate table with > all the splitkey regions > --- > > Key: HBASE-5583 > URL: https://issues.apache.org/jira/browse/HBASE-5583 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-5583_new_1.patch, HBASE-5583_new_1_review.patch, > HBASE-5583_new_2.patch, HBASE-5583_new_4_WIP.patch, > HBASE-5583_new_5_WIP_using_tableznode.patch > > > -> Create table using splitkeys > -> MAster goes down before all regions are added to meta > -> On master restart the table is again enabled but with less number of > regions than specified in splitkeys > Anyway client will get an exception if i had called sync create table. But > table exists or not check will say table exists. > Is this scenario to be handled by client only or can we have some mechanism > on the master side for this? Pls suggest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17104) Improve cryptic error message "Memstore size is" on region close
Matteo Bertozzi created HBASE-17104: --- Summary: Improve cryptic error message "Memstore size is" on region close Key: HBASE-17104 URL: https://issues.apache.org/jira/browse/HBASE-17104 Project: HBase Issue Type: Bug Components: regionserver Reporter: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0 while grepping my RS log for ERROR I found a cryptic {noformat} ERROR [RS_CLOSE_REGION-u1604vm:35021-1] regionserver.HRegion(1601): Memstore size is 33744 {noformat} from the code, it looks like we want to notify the user that on close the RS was not able to flush and there was still data in the RS. https://github.com/apache/hbase/blob/c3685760f004450667920144f926383eb307de53/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L1601 {code} if (!canFlush) { this.decrMemstoreSize(new MemstoreSize(memstoreDataSize.get(), getMemstoreHeapOverhead())); } else if (memstoreDataSize.get() != 0) { LOG.error("Memstore size is " + memstoreDataSize.get()); } {code} this should probably not even be an error but a warn or even info: unless we have puts that specifically asked not to be written to the wal, the data in the memstore should be safe in the wals. In any case it would be nice to have a message describing what is going on and why we are notifying about the memstore size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
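One possible shape for a more descriptive message, as a minimal sketch only. The helper name `describeLeftover`, the region-name parameter, and the wording are assumptions made for illustration; this is not the actual HRegion change.

```java
// Hypothetical helper for illustration; not part of HRegion.
final class MemstoreCloseMessageSketch {
  private MemstoreCloseMessageSketch() {}

  /** Builds a message that says what the reported size means and why it is reported. */
  static String describeLeftover(String regionName, long memstoreDataSize) {
    return "Region " + regionName + " closed with " + memstoreDataSize
        + " bytes still accounted in the memstore;"
        + " edits that were written to the WAL should be recoverable from there.";
  }

  public static void main(String[] args) {
    // 33744 is the size from the log line quoted in the issue.
    System.out.println(describeLeftover("example-region", 33744L));
  }
}
```

Whether such a message belongs at WARN or INFO is the open question raised in the issue; the sketch only covers the wording.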
[jira] [Updated] (HBASE-17090) Procedure v2 - fast wake if nothing else is running
[ https://issues.apache.org/jira/browse/HBASE-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17090: Attachment: HBASE-17090-v0.patch > Procedure v2 - fast wake if nothing else is running > --- > > Key: HBASE-17090 > URL: https://issues.apache.org/jira/browse/HBASE-17090 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-17090-v0.patch > > > We wait Nmsec to see if we can batch more procedures, but the pattern that we > have allows us to wait only for what we know is running and avoid waiting for > something that will never get there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17090) Procedure v2 - fast wake if nothing else is running
Matteo Bertozzi created HBASE-17090: --- Summary: Procedure v2 - fast wake if nothing else is running Key: HBASE-17090 URL: https://issues.apache.org/jira/browse/HBASE-17090 Project: HBase Issue Type: Sub-task Components: proc-v2 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0 We wait Nmsec to see if we can batch more procedures, but the pattern that we have allows us to wait only for what we know is running and avoid waiting for something that will never get there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17068) Procedure v2 - inherit region locks
[ https://issues.apache.org/jira/browse/HBASE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17068: Attachment: HBASE-17068-v1.patch > Procedure v2 - inherit region locks > > > Key: HBASE-17068 > URL: https://issues.apache.org/jira/browse/HBASE-17068 > Project: HBase > Issue Type: Sub-task > Components: master, proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-17068-v0.patch, HBASE-17068-v1.patch > > > Add support for inherited region locks. > e.g. Split will have Assign/Unassign as child which will take the lock on the > same region split is running on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17068) Procedure v2 - inherit region locks
[ https://issues.apache.org/jira/browse/HBASE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17068: Status: Patch Available (was: Open) > Procedure v2 - inherit region locks > > > Key: HBASE-17068 > URL: https://issues.apache.org/jira/browse/HBASE-17068 > Project: HBase > Issue Type: Sub-task > Components: master, proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-17068-v0.patch > > > Add support for inherited region locks. > e.g. Split will have Assign/Unassign as child which will take the lock on the > same region split is running on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17068) Procedure v2 - inherit region locks
[ https://issues.apache.org/jira/browse/HBASE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17068: Attachment: HBASE-17068-v0.patch > Procedure v2 - inherit region locks > > > Key: HBASE-17068 > URL: https://issues.apache.org/jira/browse/HBASE-17068 > Project: HBase > Issue Type: Sub-task > Components: master, proc-v2 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-17068-v0.patch > > > Add support for inherited region locks. > e.g. Split will have Assign/Unassign as child which will take the lock on the > same region split is running on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17068) Procedure v2 - inherit region locks
Matteo Bertozzi created HBASE-17068: --- Summary: Procedure v2 - inherit region locks Key: HBASE-17068 URL: https://issues.apache.org/jira/browse/HBASE-17068 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Add support for inherited region locks. e.g. Split will have Assign/Unassign as child which will take the lock on the same region split is running on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
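The idea of an inherited region lock can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical and do not correspond to the MasterProcedureScheduler API, and only one level of parent/child is modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names for illustration; not the real scheduler code.
final class InheritedRegionLockSketch {
  // region name -> id of the procedure that owns the lock
  private final Map<String, Long> ownerByRegion = new HashMap<>();

  /**
   * Returns true if procId may run on the region: the region is free (procId becomes
   * the owner), procId already owns it, or the owner is procId's parent (inherited lock).
   * parentProcId may be null for a root procedure.
   */
  boolean tryAcquire(String region, long procId, Long parentProcId) {
    Long owner = ownerByRegion.get(region);
    if (owner == null) {
      ownerByRegion.put(region, procId);
      return true;
    }
    return owner == procId || owner.equals(parentProcId);
  }

  /** Releases the lock only if procId is the recorded owner. */
  void release(String region, long procId) {
    ownerByRegion.remove(region, procId);
  }

  public static void main(String[] args) {
    InheritedRegionLockSketch locks = new InheritedRegionLockSketch();
    System.out.println(locks.tryAcquire("region-a", 1L, null)); // split takes the lock
    System.out.println(locks.tryAcquire("region-a", 2L, 1L));   // unassign, child of split
    System.out.println(locks.tryAcquire("region-a", 3L, null)); // unrelated procedure
  }
}
```

Here the split (id 1) owns the region lock, its child (id 2) is allowed through because its parent is the owner, and an unrelated procedure (id 3) is refused.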
[jira] [Created] (HBASE-17067) Procedure v2 - remove zklock/tryLock and use wait/wake
Matteo Bertozzi created HBASE-17067: --- Summary: Procedure v2 - remove zklock/tryLock and use wait/wake Key: HBASE-17067 URL: https://issues.apache.org/jira/browse/HBASE-17067 Project: HBase Issue Type: Sub-task Components: proc-v2 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0 Once we have HBASE-16744, HBASE-16786 and HBASE-16831, we can remove the tryLock() methods and replace them with the wait/wake methods, which use the framework events instead of spinning until we can start the proc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
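The difference between spinning on tryLock() and a wait/wake scheme can be illustrated with a small sketch. Everything here is an assumption for illustration: `acquireOrWait` and `releaseAndWake` are hypothetical names, not the methods introduced by the issues listed above, and the sketch is single-threaded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical names for illustration; not the proc-v2 scheduler API.
final class WaitWakeLockSketch {
  private Long owner; // null when the lock is free
  private final Deque<Long> waiting = new ArrayDeque<>();

  /** Returns true if acquired; otherwise parks procId in the wait queue (no retry loop). */
  boolean acquireOrWait(long procId) {
    if (owner == null) {
      owner = procId;
      return true;
    }
    waiting.addLast(procId);
    return false;
  }

  /** Releases the lock, hands it to the next waiter and returns its id, or null if none. */
  Long releaseAndWake(long procId) {
    if (owner == null || owner != procId) {
      throw new IllegalStateException("procedure " + procId + " is not the owner");
    }
    owner = waiting.pollFirst();
    return owner;
  }

  public static void main(String[] args) {
    WaitWakeLockSketch lock = new WaitWakeLockSketch();
    System.out.println(lock.acquireOrWait(1L));  // first procedure gets the lock
    System.out.println(lock.acquireOrWait(2L));  // second one is parked
    System.out.println(lock.releaseAndWake(1L)); // release wakes the parked procedure
  }
}
```

A parked procedure consumes no scheduler time until the release event wakes it, which is the point of replacing the spinning tryLock() calls.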
[jira] [Created] (HBASE-17066) Procedure v2 - Add handling of merge region transition to the new AM
Matteo Bertozzi created HBASE-17066: --- Summary: Procedure v2 - Add handling of merge region transition to the new AM Key: HBASE-17066 URL: https://issues.apache.org/jira/browse/HBASE-17066 Project: HBase Issue Type: Sub-task Components: proc-v2, Region Assignment Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0 Core Assignment HBASE-14614 does not handle merge in reportRegionStateTransition(). Handle the transition request! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16548) Procedure v2 - Add handling of split region transition to the new AM
[ https://issues.apache.org/jira/browse/HBASE-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16548: Summary: Procedure v2 - Add handling of split region transition to the new AM (was: Procedure v2 - Add handling of split/merge region transition to the new AM) > Procedure v2 - Add handling of split region transition to the new AM > > > Key: HBASE-16548 > URL: https://issues.apache.org/jira/browse/HBASE-16548 > Project: HBase > Issue Type: Sub-task > Components: proc-v2, Region Assignment >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > > Core Assignment HBASE-14614 does not handle split in > reportRegionStateTransition(). Handle the transition request! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16548) Procedure v2 - Add handling of split/merge region transition to the new AM
[ https://issues.apache.org/jira/browse/HBASE-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16548: Description: Core Assignment HBASE-14614 does not handle split in reportRegionStateTransition(). Handle the transition request! (was: Core Assignment HBASE-14614 does not handle split and merge in reportRegionStateTransition(). Handle the transition request!) > Procedure v2 - Add handling of split/merge region transition to the new AM > -- > > Key: HBASE-16548 > URL: https://issues.apache.org/jira/browse/HBASE-16548 > Project: HBase > Issue Type: Sub-task > Components: proc-v2, Region Assignment >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > > Core Assignment HBASE-14614 does not handle split in > reportRegionStateTransition(). Handle the transition request! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Description: Fix performance regression introduced by HBASE-16094. Instead of scanning all the wals every time, we can rely on the insert/update/delete events we have. and since we want to delete the wals in order we can keep track of what is "holding" that wal, and take a hit on scanning all the trackers only when we remove the first log in the queue. e.g. WAL-1 [1, 2] WAL-2 [1] -> "[2] is holding WAL-1" WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" was:Fix performance regression introduced by HBASE-16094. > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524.master.001.patch, > flame1.svg > > > Fix performance regression introduced by HBASE-16094. > Instead of scanning all the wals every time, we can rely on the > insert/update/delete events we have. > and since we want to delete the wals in order we can keep track of what is > "holding" that wal, and take a hit on scanning all the trackers only when we > remove the first log in the queue. > e.g. > WAL-1 [1, 2] > WAL-2 [1] -> "[2] is holding WAL-1" > WAL-3 [2] -> "WAL 1 can be removed, recompute what is holding WAL-2" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
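The "holding" bookkeeping described above can be sketched as follows. This is a simplified model, not the patch itself: class and method names are hypothetical, every insert/update/delete is treated as one "touch" event written to the newest WAL, and a full scan over the per-WAL sets happens only when the oldest WAL is removed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not the procedure store implementation.
final class WalHoldSketch {
  private final Deque<String> names = new ArrayDeque<>();  // oldest WAL first
  private final Deque<Set<Long>> wals = new ArrayDeque<>(); // proc ids written per WAL
  private Set<Long> holding = new HashSet<>();              // procs holding the oldest WAL
  private final List<String> removed = new ArrayList<>();

  /** Starts a new (newest) WAL. */
  public void roll(String name) {
    names.addLast(name);
    wals.addLast(new HashSet<>());
  }

  /** Records an insert/update/delete for procId in the newest WAL. */
  public void onUpdate(long procId) {
    wals.getLast().add(procId);
    if (wals.size() == 1) {
      holding.add(procId); // the only WAL is also the oldest one
      return;
    }
    holding.remove(procId); // a newer entry supersedes the one in the oldest WAL
    while (wals.size() > 1 && holding.isEmpty()) {
      removed.add(names.removeFirst());
      wals.removeFirst();
      recomputeHolding(); // the only place where all trackers are scanned
    }
  }

  private void recomputeHolding() {
    Iterator<Set<Long>> it = wals.iterator();
    holding = new HashSet<>(it.next()); // procs written in the new oldest WAL...
    while (it.hasNext()) {
      holding.removeAll(it.next());     // ...minus those rewritten in later WALs
    }
  }

  public List<String> removed() { return removed; }
  public Set<Long> holding() { return holding; }

  public static void main(String[] args) {
    WalHoldSketch t = new WalHoldSketch();
    t.roll("WAL-1"); t.onUpdate(1); t.onUpdate(2);
    t.roll("WAL-2"); t.onUpdate(1);
    t.roll("WAL-3"); t.onUpdate(2);
    System.out.println(t.removed() + " removed, oldest WAL held by " + t.holding());
  }
}
```

The main method replays the WAL-1/WAL-2/WAL-3 example from the description: after the write to WAL-3, WAL-1 is dropped and the holders of WAL-2 are recomputed.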
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Description: Fix performance regression introduced by HBASE-16094. > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524.master.001.patch, > flame1.svg > > > Fix performance regression introduced by HBASE-16094. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Affects Version/s: 2.0.0 Fix Version/s: 2.0.0 Component/s: proc-v2 > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16524-v2.patch, HBASE-16524.master.001.patch, > flame1.svg > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-16524: --- Assignee: Matteo Bertozzi (was: Appy) > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-16524-v2.patch, HBASE-16524.master.001.patch, > flame1.svg > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Summary: Procedure v2 - Compute WALs cleanup on wal modification and not on every sync (was: Clean procedure wal periodically instead of on every sync) > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task >Reporter: Appy >Assignee: Appy >Priority: Minor > Attachments: HBASE-16524-v2.patch, HBASE-16524.master.001.patch, > flame1.svg > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16524) Procedure v2 - Compute WALs cleanup on wal modification and not on every sync
[ https://issues.apache.org/jira/browse/HBASE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16524: Attachment: HBASE-16524-v2.patch > Procedure v2 - Compute WALs cleanup on wal modification and not on every sync > - > > Key: HBASE-16524 > URL: https://issues.apache.org/jira/browse/HBASE-16524 > Project: HBase > Issue Type: Sub-task >Reporter: Appy >Assignee: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-16524-v2.patch, HBASE-16524.master.001.patch, > flame1.svg > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17030: Resolution: Fixed Status: Resolved (was: Patch Available) > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17030 > URL: https://issues.apache.org/jira/browse/HBASE-17030 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17030-v0.patch, HBASE-17030-v0.patch > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17042) Remove 'public' keyword from MasterObserver interface
[ https://issues.apache.org/jira/browse/HBASE-17042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645768#comment-15645768 ] Matteo Bertozzi commented on HBASE-17042: - +1 > Remove 'public' keyword from MasterObserver interface > - > > Key: HBASE-17042 > URL: https://issues.apache.org/jira/browse/HBASE-17042 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Affects Versions: 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17042.v1-master.patch > > > Very minor, when I added some new observers, I put 'public' is in some new > observers in the {{MasterObserver}} interface. The fix is trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17030: Attachment: HBASE-17030-v0.patch > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17030 > URL: https://issues.apache.org/jira/browse/HBASE-17030 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17030-v0.patch, HBASE-17030-v0.patch > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16990) Shell tool to dump table schemas and table privileges
[ https://issues.apache.org/jira/browse/HBASE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644394#comment-15644394 ] Matteo Bertozzi commented on HBASE-16990: - see HBASE-11013. there was a discussion about retaining ACLs with snapshots > Shell tool to dump table schemas and table privileges > - > > Key: HBASE-16990 > URL: https://issues.apache.org/jira/browse/HBASE-16990 > Project: HBase > Issue Type: New Feature > Components: tooling >Reporter: huzheng >Assignee: huzheng >Priority: Minor > > Recently, we are trying to migrate tables from Cluster-A to Cluster-B, I > found that HBase lack some useful tools : > 1. dump table schema, like mysqldump in mysql > 2. dump table privileges, like pt-show-grants in mysql provided by Percona. > I think we can add a dump sub-command looks like (JUST simple demo) : > {code} > $ ./bin/hbase dump -t test_table --with-privileges > ~/test_table.hsh > $ cat ~/test_table.hsh > create 'test_table', {NAME=>'f1'} > grant 'test_user', 'RW', 'test_table' > {code} > Maybe I can contribute ... :) > How do you think ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-11013) Clone Snapshots on Secure Cluster Should provide option to apply Retained User Permissions
[ https://issues.apache.org/jira/browse/HBASE-11013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-11013: --- Assignee: (was: Matteo Bertozzi) > Clone Snapshots on Secure Cluster Should provide option to apply Retained > User Permissions > -- > > Key: HBASE-11013 > URL: https://issues.apache.org/jira/browse/HBASE-11013 > Project: HBase > Issue Type: Improvement > Components: snapshots >Reporter: Ted Yu > > Currently, > {code} > sudo su - test_user > create 't1', 'f1' > sudo su - hbase > snapshot 't1', 'snap_one' > clone_snapshot 'snap_one', 't2' > {code} > In this scenario the user - test_user would not have permissions for the > clone table t2. > We need to add improvement feature such that the permissions of the original > table are recorded in snapshot metadata and an option is provided for > applying them to the new table as part of the clone process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Resolution: Fixed Fix Version/s: 1.1.8 Status: Resolved (was: Patch Available) > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Fix For: 1.1.8 > > Attachments: HBASE-17022-v0.branch-1.1.patch, > HBASE-17022-v0_branch-1.1.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17030: Status: Patch Available (was: Open) > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17030 > URL: https://issues.apache.org/jira/browse/HBASE-17030 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17030-v0.patch > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17030: Attachment: HBASE-17030-v0.patch > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17030 > URL: https://issues.apache.org/jira/browse/HBASE-17030 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17030-v0.patch > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-17029) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi resolved HBASE-17029. - Resolution: Duplicate double click created two HBASE-17029/HBASE-17030. closing this one > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17029 > URL: https://issues.apache.org/jira/browse/HBASE-17029 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
Matteo Bertozzi created HBASE-17030: --- Summary: Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure Key: HBASE-17030 URL: https://issues.apache.org/jira/browse/HBASE-17030 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0 Make a couple of tweaks to HBASE-14551 split procedure - remove tableName from SplitTableRegionProcedure ctor since we have the RegionInfo that contains the name already - move the checkRow in the constructor of the SplitTableRegionProcedure, since the splitRow will never change and we can avoid to start the proc if we have a bad splitRow. - use the base AbstractStateMachineTableProcedure for the "user" field - remove protobuf fields that can be extrapolated from other info (table_name, split_row) - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17029) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
Matteo Bertozzi created HBASE-17029: --- Summary: Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure Key: HBASE-17029 URL: https://issues.apache.org/jira/browse/HBASE-17029 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0 Make a couple of tweaks to HBASE-14551 split procedure - remove tableName from SplitTableRegionProcedure ctor since we have the RegionInfo that contains the name already - move the checkRow in the constructor of the SplitTableRegionProcedure, since the splitRow will never change and we can avoid to start the proc if we have a bad splitRow. - use the base AbstractStateMachineTableProcedure for the "user" field - remove protobuf fields that can be extrapolated from other info (table_name, split_row) - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17023) Region left unassigned due to AM and SSH each thinking others would do the assignment work
[ https://issues.apache.org/jira/browse/HBASE-17023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637638#comment-15637638 ] Matteo Bertozzi commented on HBASE-17023: - makes sense to me, +1 > Region left unassigned due to AM and SSH each thinking others would do the > assignment work > -- > > Key: HBASE-17023 > URL: https://issues.apache.org/jira/browse/HBASE-17023 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.1.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-17023.v0-branch-1.1.patch > > > Another Assignment Manager and SSH issue. This issue is similar to > HBASE-13330, except this time the code path goes through ClosedRegionHandler > and we should apply the same fix as HBASE-13330 to ClosedRegionHandler. > Basically, the AssignmentManager thinks the ServerShutdownHandler would > assign the region and the ServerShutdownHandler thinks that the > AssignmentManager would assign the region. The region > (23e0186c4d2b5cc09f25de35fe174417) ultimately never gets assigned. Below is > an analysis from the logs that captures the flow of events. > 1. The AssignmentManager had initially assigned this region to > {{rs42.prod.foo.com,16020,1476293566365}}. > 2. The {{rs42.prod.foo.com,16020,1476293566365}} stops and sends the CLOSE > request to master. > 3. ServerShutdownHandler(SSH) runs to assign this region to > {{rs44.prod.foo.com,16020,1476294287692}}, but the assign failed. > 4. When the master restarted it did a scan of the meta to learn about the > regions in the cluster. It found this region still being assigned to > {{rs42}} from the meta record. > 5. However, this {{rs42}} server was not alive anymore. So, the > AssignmentManager queued up a ServerShutdownHandling task for this (that > asynchronously executes): > 6. In the meantime, the AssignmentManager proceeded to read the RIT nodes > from ZK. 
It found this region as well is in RS_ZK_REGION_FAILED_OPEN in the > {{rs44}} RS. > 7. The region was moved to CLOSED state: > {noformat} > 2016-10-12 17:45:11,637 DEBUG [AM.ZK.Worker-pool2-t6] > master.AssignmentManager: Handling RS_ZK_REGION_FAILED_OPEN, > server=rs44.prod.foo.com,16020,1476294287692, > region=23e0186c4d2b5cc09f25de35fe174417, > current_state={23e0186c4d2b5cc09f25de35fe174417 state=PENDING_OPEN, > ts=1476294311564, server=rs44.prod.foo.com,16020,1476294287692} > 2016-10-12 17:45:11,637 INFO [AM.ZK.Worker-pool2-t6] master.RegionStates: > Transition {23e0186c4d2b5cc09f25de35fe174417 state=PENDING_OPEN, > ts=1476294311564, server=rs44.prod.foo.com,16020,1476294287692} to > {23e0186c4d2b5cc09f25de35fe174417 state=CLOSED, ts=1476294311637, > server=rs44.prod.foo.com,16020,1476294287692} > 2016-10-12 17:45:11,637 WARN [AM.ZK.Worker-pool2-t6] master.RegionStates: > 23e0186c4d2b5cc09f25de35fe174417 moved to CLOSED on > rs44.prod.foo.com,16020,1476294287692, expected > rs42.prod.foo.com,16020,1476293566365 > {noformat} > 8. After that the AssignmentManager tried to assign it again. However, the > assignment didn't happen because the ServerShutdownHandling task queued > earlier didn't yet execute: > {noformat} > 2016-10-12 17:45:11,637 DEBUG [AM.ZK.Worker-pool2-t6] > master.AssignmentManager: Found an existing plan for > table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417. 
> destination server is rs44.prod.foo.com,16020,1476294287692 accepted as a > dest server = false > 2016-10-12 17:45:11,697 DEBUG [AM.ZK.Worker-pool2-t6] > master.AssignmentManager: No previous transition plan found (or ignoring an > existing plan) for > table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417.; > generated random > plan=hri=table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417., > src=, dest=rs28.prod.foo.com,16020,1476294291314; 10 (online=11) available > servers, forceNewPlan=true > 2016-10-12 17:45:11,697 DEBUG [AM.ZK.Worker-pool2-t6] > handler.ClosedRegionHandler: Handling CLOSED event for > 23e0186c4d2b5cc09f25de35fe174417 > 2016-10-12 17:45:11,697 WARN [AM.ZK.Worker-pool2-t6] master.RegionStates: > 23e0186c4d2b5cc09f25de35fe174417 moved to CLOSED on > rs44.prod.foo.com,16020,1476294287692, expected > rs42.prod.foo.com,16020,1476293566365 > 2016-10-12 17:45:11,697 INFO [AM.ZK.Worker-pool2-t6] > master.AssignmentManager: Skip assigning > table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417., > it's host rs42.prod.foo.com,16020,1476293566365 is dead but not processed yet > 2016-10-12 17:45:11,884 INFO [MASTER_SERVER_OPERATIONS-server01:16000-3] > master.RegionStates: Transitioning {23e0186c4d2b5cc09f25de35fe174417 > state=CLOSED, ts=14762943116
[jira] [Updated] (HBASE-16892) Use TableName instead of String in SnapshotDescription
[ https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16892: Resolution: Fixed Status: Resolved (was: Patch Available) > Use TableName instead of String in SnapshotDescription > -- > > Key: HBASE-16892 > URL: https://issues.apache.org/jira/browse/HBASE-16892 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, > HBASE-16892-v2.patch > > > mostly find & replace work: > deprecate the SnapshotDescription constructors with the String argument in > favor of the TableName ones. > Replace the TableName.valueOf() around with the new getTableName() > Replace the TableName.getNameAsString() by just passing the TableName -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16937: Labels: snapshot (was: ) > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi resolved HBASE-16937. - Resolution: Fixed > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16937: Labels: (was: snapshot) > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16937: Component/s: snapshots > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16865) Procedure v2 - Inherit lock from root proc
[ https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16865: Resolution: Fixed Status: Resolved (was: Patch Available) > Procedure v2 - Inherit lock from root proc > -- > > Key: HBASE-16865 > URL: https://issues.apache.org/jira/browse/HBASE-16865 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-16865-v0.patch > > > At the moment we support inheriting locks from the parent procedure for > 2-level procedures, but in the case of reopening table regions we have a 3-level > procedure (ModifyTable -> ReOpen -> [Unassign/Assign]) and reopen does not > have any locks of its own. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Comment: was deleted (was: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 16s {color} | {color:red} HBASE-17022 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837155/HBASE-17022-v0.patch | | JIRA Issue | HBASE-17022 | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4329/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. ) > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Attachments: HBASE-17022-v0_branch-1.1.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Attachment: HBASE-17022-v0_branch-1.1.patch > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Attachments: HBASE-17022-v0_branch-1.1.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Attachment: (was: HBASE-17022-v0.patch) > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Attachments: HBASE-17022-v0_branch-1.1.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636635#comment-15636635 ] Matteo Bertozzi commented on HBASE-17022: - yeah, I forgot to update the test in that branch to reload the set of regions, which after HBASE-16649 get a different name after truncate > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Attachments: HBASE-17022-v0.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Status: Patch Available (was: Open) > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Attachments: HBASE-17022-v0.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Attachment: HBASE-17022-v0.patch > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Attachments: HBASE-17022-v0.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-17022: --- Assignee: Matteo Bertozzi > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16865) Procedure v2 - Inherit lock from root proc
[ https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627405#comment-15627405 ] Matteo Bertozzi commented on HBASE-16865: - the root may just be a "launcher", like Move/Balance: they don't take a lock, they just start N children > Procedure v2 - Inherit lock from root proc > -- > > Key: HBASE-16865 > URL: https://issues.apache.org/jira/browse/HBASE-16865 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-16865-v0.patch > > > At the moment we support inheriting locks from the parent procedure for > 2-level procedures, but in the case of reopening table regions we have a 3-level > procedure (ModifyTable -> ReOpen -> [Unassign/Assign]) and reopen does not > have any locks of its own. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16865) Procedure v2 - Inherit lock from root proc
[ https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627320#comment-15627320 ] Matteo Bertozzi commented on HBASE-16865: - we don't have the ability to go back recursively to the root at the moment, and we only have two use cases for now: a child that wants to lock what the parent has, and a child far down the line that wants to lock what the root has. I don't want to make this too generic for now and end up with strange deep procs. I prefer having people complain about the inability to do something and then figuring out if there is a better way to do it, instead of allowing strange behaviors. > Procedure v2 - Inherit lock from root proc > -- > > Key: HBASE-16865 > URL: https://issues.apache.org/jira/browse/HBASE-16865 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-16865-v0.patch > > > At the moment we support inheriting locks from the parent procedure for > 2-level procedures, but in the case of reopening table regions we have a 3-level > procedure (ModifyTable -> ReOpen -> [Unassign/Assign]) and reopen does not > have any locks of its own. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
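A rough sketch of the two use cases described in the comment above (lock held by the direct parent, or lock held by the root) may help. Everything here is hypothetical: the `Proc` class, its fields and `hasLockAccess()` are illustration names, not the HBase Procedure v2 API, and caching the root at construction time is only an assumption about how the lack of a recursive walk could be worked around.

```java
/** Hypothetical, simplified procedure node; not the HBase Procedure class. */
final class Proc {
    final String name;
    final Proc parent;       // null for the root procedure
    final Proc root;         // cached at construction, so no recursive walk is needed
    final boolean holdsLock; // true if this procedure took the lock itself

    Proc(String name, Proc parent, boolean holdsLock) {
        this.name = name;
        this.parent = parent;
        this.root = (parent == null) ? this : parent.root;
        this.holdsLock = holdsLock;
    }

    /** Lock is usable if held by this procedure, its direct parent, or the root only. */
    boolean hasLockAccess() {
        return holdsLock
            || (parent != null && parent.holdsLock)
            || root.holdsLock;
    }
}

final class LockInheritanceDemo {
    public static void main(String[] args) {
        // ModifyTable -> ReOpen -> Unassign, as in the issue description.
        Proc modifyTable = new Proc("ModifyTable", null, true);
        Proc reopen = new Proc("ReOpen", modifyTable, false);
        Proc unassign = new Proc("Unassign", reopen, false);
        System.out.println(unassign.name + " may run under the root lock: "
            + unassign.hasLockAccess());
    }
}
```

Intermediate ancestors are deliberately not consulted, which matches the stated wish not to make this generic: a lock held only by a grandparent that is not the root would not be inherited in this sketch.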
[jira] [Commented] (HBASE-16959) Export snapshot to local file system of a single node
[ https://issues.apache.org/jira/browse/HBASE-16959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621633#comment-15621633 ] Matteo Bertozzi commented on HBASE-16959: - the -D is an option of the tool, not of the JVM, so you should pass it as: hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -D... -snapshot... As an alternative to mapreduce.jobtracker.address you can use "-jt local", which is interpreted by the tool and, like -D, is always passed as a tool argument: hbase org...ExportSnapshot -jt local -snapshot... https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GenericOptionsParser.java#L102 https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GenericOptionsParser.java#L273 > Export snapshot to local file system of a single node > - > > Key: HBASE-16959 > URL: https://issues.apache.org/jira/browse/HBASE-16959 > Project: HBase > Issue Type: New Feature > Components: snapshots >Reporter: Xiang Li >Priority: Critical > > ExportSnapshot allows users to specify "file://" in "copy-to". > Based on the implementation (use Map jobs), it works as follows: > (1) The manifest of the snapshot(.hbase-snapshot) is exported to the local > file system of the HBase client node where the command is issued > (2) The data of the snapshot(archive) is exported to the local file system > of the nodes where the map jobs run, so spread everywhere. > *That causes 2 problems we meet so far:* > (1) The last step to verify the snapshot integrity fails, due to that not all > the data can be found on the HBase client node where the command is issued. 
> "-no-target-verify" can be of help here to suppress the verification, but it > is not a good idea > (2) When the HBase client (where the command is issued) is also a NodeManager > of Yarn, and it happens to have a map job (to write data of snapshot) running > on it, the "copy-to" directory will be created firstly when writing the > manifest by user=hbase and then user=yarn(if it is not controlled) will try > to write data into it. If the directory permission is not set properly, let > say, umask = 022, both hbase and yarn are in hadoop group, the "copy-to" is > created with no write permission(777-022=755, so rwxr-xr-x) for the same > group, user=yarn can not write data into the "copy-to" directory, as it is > created by user=hbase. We have the following exception > {code} > Error: java.io.IOException: Mkdirs failed to create > file:/tmp/snap_export/archive/data/default/table_xxx/regionid_xxx/info > (exists=false, > cwd=file:/hadoop/yarn/local/usercache/hbase/appcache/application_1477577812726_0001/container_1477577812726_0001_01_04) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:275) > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:193) > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:119) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > {code} > We can control the permission to resolve that, but it is not a good idea > either. > *Proposal* > If exporting to "file://", add reduce to aggregate all "distributed" data of > the snapshot to the HBase client node where the command is issued, to be > together with the manifest of the snapshot. That can resolve the verification > problem above in (1) > For problem (2), have no idea so far -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16959) Export snapshot to local file system of a single node
[ https://issues.apache.org/jira/browse/HBASE-16959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614191#comment-15614191 ] Matteo Bertozzi commented on HBASE-16959: - why not just run with -Dmapreduce.jobtracker.address=local, so the ExportSnapshot will run on the same machine and export locally? > Export snapshot to local file system of a single node > - > > Key: HBASE-16959 > URL: https://issues.apache.org/jira/browse/HBASE-16959 > Project: HBase > Issue Type: New Feature > Components: snapshots >Reporter: Xiang Li >Priority: Critical > > ExportSnapshot allows uses to specify "file://" in "copy-to". > Based on the implementation (use Map jobs), it works as follow: > (1) The manifest of the snapshot(.hbase-snapshot) is exported to the local > file system of the HBase client node where the command is issued > (2) The data of the snapshot(archive) is exported to the local file system > of the nodes where the map jobs run, so spread everywhere. > *That causes 2 problems we meet so far:* > (1) The last step to verify the snapshot integrity fails, due to that not all > the data can be found on the HBase client node where the command is issued. > "-no-target-verify" can be of help here to suppress the verification, but it > is not a good idea > (2) When the HBase client (where the command is issued) is also a NodeManager > of Yarn, and it happens to have a map job (to write data of snapshot) running > on it, the "copy-to" directory will be created firstly when writing the > manifest by user=hbase and then user=yarn(if it is not controlled) will try > to write data into it. If the directory permission is not set properly, let > say, umask = 022, both hbase and yarn are in hadoop group, the "copy-to" is > created with no write permission(777-022=755, so rwxr-xr-x) for the same > group, user=yarn can not write data into the "copy-to" directory, as it is > created by user=hbase. 
We have the following exception > {code} > Error: java.io.IOException: Mkdirs failed to create > file:/tmp/snap_export/archive/data/default/table_xxx/regionid_xxx/info > (exists=false, > cwd=file:/hadoop/yarn/local/usercache/hbase/appcache/application_1477577812726_0001/container_1477577812726_0001_01_04) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:275) > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:193) > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:119) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > {code} > We can control the permission to resolve that, but it is not a good idea > either. > *Proposal* > If exporting to "file://", add reduce to aggregate all "distributed" data of > the snapshot to the HBase client node where the command is issued, to be > together with the manifest of the snapshot. That can resolve the verification > problem above in (1) > For problem (2), have no idea so far -- This message was sent by Atlassian JIRA (v6.3.4#6332)
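The permission arithmetic in problem (2) can be checked with a small sketch. The issue writes it as 777-022=755; the umask is actually applied by clearing bits (requested AND NOT umask), which gives the same result for these values. The class and method names below are made up for illustration and are not part of HBase or Hadoop.

```java
/** Hypothetical helper illustrating how a umask yields the 755 mode from the issue. */
final class UmaskSketch {
    /** Mode a new directory ends up with: the requested bits with the umask bits cleared. */
    static int effectiveMode(int requested, int umask) {
        return requested & ~umask;
    }

    /** Renders the low nine permission bits as an ls-style string. */
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i);
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int mode = effectiveMode(0777, 0022);
        System.out.println(Integer.toOctalString(mode) + " " + toSymbolic(mode));
        // The group write bit (0020) is what user=yarn would need in the scenario above.
        System.out.println("group can write: " + ((mode & 0020) != 0));
    }
}
```

With the group write bit cleared, a second user in the same group cannot create files under the directory, which is consistent with the "Mkdirs failed" exception quoted in the issue.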
[jira] [Commented] (HBASE-16950) Print raw stats in the end of procedure performance tools for parsing results from scripts
[ https://issues.apache.org/jira/browse/HBASE-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610629#comment-15610629 ] Matteo Bertozzi commented on HBASE-16950: - +1 > Print raw stats in the end of procedure performance tools for parsing results > from scripts > -- > > Key: HBASE-16950 > URL: https://issues.apache.org/jira/browse/HBASE-16950 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Trivial > Attachments: HBASE-16950.master.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
[ https://issues.apache.org/jira/browse/HBASE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16939: Fix Version/s: (was: 1.2.4) 1.2.5 > ExportSnapshot: set owner and permission on right directory > --- > > Key: HBASE-16939 > URL: https://issues.apache.org/jira/browse/HBASE-16939 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8 > > Attachments: HBASE-16939-v1.patch, HBASE-16939.patch > > > {code} > FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir, > false, false, conf); > if (filesUser != null || filesGroup != null) { > setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true); > } > if (filesMode > 0) { > setPermission(outputFs, snapshotTmpDir, (short)filesMode, true); > } > {code} > It copy snapshot manifest to initialOutputSnapshotDir, but it set owner on > snapshotTmpDir. They are different directory when skipTmp is true. > Another problem is new cluster doesn't have .hbase-snapshot directory. So > after export snapshot, it should set owner on .hbase-snapshot directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
[ https://issues.apache.org/jira/browse/HBASE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16939: Resolution: Fixed Fix Version/s: 1.1.8 1.2.4 1.3.1 1.4.0 2.0.0 Status: Resolved (was: Patch Available) > ExportSnapshot: set owner and permission on right directory > --- > > Key: HBASE-16939 > URL: https://issues.apache.org/jira/browse/HBASE-16939 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.4, 1.1.8 > > Attachments: HBASE-16939-v1.patch, HBASE-16939.patch > > > {code} > FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir, > false, false, conf); > if (filesUser != null || filesGroup != null) { > setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true); > } > if (filesMode > 0) { > setPermission(outputFs, snapshotTmpDir, (short)filesMode, true); > } > {code} > It copy snapshot manifest to initialOutputSnapshotDir, but it set owner on > snapshotTmpDir. They are different directory when skipTmp is true. > Another problem is new cluster doesn't have .hbase-snapshot directory. So > after export snapshot, it should set owner on .hbase-snapshot directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
[ https://issues.apache.org/jira/browse/HBASE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606500#comment-15606500 ] Matteo Bertozzi commented on HBASE-16939: - committed to 1.1, 1.2, branch-1 and master. holding off on 1.3 since we are trying to get a release out; I'll commit it later for 1.3.1 (for 0.98, open a backport jira for both HBASE-14445 and this one if you need this stuff in 0.98) > ExportSnapshot: set owner and permission on right directory > --- > > Key: HBASE-16939 > URL: https://issues.apache.org/jira/browse/HBASE-16939 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-16939-v1.patch, HBASE-16939.patch > > > {code} > FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir, > false, false, conf); > if (filesUser != null || filesGroup != null) { > setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true); > } > if (filesMode > 0) { > setPermission(outputFs, snapshotTmpDir, (short)filesMode, true); > } > {code} > It copy snapshot manifest to initialOutputSnapshotDir, but it set owner on > snapshotTmpDir. They are different directory when skipTmp is true. > Another problem is new cluster doesn't have .hbase-snapshot directory. So > after export snapshot, it should set owner on .hbase-snapshot directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605423#comment-15605423 ] Matteo Bertozzi commented on HBASE-16937: - yeah, nothing changed. we are still using the non-protobuf enum. this patch only removes the extra protobufToEnum() conversion that we have had around since the beginning, when we did not have the pojo object > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
[ https://issues.apache.org/jira/browse/HBASE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604077#comment-15604077 ]

Matteo Bertozzi commented on HBASE-16939:
-----------------------------------------

+1. I guess we can apply the patch to every active branch, right?

> ExportSnapshot: set owner and permission on right directory
> -----------------------------------------------------------
>
>                 Key: HBASE-16939
>                 URL: https://issues.apache.org/jira/browse/HBASE-16939
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Minor
>         Attachments: HBASE-16939-v1.patch, HBASE-16939.patch
>
>
> {code}
> FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir,
>     false, false, conf);
> if (filesUser != null || filesGroup != null) {
>   setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true);
> }
> if (filesMode > 0) {
>   setPermission(outputFs, snapshotTmpDir, (short)filesMode, true);
> }
> {code}
> It copies the snapshot manifest to initialOutputSnapshotDir, but it sets the owner on
> snapshotTmpDir. They are different directories when skipTmp is true.
> Another problem is that a new cluster doesn't have a .hbase-snapshot directory. So
> after exporting a snapshot, it should set the owner on the .hbase-snapshot directory.
[jira] [Commented] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
[ https://issues.apache.org/jira/browse/HBASE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604011#comment-15604011 ]

Matteo Bertozzi commented on HBASE-16939:
-----------------------------------------

Can you add a LOG.warn() or LOG.info() saying that we are creating and setting permission on the .hbase-snapshot dir? I'd like to see in the log that export snapshot created that dir.
Other than that little addition, the patch looks good to me.

> ExportSnapshot: set owner and permission on right directory
> -----------------------------------------------------------
>
>                 Key: HBASE-16939
>                 URL: https://issues.apache.org/jira/browse/HBASE-16939
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Minor
>         Attachments: HBASE-16939.patch
>
>
> {code}
> FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir,
>     false, false, conf);
> if (filesUser != null || filesGroup != null) {
>   setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true);
> }
> if (filesMode > 0) {
>   setPermission(outputFs, snapshotTmpDir, (short)filesMode, true);
> }
> {code}
> It copies the snapshot manifest to initialOutputSnapshotDir, but it sets the owner on
> snapshotTmpDir. They are different directories when skipTmp is true.
> Another problem is that a new cluster doesn't have a .hbase-snapshot directory. So
> after exporting a snapshot, it should set the owner on the .hbase-snapshot directory.
[jira] [Commented] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
[ https://issues.apache.org/jira/browse/HBASE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603974#comment-15603974 ]

Matteo Bertozzi commented on HBASE-16939:
-----------------------------------------

If the .hbase-snapshot dir does not exist and we are exporting to another cluster, will that cluster be able to take snapshots, since the dir permission is set to the export user and not to the hbase user?

> ExportSnapshot: set owner and permission on right directory
> -----------------------------------------------------------
>
>                 Key: HBASE-16939
>                 URL: https://issues.apache.org/jira/browse/HBASE-16939
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Minor
>         Attachments: HBASE-16939.patch
>
>
> {code}
> FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir,
>     false, false, conf);
> if (filesUser != null || filesGroup != null) {
>   setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true);
> }
> if (filesMode > 0) {
>   setPermission(outputFs, snapshotTmpDir, (short)filesMode, true);
> }
> {code}
> It copies the snapshot manifest to initialOutputSnapshotDir, but it sets the owner on
> snapshotTmpDir. They are different directories when skipTmp is true.
> Another problem is that a new cluster doesn't have a .hbase-snapshot directory. So
> after exporting a snapshot, it should set the owner on the .hbase-snapshot directory.
[jira] [Updated] (HBASE-16892) Use TableName instead of String in SnapshotDescription
[ https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-16892:
------------------------------------
    Attachment: HBASE-16892-v2.patch

> Use TableName instead of String in SnapshotDescription
> ------------------------------------------------------
>
>                 Key: HBASE-16892
>                 URL: https://issues.apache.org/jira/browse/HBASE-16892
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: 2.0.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Trivial
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch,
> HBASE-16892-v2.patch
>
>
> Mostly find & replace work:
> deprecate the SnapshotDescription constructors with the String argument in
> favor of the TableName ones.
> Replace the TableName.valueOf() calls with the new getTableName().
> Replace the TableName.getNameAsString() calls by just passing the TableName.
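The deprecation pattern described in the quoted issue can be sketched as follows. The two nested classes are minimal stand-ins written for this example; they borrow the names TableName and SnapshotDescription from the issue text, but their fields, constructors, and methods are assumptions and not the HBase classes or the attached patch.

```java
import java.util.Objects;

// Illustrative only: minimal stand-ins, not the HBase classes.
final class TableNameSketch {
    // Hypothetical typed wrapper around a table name string.
    static final class TableName {
        private final String name;

        private TableName(String name) {
            this.name = Objects.requireNonNull(name, "name");
        }

        static TableName valueOf(String name) {
            return new TableName(name);
        }

        String getNameAsString() {
            return name;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof TableName && ((TableName) other).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    static final class SnapshotDescription {
        private final String snapshotName;
        private final TableName table;

        /** @deprecated kept for old callers; prefer the TableName constructor. */
        @Deprecated
        SnapshotDescription(String snapshotName, String table) {
            this(snapshotName, TableName.valueOf(table));
        }

        SnapshotDescription(String snapshotName, TableName table) {
            this.snapshotName = Objects.requireNonNull(snapshotName, "snapshotName");
            this.table = Objects.requireNonNull(table, "table");
        }

        String getName() {
            return snapshotName;
        }

        // Callers get the typed value and no longer need TableName.valueOf().
        TableName getTableName() {
            return table;
        }
    }

    public static void main(String[] args) {
        TableName table = TableName.valueOf("t1");
        SnapshotDescription viaTableName = new SnapshotDescription("snap1", table);
        SnapshotDescription viaString = new SnapshotDescription("snap1", "t1");
        System.out.println(viaTableName.getTableName().equals(viaString.getTableName()));
    }
}
```

Delegating the deprecated String constructor to the TableName one keeps old call sites working while new code passes the typed value straight through.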
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-16937:
------------------------------------
    Status: Open  (was: Patch Available)

> Replace SnapshotType protobuf conversion when we can directly use the pojo
> object
> --------------------------------------------------------------------------
>
>                 Key: HBASE-16937
>                 URL: https://issues.apache.org/jira/browse/HBASE-16937
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Trivial
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> Mostly find & replace work:
> replace the back-and-forth protobuf conversion where we can just use the
> client SnapshotType enum.
[jira] [Commented] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603419#comment-15603419 ]

Matteo Bertozzi commented on HBASE-16937:
-----------------------------------------

The patch depends on HBASE-16892.

> Replace SnapshotType protobuf conversion when we can directly use the pojo
> object
> --------------------------------------------------------------------------
>
>                 Key: HBASE-16937
>                 URL: https://issues.apache.org/jira/browse/HBASE-16937
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Trivial
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> Mostly find & replace work:
> replace the back-and-forth protobuf conversion where we can just use the
> client SnapshotType enum.
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-16937:
------------------------------------
    Attachment: HBASE-16937-v1.patch

> Replace SnapshotType protobuf conversion when we can directly use the pojo
> object
> --------------------------------------------------------------------------
>
>                 Key: HBASE-16937
>                 URL: https://issues.apache.org/jira/browse/HBASE-16937
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Trivial
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> Mostly find & replace work:
> replace the back-and-forth protobuf conversion where we can just use the
> client SnapshotType enum.