[jira] [Updated] (HBASE-7403) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7403:
    Attachment: hbase-7403-trunkv9.patch

Addressing Ted's comments and adding a test for the scenario where a region split and a region merge run concurrently.

Online Merge
    Key: HBASE-7403
    URL: https://issues.apache.org/jira/browse/HBASE-7403
    Project: HBase
    Issue Type: New Feature
    Affects Versions: 0.94.3
    Reporter: chunhui shen
    Assignee: chunhui shen
    Fix For: 0.96.0, 0.94.5
    Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf

Features of this online merge:
1. Online: there is no need to disable the table.
2. Few changes to the current code; it could be applied to trunk, 0.94, 0.92 or 0.90.
3. Merge requests are easy to issue: the encoded region name is enough, so there is no need to type a long region name.
4. No restrictions while it runs: you do not need to take care of events such as server death, balancing, splits, or disabling/enabling a table, nor worry about sending a wrong merge request; these cases are already handled for you.
5. The two merging regions are offline only briefly.

We need merge in the following cases:
1. A region hole or region overlap that cannot be fixed by hbck.
2. Regions that become empty because of TTL and an unreasonable rowkey design.
3. Regions that are always empty or very small because of presplitting at table creation.
4. Too many empty or small regions, which reduce system performance (e.g. mslab).

The current merge tools only work offline and cannot redo the operation if an exception is thrown during merging, which leaves dirty data behind. For an online system, we need an online merge.

The implementation logic of this patch is as follows. For example, to merge regionA and regionB into regionC:
1. Offline the two regions A and B.
2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories).
3. Add the merged regionC to .META.
4. Assign the merged regionC.

By design, once the merge work has been done in HDFS, it can be redone until it succeeds if it throws an exception, aborts, or the server restarts, but it cannot be rolled back.

It depends on:
- Using ZooKeeper to record the transaction journal state, which makes redo easier
- Using ZooKeeper to send and receive merge requests
- Executing the merge transaction on the master
- Supporting merge requests through the API or the shell tool

For details of the merge process, please see the attachment and the patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
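The redo-by-journal idea in the message above can be illustrated with a small sketch. It is not taken from the patch: the state names, the MergeJournal class (an in-memory stand-in for the journal that the patch keeps in ZooKeeper) and MergeTransactionSketch are all hypothetical. The point it shows is that each step records its completion before the next one starts, so a rerun resumes from the last recorded state instead of rolling back.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical journal states, one per step of the merge described above. */
enum MergeState { CREATED, REGIONS_OFFLINED, HDFS_MERGED, META_UPDATED, REGION_ASSIGNED }

/** In-memory stand-in (assumption) for the journal the patch records in ZooKeeper. */
class MergeJournal {
    private MergeState state = MergeState.CREATED;
    MergeState read() { return state; }
    void write(MergeState next) { state = next; }
}

class MergeTransactionSketch {
    /** Steps that actually ran, kept only so the behaviour can be inspected. */
    final List<String> executed = new ArrayList<>();
    private final MergeJournal journal;
    private boolean failHdfsOnce;

    MergeTransactionSketch(MergeJournal journal, boolean failHdfsOnce) {
        this.journal = journal;
        this.failHdfsOnce = failHdfsOnce;
    }

    /** Runs, or resumes, the transaction from whatever state the journal holds. */
    void run() {
        while (journal.read() != MergeState.REGION_ASSIGNED) {
            switch (journal.read()) {
                case CREATED:
                    executed.add("offline A and B");
                    journal.write(MergeState.REGIONS_OFFLINED);
                    break;
                case REGIONS_OFFLINED:
                    if (failHdfsOnce) {
                        failHdfsOnce = false;
                        // Simulated crash: the journal still says REGIONS_OFFLINED.
                        throw new IllegalStateException("simulated failure during HDFS merge");
                    }
                    executed.add("merge files in HDFS");
                    journal.write(MergeState.HDFS_MERGED);
                    break;
                case HDFS_MERGED:
                    executed.add("add C to .META.");
                    journal.write(MergeState.META_UPDATED);
                    break;
                case META_UPDATED:
                    executed.add("assign C");
                    journal.write(MergeState.REGION_ASSIGNED);
                    break;
                default:
                    throw new AssertionError("unreachable");
            }
        }
    }
}
```

Calling run() a second time after the simulated failure continues with the HDFS step and does not offline the regions again, which is the property that makes redo straightforward.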
[jira] [Updated] (HBASE-7506) Judgment of carrying ROOT/META will become wrong when expiring server
[ https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7506:
    Attachment: 7506-trunkv1.patch

Judgment of carrying ROOT/META will become wrong when expiring server
    Key: HBASE-7506
    URL: https://issues.apache.org/jira/browse/HBASE-7506
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.94.3
    Reporter: chunhui shen
    Assignee: chunhui shen
    Fix For: 0.96.0
    Attachments: 7506-trunk v1.patch, 7506-trunkv1.patch, 7506-trunkv2.patch

We check whether a server is carrying ROOT/META when expiring the server. See ServerManager#expireServer.
If the dead server is carrying META, we assign META directly in ServerShutdownHandler.
If the dead server is carrying ROOT, we offline ROOT and then call verifyAndAssignRootWithRetries().

How does the judgment of carrying ROOT/META become wrong?
If the region is in RIT, isCarryingRegion() returns true after looking up the address from zk. However, once the RIT times out (this could be caused by this.allRegionServersOffline !noRSAvailable, see AssignmentManager#TimeoutMonitor) and we assign the region elsewhere, this judgment becomes wrong.
See AssignmentManager#isCarryingRegion for details.

With the wrong judgment of carrying ROOT/META, we would assign ROOT/META twice.
[jira] [Updated] (HBASE-7506) Judgment of carrying ROOT/META will become wrong when expiring server
[ https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7506:
    Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549425#comment-13549425 ]

chunhui shen commented on HBASE-7504:

Patch v2 committed to trunk and the 0.94 branch.

-ROOT- may be offline forever after FullGC of RS
    Key: HBASE-7504
    URL: https://issues.apache.org/jira/browse/HBASE-7504
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.94.3
    Reporter: chunhui shen
    Assignee: chunhui shen
    Fix For: 0.96.0
    Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch

1. A full GC happens on the regionserver carrying ROOT.
2. The ZK session times out; the master expires the regionserver and submits it to ServerShutdownHandler.
3. The regionserver completes the full GC.
4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns true.
5. ServerShutdownHandler skips assigning the ROOT region.
6. The regionserver aborts itself because it receives YouAreDeadException after a regionserver report.
7. ROOT is now offline and won't be assigned any more unless we restart the master.

Master log:
{code}
2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown handler to be executed, root=true, meta=false
2012-10-31 19:51:39,045 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for dw88.kgb.sqa.cm4,60020,1351671478752
2012-10-31 19:51:50,113 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
2012-10-31 19:52:15,945 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log splitting for dw88.kgb.sqa.cm4,60020,1351671478752
{code}
No log of assigning ROOT.

Regionserver log:
{code}
2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 229128ms instead of 10ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
{code}
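The decision in steps 4 and 5 of the HBASE-7504 report can be reduced to a small sketch. Everything here is hypothetical and is not the committed patch: shouldAssignRoot models the behaviour as reported (skip assignment whenever the recorded ROOT location answers), and shouldAssignRootStrict is one possible stricter rule, assumed only for illustration, in which a location on the very server being expired never counts as valid.

```java
class RootAssignSketch {
    /** Behaviour as reported: a verified ROOT location means "nothing to assign". */
    static boolean shouldAssignRoot(boolean locationVerified) {
        return !locationVerified;
    }

    /**
     * Hypothetical stricter rule: if the verified location is the server that is
     * being processed as dead, ROOT still needs to be assigned, because that
     * server is about to abort itself (step 6 of the report).
     */
    static boolean shouldAssignRootStrict(boolean locationVerified,
                                          String rootServer, String deadServer) {
        return !locationVerified || rootServer.equals(deadServer);
    }
}
```

With the server name from the master log, shouldAssignRoot(true) is false, which matches the missing "assigning ROOT" log line, while the stricter rule returns true because the verified location and the dead server are the same.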
[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7504:
    Fix Version/s: 0.94.5
[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7504:
    Attachment: 7504-94.patch
[jira] [Updated] (HBASE-7505) Server will hang when stopping cluster, caused by waiting for split threads
[ https://issues.apache.org/jira/browse/HBASE-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7505:
    Resolution: Fixed
    Fix Version/s: (was: 0.94.4)
                   0.94.5
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Server will hang when stopping cluster, caused by waiting for split threads
    Key: HBASE-7505
    URL: https://issues.apache.org/jira/browse/HBASE-7505
    Project: HBase
    Issue Type: Bug
    Components: regionserver
    Affects Versions: 0.94.3
    Reporter: chunhui shen
    Assignee: chunhui shen
    Fix For: 0.96.0, 0.94.5
    Attachments: 7505-94.patch, 7505-trunk v1.patch

HRegionServer#postOpenDeployTasks currently retries 100 times (about 3200 minutes); see HConnectionManager#setServerSideHConnectionRetries.
However, when stopping the cluster we wait for the split threads in HRegionServer#join. If the META/ROOT server has already been stopped, the split thread won't exit because it is still retrying in HRegionServer#postOpenDeployTasks.
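The hang described in HBASE-7505 comes from a long retry loop that does not notice a shutdown request. One common way to avoid that, sketched below, is to check a stop flag before each attempt. This is not necessarily what the attached patches do; StoppableRetrySketch, retryUnlessStopped and the stopped flag are hypothetical names used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

class StoppableRetrySketch {
    /**
     * Calls task up to maxRetries times, sleeping pauseMillis after each failed attempt.
     * Returns true on success; returns false if the retries run out, if a stop
     * was requested, or if the thread is interrupted.
     */
    static boolean retryUnlessStopped(Callable<Boolean> task, int maxRetries,
                                      long pauseMillis, AtomicBoolean stopped) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (stopped.get()) {
                return false; // shutdown requested: give up so join() can return
            }
            try {
                if (task.call()) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            } catch (Exception e) {
                // Treat any other failure as a failed attempt and retry after the pause.
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A thread blocked in such a loop exits within one pause interval of the flag being set, so a caller that joins on it during cluster shutdown is not held up for the full retry budget.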
[jira] [Commented] (HBASE-7506) Judgment of carrying ROOT/META will become wrong when expiring server
[ https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549435#comment-13549435 ]

Hadoop QA commented on HBASE-7506:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12564138/7506-trunkv1.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:red}-1 core tests{color}. The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestSplitTransaction

    {color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
        at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3958//console

This message is automatically generated.
[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549436#comment-13549436 ]

Hadoop QA commented on HBASE-7504:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12564139/7504-94.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3960//console

This message is automatically generated.
[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)
[ https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549437#comment-13549437 ]

Anil Gupta commented on HBASE-7474:

For SortingProtocol:
+ T Result[] sortIncreasing(Scan scan, byte[] columnFamily, byte[] columnQualifier,
I think sortAscending would be more familiar to people who have worked with RDBMS.
Anil: Done. I was also thinking that Ascending and Descending are more familiar.

+ T Result[] sortDecreasing(Scan scan, byte[] columnFamily, byte[] columnQualifier,
sortDescending would be a better method name.
Anil: Done

+ * @param singleRegion does this scan request spans multiple regions?
spelling: 'spans' - 'span'
Anil: spans is the correct word

Looking at SortingProtocolImplementation.sortIncreasing(), singleRegion is not referenced in the loop - we scan until there is no more row.
Anil: When the scan is limited to a single region, we return only results startIndex to startIndex+(pageSize-1) to the client, since no merge sort is needed on the client side. We don't need to use singleRegion in the loop. Otherwise, if the scan spans multiple regions, each region returns results 0 to (startIndex+(pageSize-1)) to the client so that it can carry out the merge sort.

Some clarification is needed in javadoc and variable name.
Anil: Do you want me to write the above description of singleRegion in the comments? Won't that confuse the user with too much information?

Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)
    Key: HBASE-7474
    URL: https://issues.apache.org/jira/browse/HBASE-7474
    Project: HBase
    Issue Type: New Feature
    Components: Coprocessors, Scanners
    Affects Versions: 0.94.3
    Reporter: Anil Gupta
    Assignee: Anil Gupta
    Priority: Minor
    Labels: coprocessors, scan, sort
    Fix For: 0.94.5
    Attachments: hbase-7474.patch, hbase-7474-v2.patch, SortingEndpoint_high_level_flowchart.pdf

Recently, I have developed an Endpoint which can sort the Results (rows) on the basis of column values. This functionality is similar to the order by clause of an RDBMS. I will be submitting this patch for HBase 0.94.3.
I am almost done with the initial development and testing of the feature, but I still need to write the JUnit tests for it. I will also try to write a design doc.
Thanks,
Anil Gupta
Software Engineer II, Intuit, inc
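The paging rule Anil describes for singleRegion can be written down as a small worked example. The class and method names below are hypothetical and are not part of the patch; the sketch only restates which row positions a region sends back in each case.

```java
class PageRangeSketch {
    /**
     * Returns {first, last}: the inclusive, 0-based positions (in the region's own
     * sorted order) of the rows that a region sends to the client.
     */
    static int[] rowsReturnedByRegion(int startIndex, int pageSize, boolean singleRegion) {
        int last = startIndex + pageSize - 1;
        // With several regions the client cannot know in advance which region
        // holds the rows of the requested page, so every region must send its
        // full prefix and the client merge sorts the prefixes.
        int first = singleRegion ? startIndex : 0;
        return new int[] { first, last };
    }
}
```

For startIndex 20 and pageSize 10, a single-region scan returns rows 20 to 29, while each region of a multi-region scan returns rows 0 to 29.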
[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)
[ https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549452#comment-13549452 ]

Anil Gupta commented on HBASE-7474:

License headers in SortingClient.java and BigDecimalSortingColumnInterpreter.java are not properly formatted.
Anil: Pending

Some log statements, such as the following, can be at debug level.
+ log.info(Querying only one region for sorting);
Anil: Done

+if (sortDecreasing) return instance.sortDecreasing(scan, columnFamily, columnQualifier,
+ colInterpreter, startIndex, pageSize, true);
+else return instance.sortIncreasing(scan, columnFamily, columnQualifier,
'else' keyword is not needed above.
Anil: Done. But personally, I like to use the else keyword explicitly for readability. I am curious to know whether there is any technical reason for not using else in the above case.

+ * This method is used to do the merge sort the rows from multiple regions and produce the final output
Remove 'do the'. Wrap long line.
Anil: Done

+for (Map.Entrybyte[], Result[] regionResultsEntryMap : regionResultMap.entrySet()) {
regionResultsEntryMap - regionResultsEntry or regionResultsMapEntry
Anil: Done

+if(totalNoOfRows startIndex)
+{
Normally left brace is on the same line as if statement. Insert a space between if and (.
Anil: Done

currentMaxorMinValueRegion and maxOrMin are used in the if / else blocks. You can move them inside if / else block and give them names that are clearer in meaning.
Anil: Done

+for (Result[] regionResult : regionResults) {
+ if ((regionResult.length - 1) arrayIndex[regionNum]) {
regionResults and arrayIndex are both arrays. So you can use the same index to access them - in my opinion the code is more readable.
Anil: Pending

+ finalResult[finalResultCurrentSize++] = regionResults[currentMaxorMinValueRegion][arrayIndex[currentMaxorMinValueRegion]];
Wrap long line above.
Anil: Done

+ if (colInterpreter.compare(tmp, maxOrMin) 0) {
If I read the code correctly, the above comparison is the major difference between ascending and descending sorting. A little abstraction would allow you to unify the two cases.
Anil: IMHO, the only way to do that is to add an if (sortDecreasing) condition and then perform either comparison depending on sortDecreasing. I am worried that this abstraction will make the implementation a tad slower, since the worst-case complexity of this sorting is O(n*n). I would prefer performance over a few extra lines of code. Let me know your views.

Looking at SortingColumnInterpreter, this is the only method which is not present in ColumnInterpreter:
+ T getValue(KeyValue kv) throws IOException;
The following method is already provided by ColumnInterpreter:
public abstract T getValue(byte[] colFamily, byte[] colQualifier, KeyValue kv) throws IOException;
Anil: I am thinking of adding the missing method T getValue(KeyValue kv) throws IOException; to ColumnInterpreter. Is that fine? I don't understand why we need colFamily and colQualifier in the getValue method when only a KeyValue is passed.

Please consider dropping SortingColumnInterpreter

Thanks a lot for doing the code review.
~Anil Gupta
Software Engineer II, Intuit, Inc
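The reviewer's suggestion to unify ascending and descending sorting can be sketched by passing the ordering in as a Comparator, so the merge loop contains a single comparison and no per-row branch on the direction. This is an illustrative sketch, not code from hbase-7474.patch: RegionMergeSketch and mergePage are hypothetical names, plain lists stand in for the Result[] arrays returned by each region, and it assumes each region's rows are already sorted in the requested order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionMergeSketch {
    /**
     * Merges per-region lists that are each already sorted according to order,
     * skips the first startIndex merged rows and returns at most pageSize rows.
     */
    static <T> List<T> mergePage(List<List<T>> regionResults, Comparator<T> order,
                                 int startIndex, int pageSize) {
        int[] next = new int[regionResults.size()]; // cursor into each region's rows
        List<T> page = new ArrayList<>();
        int emitted = 0;
        while (page.size() < pageSize) {
            int best = -1; // region whose current row comes first in the requested order
            for (int r = 0; r < regionResults.size(); r++) {
                List<T> rows = regionResults.get(r);
                if (next[r] >= rows.size()) {
                    continue; // this region is exhausted
                }
                if (best < 0 || order.compare(rows.get(next[r]),
                        regionResults.get(best).get(next[best])) < 0) {
                    best = r;
                }
            }
            if (best < 0) {
                break; // every region is exhausted
            }
            T row = regionResults.get(best).get(next[best]++);
            if (emitted++ >= startIndex) {
                page.add(row);
            }
        }
        return page;
    }
}
```

Ascending order would use Comparator.naturalOrder() and descending order Comparator.reverseOrder(). Whether the extra indirection costs anything measurable compared with two hand-written loops is something only a benchmark could settle.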
[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549457#comment-13549457 ]

Hudson commented on HBASE-7504:

Integrated in HBase-TRUNK #3721 (See [https://builds.apache.org/job/HBase-TRUNK/3721/])
    HBASE-7504 -ROOT- may be offline forever after FullGC of RS (Chunhui) (Revision 1431208)

    Result = FAILURE
zjushch :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
[jira] [Commented] (HBASE-7403) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549464#comment-13549464 ] Hadoop QA commented on HBASE-7403: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564137/hbase-7403-trunkv9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplicationWithCompression org.apache.hadoop.hbase.client.TestMultiParallel Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3959//console This message is automatically generated. 
Online Merge

Key: HBASE-7403
URL: https://issues.apache.org/jira/browse/HBASE-7403
Project: HBase
Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.96.0, 0.94.5
Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf

Features of this online merge:
1. Online: there is no need to disable the table.
2. Few changes to the current code; it could be applied to trunk, 0.94, 0.92, or 0.90.
3. Easy to issue a merge request: there is no need to type a long region name, the encoded name is enough.
4. No restrictions during the operation: you don't need to take care of events like Server Dead, Balance, Split, or Disabling/Enabling a table, nor of whether you sent a wrong merge request; that is already handled for you.
5. Only a short offline time for the two merging regions.

We need merge in the following cases:
1. A region hole or region overlap that can't be fixed by hbck.
2. Regions that become empty because of TTL and an unreasonable rowkey design.
3. Regions that are always empty or very small because of presplitting at table creation.
4. Too many empty or small regions, which reduce system performance (e.g. mslab).

The current merge tools only work offline and cannot redo the merge if an exception is thrown midway, which leaves dirty data. For an online system, we need an online merge.

The implementation logic of this patch, taking a merge of regionA and regionB into regionC as an example:
1. Offline the two regions A and B.
2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files into regionC's directory, delete regionA's and regionB's directories).
3. Add the merged regionC to .META.
4. Assign the merged regionC.

By the design of this patch, once we have done the merge work in HDFS we can redo it until it succeeds if it throws an exception, aborts, or the server restarts, but it can't be rolled back.
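The redo-until-successful idea for the four merge steps (offline A and B, merge in HDFS, add C to .META., assign C) can be sketched as a small journaled state machine. This is a minimal illustration under stated assumptions: `MergeStep`, `MergeJournal`, and `MergeRunner` are hypothetical names, the journal is kept in memory here as a stand-in for the state the description says is recorded in ZooKeeper, and each step is assumed to be safe to repeat.

```java
/** The four steps from the description, in execution order. */
enum MergeStep { OFFLINE_REGIONS, MERGE_IN_HDFS, ADD_TO_META, ASSIGN_MERGED_REGION }

/** In-memory stand-in (assumption) for the journal state kept in ZooKeeper. */
final class MergeJournal {
    private int completed = 0; // number of steps already finished

    int completedSteps() { return completed; }

    void markDone(MergeStep step) { completed = step.ordinal() + 1; }
}

final class MergeRunner {
    /** One unit of work; it may fail and must be safe to run again. */
    interface StepAction { void run(MergeStep step) throws Exception; }

    /** Runs the remaining steps in order; returns false at the first failure. */
    static boolean runOnce(MergeJournal journal, StepAction action) {
        MergeStep[] steps = MergeStep.values();
        for (int i = journal.completedSteps(); i < steps.length; i++) {
            try {
                action.run(steps[i]);
            } catch (Exception e) {
                return false; // journal is unchanged, so this step is redone later
            }
            journal.markDone(steps[i]);
        }
        return true;
    }

    /** Redoes the merge from the last journaled step, never rolling back. */
    static boolean runUntilDone(MergeJournal journal, StepAction action, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (runOnce(journal, action)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the journal only moves forward, a restart resumes at the first unfinished step instead of repeating completed ones, which matches the "redo but no rollback" property described above.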
[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549494#comment-13549494 ] Hudson commented on HBASE-7504: --- Integrated in HBase-0.94 #721 (See [https://builds.apache.org/job/HBase-0.94/721/]) HBASE-7504 -ROOT- may be offline forever after FullGC of RS (Chunhui) (Revision 1431204) Result = SUCCESS zjushch : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
[jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
[ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549560#comment-13549560 ] Hudson commented on HBASE-7504: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #340 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/340/]) HBASE-7504 -ROOT- may be offline forever after FullGC of RS (Chunhui) (Revision 1431208) Result = FAILURE zjushch : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)
[ https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549585#comment-13549585 ]

Ted Yu commented on HBASE-7474:

@Anil: Thanks for the detailed response. In the future, you can quote comments using bqdot. People would then be able to correlate your response with the original comment.

w.r.t. singleRegion in SortingProtocolImplementation.sortIncreasing(), your explanation makes sense.

+ * @param singleRegion does this scan request spans multiple regions?

Here 'scan request' is singular, so 'spans' should be 'span'. I am fine with the javadoc after correcting the spelling.

+if (sortDecreasing) return instance.sortDecreasing(scan, columnFamily, columnQualifier,
+ colInterpreter, startIndex, pageSize, true);
+else return instance.sortIncreasing(scan, columnFamily, columnQualifier,

bq. I am curious to know if there is any technical reason for not using else in the above case?

The reason is that when sortDecreasing is true, we return from the method, so the else statement is never reached.

bq. I am worried that this abstraction will make the implementation a tab more slow

There are already several conditional statements inside colInterpreter.compare(), so I doubt there would be a noticeable impact on performance if we unify the code for ascending and descending sorting. You can record the performance number for the current implementation and compare the performance of the rewritten code with that number.

bq. I am thinking of adding the missing method T getValue(KeyValue kv) throws IOException; in ColumnInterpreter. Is that fine?

ColumnInterpreter is able to provide access to the value of the passed-in KeyValue, so I don't think there is a need to add the new method.
Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)

Key: HBASE-7474
URL: https://issues.apache.org/jira/browse/HBASE-7474
Project: HBase
Issue Type: New Feature
Components: Coprocessors, Scanners
Affects Versions: 0.94.3
Reporter: Anil Gupta
Assignee: Anil Gupta
Priority: Minor
Labels: coprocessors, scan, sort
Fix For: 0.94.5
Attachments: hbase-7474.patch, hbase-7474-v2.patch, SortingEndpoint_high_level_flowchart.pdf

Recently, I developed an Endpoint that can sort Results (rows) on the basis of column values. This functionality is similar to the ORDER BY clause of an RDBMS. I will be submitting this patch for HBase 0.94.3. I am almost done with the initial development and testing of the feature, but I still need to write the JUnit tests for it. I will also try to write a design doc.

Thanks,
Anil Gupta
Software Engineer II, Intuit, inc
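Two points from the review discussion on HBASE-7474 can be illustrated with a small, self-contained sketch: an `else` after a `return` is redundant, and ascending and descending sorts can share one code path that differs only in the comparator. The class and method names are hypothetical and it sorts plain integers; it is not the SortingProtocolImplementation code from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortDirectionSketch {

    /** Style from the patch excerpt: explicit else after a return. */
    static List<Integer> sortWithElse(List<Integer> values, boolean sortDecreasing) {
        if (sortDecreasing) return sorted(values, Comparator.reverseOrder());
        else return sorted(values, Comparator.naturalOrder());
    }

    /** Equivalent: the first branch returns, so no else is needed. */
    static List<Integer> sortWithoutElse(List<Integer> values, boolean sortDecreasing) {
        if (sortDecreasing) return sorted(values, Comparator.reverseOrder());
        return sorted(values, Comparator.naturalOrder());
    }

    /** Single shared sort path; only the comparator varies with direction. */
    private static List<Integer> sorted(List<Integer> values, Comparator<Integer> order) {
        List<Integer> copy = new ArrayList<>(values); // leave the input untouched
        copy.sort(order);
        return copy;
    }
}
```

Both public methods return the same result for every input; measuring the unified path against the two-method version, as suggested in the comment, would show whether the extra indirection matters in practice.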
[jira] [Created] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
chunhui shen created HBASE-7529:

Summary: Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
Key: HBASE-7529
URL: https://issues.apache.org/jira/browse/HBASE-7529
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.96.0

{code}
M_RS_OPEN_ROOT(21, ExecutorType.RS_OPEN_REGION), // Master asking RS to open root
{code}

This mistake exists only in trunk, and it can keep ROOT from coming online for a very long time:
1. ROOT waits for an open-region thread to handle opening it.
2. The regions being opened wait for ROOT to come online, but they occupy those threads...
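The starvation described in HBASE-7529 (ROOT queued behind region opens that are themselves waiting for ROOT) can be reproduced with plain `java.util.concurrent` executors. This is a hedged sketch: `OpenRootStarvationSketch` and its method are hypothetical, the waits are bounded so the demo terminates, and it does not use HBase's ExecutorService or EventType classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class OpenRootStarvationSketch {

    /**
     * Queues userRegions "open region" tasks that block until ROOT is online,
     * then queues the "open ROOT" task on rootPool.
     * Returns true if ROOT came online within timeoutMs.
     */
    static boolean rootComesOnline(ExecutorService regionPool, ExecutorService rootPool,
                                   int userRegions, long timeoutMs) {
        CountDownLatch rootOnline = new CountDownLatch(1);
        for (int i = 0; i < userRegions; i++) {
            // Each region open occupies a thread while it waits for ROOT.
            regionPool.execute(() -> awaitQuietly(rootOnline, 2 * timeoutMs));
        }
        // If rootPool is the same saturated pool, this task sits in the queue.
        rootPool.execute(rootOnline::countDown);
        return awaitQuietly(rootOnline, timeoutMs);
    }

    private static boolean awaitQuietly(CountDownLatch latch, long ms) {
        try {
            return latch.await(ms, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Passing the same two-thread pool for both arguments with two user regions makes the method return false, while giving ROOT its own executor makes it return true; that is the motivation for routing the ROOT open event to a dedicated executor type.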
[jira] [Updated] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
[ https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-7529: Attachment: 7529-trunk.patch
[jira] [Commented] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
[ https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549710#comment-13549710 ] ramkrishna.s.vasudevan commented on HBASE-7529: --- Good catch. +1.
[jira] [Updated] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
[ https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7529: -- Priority: Critical (was: Major) Hadoop Flags: Reviewed This was discovered when Chunhui tried to find root cause for TestMultiParallel failure. +1 from me.
[jira] [Updated] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
[ https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7529: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to order by clause of RDBMS)
[ https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549776#comment-13549776 ] Lars Hofhansl commented on HBASE-7474: -- [~giacomotaylor] do you have any comments? Would this be useful for you?
[jira] [Commented] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
[ https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549790#comment-13549790 ] Hadoop QA commented on HBASE-7529: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564183/7529-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestLocalHBaseCluster {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3961//console This message is automatically generated.
[jira] [Commented] (HBASE-7468) TestSplitTransactionOnCluster hangs frequently
[ https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549789#comment-13549789 ]

ramkrishna.s.vasudevan commented on HBASE-7468:

So I found the reason. As stated in the comment above, after rollback we need to delete the znode. Only after the znode is deleted can the region be removed from RIT, and only then will the disable succeed. In the previous commit, the infinite loops were changed to finite loops. So basically the assertion
{code}
assertFalse("region is still in transition", am.getRegionsInTransition().containsKey(regions.get(0).getRegionInfo().getEncodedName()));
{code}
has failed here, and the test then tried to disable the table, which never completed. But in the output file attached by Lars, the node-deleted event never happened at all, and I suspect that is because of the session expiry error that came just after the rollback:
{code}
2013-01-06 21:49:35,500 WARN [Master:0;bunnypig,51009,1357537755267-EventThread] zookeeper.ZKUtil(423): hconnection-0x13c138da85b0019 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
  at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:414)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:188)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:301)
{code}
So my suggestion is that we wait until the RIT entry for the SPLITTING znode is removed, which happens through AM.nodeDeleted(). We should also introduce a timeout for the test, which is currently missing. The same test case does not exist in trunk. @Lars Please provide your thoughts.

TestSplitTransactionOnCluster hangs frequently

Key: HBASE-7468
URL: https://issues.apache.org/jira/browse/HBASE-7468
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3
Reporter: Lars Hofhansl
Assignee: ramkrishna.s.vasudevan
Attachments: 7468-jstack.txt, 7468-output.zip, TestSplitTransactionOnCluster-jstack.txt

This is what I saw once in a local build.
{code}
java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:831)
  at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState(TestSplitTransactionOnCluster.java:650)
{code}
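The suggestion in the HBASE-7468 comment, waiting until the region has left RIT but with an upper bound, could take the shape of a small polling helper. This is a minimal sketch under assumptions: `WaitUtilSketch.waitFor` is a hypothetical helper, and the condition passed in would be something like "the region's encoded name is no longer in the regions-in-transition map".

```java
import java.util.function.BooleanSupplier;

final class WaitUtilSketch {

    /**
     * Polls the condition every pollMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of hanging the test
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would assert on the return value before calling disableTable, and in JUnit 4 it can additionally be bounded with `@Test(timeout = ...)` so that a hang fails the test rather than stalling the build.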
[jira] [Commented] (HBASE-7521) fix HBASE-6060 (regions stuck in opening state) in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549809#comment-13549809 ] ramkrishna.s.vasudevan commented on HBASE-7521: --- @Sergey I had a glance at the patch. I think the SSH and the retry logic in assign on seeing an RS is down will lead to race conditions which is handled in the patches in HBASE-6060. [~rajesh32] Wanna take a look at this? I know you have a version of this running in your cluster for 0.94. fix HBASE-6060 (regions stuck in opening state) in 0.94 --- Key: HBASE-7521 URL: https://issues.apache.org/jira/browse/HBASE-7521 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7521-v0.patch, HBASE-7521-v1.patch Discussion in HBASE-6060 implies that the fix there does not work on 0.94. Still, we may want to fix the issue in 0.94 (via some different fix) because the regions stuck in opening for ridiculous amounts of time is not a good thing to have.
[jira] [Commented] (HBASE-7529) Wrong ExecutorType for EventType.M_RS_OPEN_ROOT in trunk
[ https://issues.apache.org/jira/browse/HBASE-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549826#comment-13549826 ] stack commented on HBASE-7529: -- +1 Excellent find. Good stuff Chunhui.
[jira] [Commented] (HBASE-7468) TestSplitTransactionOnCluster hangs frequently
[ https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549831#comment-13549831 ] stack commented on HBASE-7468: -- bq. ...we need to wait till the RIT is removed for the SPLITTING znode... Or can we just block until it is removed (with a timeout on the test) rather than have a timer? Will it get removed if we wait long enough? Why is it taking a while? Why is this test not in trunk, do you know Ram? Thanks for taking a looksee.
[jira] [Commented] (HBASE-7468) TestSplitTransactionOnCluster hangs frequently
[ https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549835#comment-13549835 ] ramkrishna.s.vasudevan commented on HBASE-7468: --- The way RIT is populated for SPLITTING in 0.94 is not there in trunk after the AM-related changes. The logs do not clearly show why it is taking a while, but currently the test case waits 2 seconds for it to happen. Waiting should remove the znode, I feel. We need to run the tests repeatedly to see if there is any other reason for it.
[jira] [Updated] (HBASE-4274) RS should periodically ping its HLog pipeline even if no writes are arriving
[ https://issues.apache.org/jira/browse/HBASE-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4274: - Priority: Major (was: Critical) Fix Version/s: (was: 0.96.0) Marking down to major and moving out of 0.96. Bring back in if folks want RS to die quickly when HDFS goes out from under HBase (It does seem like general tendency though is to go the other direction, and try and ride over an HDFS outage if possible). RS should periodically ping its HLog pipeline even if no writes are arriving Key: HBASE-4274 URL: https://issues.apache.org/jira/browse/HBASE-4274 Project: HBase Issue Type: Improvement Components: regionserver, wal Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon If you restart HDFS underneath HBase, when HBase isn't taking any write load, the region servers won't notice that there's any problem until the next time they take a write, at which point they will abort (because the pipeline is gone from beneath them). It would be better if they wrote some garbage to their HLog once every few seconds as a sort of keepalive, so they will aggressively abort as soon as there's an issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
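The keepalive idea in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Pipeline`, `PipelineKeepalive`, and the abort callback are hypothetical names, not HBase types. A real change would hook into the HLog writer and the region server's abort path.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the HLog write pipeline; not an HBase type. */
interface Pipeline {
  /** Appends a small no-op marker and syncs it; throws if the pipeline is gone. */
  void writeKeepalive() throws IOException;
}

/** Pings the pipeline on a fixed interval and runs the abort hook on the first failure. */
final class PipelineKeepalive {
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

  /** Schedules a keepalive every periodMillis; stops scheduling once a ping fails. */
  void start(Pipeline pipeline, Runnable abort, long periodMillis) {
    timer.scheduleAtFixedRate(() -> {
      if (!pingOnce(pipeline, abort)) {
        timer.shutdown(); // no point pinging a pipeline we already gave up on
      }
    }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  /** One keepalive attempt; returns false (after invoking abort) if the write failed. */
  static boolean pingOnce(Pipeline pipeline, Runnable abort) {
    try {
      pipeline.writeKeepalive();
      return true;
    } catch (IOException e) {
      abort.run();
      return false;
    }
  }

  void stop() {
    timer.shutdownNow();
  }
}
```

Whether aborting is the right reaction is exactly what the comment above questions; to ride over an HDFS outage instead, the abort hook could be swapped for a retry or a log roll.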
[jira] [Updated] (HBASE-4147) StoreFile query usage report
[ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4147: - Priority: Major (was: Critical) Fix Version/s: (was: 0.96.0) This turned into a (useful) discussion. [~eclark] Can you take a look and note what in 0.96 metrics2 might help answering the questions Doug poses above? We also have a trace mechanism committed but again it would take some work to get it to the level Doug is asking for in the above. It would seem that this issue should become two issues now: one to improve the trace so can go down to the per-storefile level and another to add to metrics so can do at the storefile emissions (if possible). Meantime, marking this as non-critical and moving out of 0.96 while it is w/o a sponsor. StoreFile query usage report Key: HBASE-4147 URL: https://issues.apache.org/jira/browse/HBASE-4147 Project: HBase Issue Type: Improvement Reporter: Doug Meil Attachments: hbase_4147_storefilereport_2011_08_10.pdf, hbase_4147_storefilereport.pdf Detailed information on what HBase is doing in terms of reads is hard to come by. What would be useful is to have a periodic StoreFile query report. Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files. This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile. Even this level of summary would be useful to detect a which tables CFs are being accessed the most, and including the StoreFile would provide insight into relative uncompaction (i.e., lots of StoreFiles). I think the log-output, as opposed to UI, is an important facet with this. 
I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data). Just getting this to log-file would be a big improvement. Will this have a non-zero performance impact? Yes. Hopefully small, but yes it will. However, flying a plane without any instrumentation isn't fun. :-) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
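A minimal sketch of the per-period bookkeeping such a report would need, assuming the read path can call a hook with the StoreFile path and the elapsed time. `StoreFileUsageReport` and its methods are hypothetical names, not existing HBase classes, and the StoreFile size column from the proposal is left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-period accumulator of StoreFile read statistics; not an HBase class. */
final class StoreFileUsageReport {
  /** Access count and total processing time for one StoreFile path. */
  private static final class Stats {
    final LongAdder accesses = new LongAdder();
    final LongAdder totalMillis = new LongAdder();
  }

  private final ConcurrentHashMap<String, Stats> byPath = new ConcurrentHashMap<>();

  /** Called on the read path each time a StoreFile is consulted. */
  void record(String storeFilePath, long elapsedMillis) {
    Stats s = byPath.computeIfAbsent(storeFilePath, p -> new Stats());
    s.accesses.increment();
    s.totalMillis.add(elapsedMillis);
  }

  /**
   * Builds one log line per StoreFile seen in this period, sorted by path,
   * then starts a new period. Updates racing with the reset may be dropped,
   * which is tolerable for a sketch but worth tightening in real code.
   */
  String dumpAndReset() {
    Map<String, Stats> snapshot = new TreeMap<>(byPath);
    byPath.clear();
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, Stats> e : snapshot.entrySet()) {
      out.append(String.format("storefile=%s accesses=%d totalMs=%d\n",
          e.getKey(), e.getValue().accesses.sum(), e.getValue().totalMillis.sum()));
    }
    return out.toString();
  }
}
```

A timer firing on the configured interval (30 or 60 seconds in the proposal) would call `dumpAndReset()` and write the result to the log; since the path encodes table, region, and CF, downstream tooling can aggregate at any of those levels.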
[jira] [Assigned] (HBASE-7492) add new online-snapshot properties to hbase-default.xml
[ https://issues.apache.org/jira/browse/HBASE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-7492: - Assignee: Jonathan Hsieh add new online-snapshot properties to hbase-default.xml --- Key: HBASE-7492 URL: https://issues.apache.org/jira/browse/HBASE-7492 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Suggested by jesse on the HBASE-6864 review.
{code}
/** Maximum number of concurrent snapshot region tasks that can run concurrently */
private static final String CONCURENT_SNAPSHOT_TASKS_KEY = "hbase.snapshot.region.concurrentTasks";
private static final int DEFAULT_CONCURRENT_SNAPSHOT_TASKS = 3;

/** Conf key for number of request threads to start snapshots on regionservers */
public static final String SNAPSHOT_REQUEST_THREADS_KEY = "hbase.snapshot.region.pool.threads";
/** # of threads for snapshotting regions on the rs. */
public static final int SNAPSHOT_REQUEST_THREADS_DEFAULT = 10;

/** Conf key for max time to keep threads in snapshot request pool waiting */
public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
/** Keep threads alive in request pool for max of 60 seconds */
public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;

/** Conf key for millis between checks to see if snapshot completed or if there are errors */
public static final String SNAPSHOT_REQUEST_WAKE_MILLIS_KEY = "hbase.snapshot.region.wakefrequency";
/** Default amount of time to check for errors while regions finish snapshotting */
private static final long SNAPSHOT_REQUEST_WAKE_MILLIS_DEFAULT = 500;
{code}
nit: add these to hbase-default.xml? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HBASE-7492) add new online-snapshot properties to hbase-default.xml
[ https://issues.apache.org/jira/browse/HBASE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-7492 started by Jonathan Hsieh. add new online-snapshot properties to hbase-default.xml --- Key: HBASE-7492 URL: https://issues.apache.org/jira/browse/HBASE-7492 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Suggested by jesse on the HBASE-6864 review.
{code}
/** Maximum number of concurrent snapshot region tasks that can run concurrently */
private static final String CONCURENT_SNAPSHOT_TASKS_KEY = "hbase.snapshot.region.concurrentTasks";
private static final int DEFAULT_CONCURRENT_SNAPSHOT_TASKS = 3;

/** Conf key for number of request threads to start snapshots on regionservers */
public static final String SNAPSHOT_REQUEST_THREADS_KEY = "hbase.snapshot.region.pool.threads";
/** # of threads for snapshotting regions on the rs. */
public static final int SNAPSHOT_REQUEST_THREADS_DEFAULT = 10;

/** Conf key for max time to keep threads in snapshot request pool waiting */
public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
/** Keep threads alive in request pool for max of 60 seconds */
public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;

/** Conf key for millis between checks to see if snapshot completed or if there are errors */
public static final String SNAPSHOT_REQUEST_WAKE_MILLIS_KEY = "hbase.snapshot.region.wakefrequency";
/** Default amount of time to check for errors while regions finish snapshotting */
private static final long SNAPSHOT_REQUEST_WAKE_MILLIS_DEFAULT = 500;
{code}
nit: add these to hbase-default.xml? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
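To illustrate why documenting these keys matters, here is a small sketch of the "configured value, else compiled-in default" lookup that such constants feed. It uses `java.util.Properties` as a stand-in so it runs without Hadoop on the classpath; in HBase the lookup would go through the Hadoop `Configuration` object instead. `SnapshotConfSketch` is a hypothetical name.

```java
import java.util.Properties;

/** Sketch: resolve snapshot settings, falling back to a default when a key is unset. */
final class SnapshotConfSketch {
  // Key names copied from the constants quoted in the issue.
  static final String REQUEST_THREADS_KEY = "hbase.snapshot.region.pool.threads";
  static final String WAKE_MILLIS_KEY = "hbase.snapshot.region.wakefrequency";

  /** Returns the configured value for key, or dflt when the key is absent. */
  static long getLong(Properties conf, String key, long dflt) {
    String v = conf.getProperty(key);
    return v == null ? dflt : Long.parseLong(v.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(WAKE_MILLIS_KEY, "250"); // operator override
    System.out.println(getLong(conf, REQUEST_THREADS_KEY, 10)); // unset, so the default applies
    System.out.println(getLong(conf, WAKE_MILLIS_KEY, 500));    // override wins
  }
}
```

Without an entry in hbase-default.xml, an operator has no way to discover the key or its default short of reading the source, which is the point of the nit.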
[jira] [Commented] (HBASE-4147) StoreFile query usage report
[ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549869#comment-13549869 ] Andrew Purtell commented on HBASE-4147: --- bq. Meantime, marking this as non-critical and moving out of 0.96 while it is w/o a sponsor. I might be looking at this again in the future in the context of HBASE-6572. Deciding what stores to migrate. StoreFile query usage report Key: HBASE-4147 URL: https://issues.apache.org/jira/browse/HBASE-4147 Project: HBase Issue Type: Improvement Reporter: Doug Meil Attachments: hbase_4147_storefilereport_2011_08_10.pdf, hbase_4147_storefilereport.pdf Detailed information on what HBase is doing in terms of reads is hard to come by. What would be useful is to have a periodic StoreFile query report. Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files. This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile. Even this level of summary would be useful to detect a which tables CFs are being accessed the most, and including the StoreFile would provide insight into relative uncompaction (i.e., lots of StoreFiles). I think the log-output, as opposed to UI, is an important facet with this. I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data). Just getting this to log-file would be a big improvement. Will this have a non-zero performance impact? Yes. Hopefully small, but yes it will. However, flying a plane without any instrumentation isn't fun. :-) -- This message is automatically generated by JIRA. 
[jira] [Created] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs
Jean-Daniel Cryans created HBASE-7530: - Summary: [replication] Work around HDFS-4380 else we get NPEs Key: HBASE-7530 URL: https://issues.apache.org/jira/browse/HBASE-7530 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 I've been spending a lot of time trying to figure the recent test failures related to replication. One I seem to be constantly getting is this NPE: {noformat} 2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1834) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312) {noformat} Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via HDFS-3222 but for Hadoop 1.0 he created HDFS-4380. This seems to happen while crossing block boundaries and TestReplication uses a 20KB block size for the HLog. The intent was just to get HLogs to roll more often, and this can also be achieved with *hbase.regionserver.logroll.multiplier* with a value of 0.0003f. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
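As a quick sanity check on the suggested multiplier, the arithmetic can be worked through as below. This assumes the roll threshold is computed as the HLog block size times the multiplier, and it assumes a 64 MB block size; that figure is my assumption and is not stated in the message.

```java
/** Worked arithmetic: HLog roll threshold = block size * logroll multiplier. */
final class LogRollSizeSketch {
  /** Returns the roll threshold in bytes, truncated to a whole number. */
  static long rollSizeBytes(long blockSizeBytes, double multiplier) {
    return (long) (blockSizeBytes * multiplier);
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size (67,108,864 bytes)
    long rollSize = rollSizeBytes(blockSize, 0.0003);
    // 67,108,864 * 0.0003 is roughly 20,132 bytes, i.e. a little under 20 KB
    System.out.println(rollSize + " bytes, about " + (rollSize / 1024) + " KB");
  }
}
```

So the multiplier gives roughly the same roll frequency as the 20KB block size while leaving the HDFS block size alone, which avoids crossing block boundaries so often.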
[jira] [Updated] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master
[ https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3669: - Priority: Major (was: Critical) Fix Version/s: (was: 0.96.0) Knocking down priority. My sense is that in 0.96, after all the AM work, this issue less likely. Leaving open in case we do see it again. Moving out of 0.96 in meantime. Making major rather than critical. Region in PENDING_OPEN keeps being bounced between RS and master Key: HBASE-3669 URL: https://issues.apache.org/jira/browse/HBASE-3669 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Attachments: HBASE-3669-debug-v1.patch After going crazy killing region servers after HBASE-3668, most of the cluster recovered except for 3 regions that kept being refused by the region servers. One the master I would see: {code} 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. 
so generated a random one; hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) available servers 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. to sv2borg171,60020,1300399357135 {code} Then on the region server: {code} 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempting to transition node f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21 {code} I'm not sure I fully understand what was going on... the master was suppose to OFFLINE the znode but then that's not what the region server was seeing? In any case, I was able to recover by doing a force unassign for each region and then assign. -- This message is automatically generated by JIRA. 
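The region server log above shows a guarded state transition: the move from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING is refused because the znode is already in RS_ZK_REGION_OPENING (written by another server). A toy in-memory model of that check, with hypothetical names and no relation to the actual ZKAssign API, might look like this:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory model of an unassigned-region znode; not the ZKAssign API. */
final class RegionNodeSketch {
  enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED }

  private final AtomicReference<State> state;

  RegionNodeSketch(State initial) {
    state = new AtomicReference<>(initial);
  }

  /** Moves the node from expected to next only if it is currently in expected. */
  boolean transition(State expected, State next) {
    return state.compareAndSet(expected, next);
  }

  State current() {
    return state.get();
  }
}
```

In this model the master's "Forcing OFFLINE" step would have to actually rewrite the node before the assignment for the region server's transition to succeed; if the node still carries the stale OPENING state, every reassignment fails the same way, which matches the bouncing described in the report.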
[jira] [Updated] (HBASE-7384) Introducing waitForCondition function into test cases
[ https://issues.apache.org/jira/browse/HBASE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-7384: - Attachment: hbase-7384_2.4.patch Resubmitting the patch to incorporate Enis's feedback. Thanks, -Jeffrey Introducing waitForCondition function into test cases - Key: HBASE-7384 URL: https://issues.apache.org/jira/browse/HBASE-7384 Project: HBase Issue Type: Test Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Labels: test Fix For: 0.96.0 Attachments: hbase-7384_1.0.patch, hbase-7384_2.4.patch, hbase-7384.patch, Waiter.java Recently I have been working on flaky test cases and found that we have many places using a while loop and sleep to wait for a condition to become true. There are several issues with the existing approach:
1) A lot of similar code does the same thing
2) When a timeout happens, different errors are reported without explicitly indicating a timeout
3) When we want to increase the max timeout value to verify whether a test case fails because the timeout is too short, we have to recompile and redeploy the code
I propose to create a waitForCondition function as a test utility function like the following:
{code}
public interface WaitCheck {
  public boolean Check();
}

public boolean waitForCondition(int timeOutInMilliSeconds, int checkIntervalInMilliSeconds, WaitCheck s)
    throws InterruptedException {
  // Optional scale factor so timeouts can be stretched without recompiling.
  int multiplier = 1;
  String multiplierProp = System.getProperty("extremeWaitMultiplier");
  if (multiplierProp != null) {
    multiplier = Integer.parseInt(multiplierProp);
    if (multiplier < 1) {
      LOG.warn(String.format("Invalid extremeWaitMultiplier property value:%s. is ignored.", multiplierProp));
      multiplier = 1;
    }
  }
  // Poll the condition until it holds or the (scaled) timeout is used up.
  int timeElapsed = 0;
  while (timeElapsed < timeOutInMilliSeconds * multiplier) {
    if (s.Check()) {
      return true;
    }
    Thread.sleep(checkIntervalInMilliSeconds);
    timeElapsed += checkIntervalInMilliSeconds;
  }
  assertTrue("WaitForCondition failed due to time out(" + timeOutInMilliSeconds + " milliseconds expired)", false);
  return false;
}
{code}
Doing it this way has several advantages:
1) A timeout error is clearly reported when it happens
2) The system property extremeWaitMultiplier can be used to increase the max timeout dynamically for a quick verification
3) It standardizes the current wait situations
Please let me know your thoughts on this. Thanks, -Jeffrey -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
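For comparison, here is a self-contained variant of the proposed helper that can be compiled and run on its own. It is a sketch rather than the attached Waiter.java: `WaiterSketch` is a hypothetical name, it uses `java.util.function.BooleanSupplier` in place of the `WaitCheck` interface, it throws `AssertionError` instead of calling JUnit, and it measures elapsed time against a `System.nanoTime()` deadline rather than summing sleep intervals, so time spent inside the check itself also counts toward the timeout.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical standalone take on the proposed waitForCondition helper. */
final class WaiterSketch {
  /** Reads the extremeWaitMultiplier system property; falls back to 1 when unset or invalid. */
  static int multiplier() {
    String prop = System.getProperty("extremeWaitMultiplier");
    if (prop == null) {
      return 1;
    }
    try {
      int m = Integer.parseInt(prop);
      return m < 1 ? 1 : m;
    } catch (NumberFormatException e) {
      return 1;
    }
  }

  /** Polls check until it returns true; throws AssertionError once the scaled timeout expires. */
  static void waitForCondition(long timeoutMillis, long intervalMillis, BooleanSupplier check)
      throws InterruptedException {
    long limitMillis = timeoutMillis * multiplier();
    long deadline = System.nanoTime() + limitMillis * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new AssertionError("waitForCondition timed out after " + limitMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }
}
```

A test would call it as `WaiterSketch.waitForCondition(2000, 100, () -> regionIsOnline())`, where `regionIsOnline()` stands for whatever condition the test is waiting on; running with `-DextremeWaitMultiplier=5` then stretches every wait without a rebuild.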
[jira] [Commented] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs
[ https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549907#comment-13549907 ] Lars Hofhansl commented on HBASE-7530: -- Interesting. Did this only start recently (which would be strange)? This happens with larger blocksizes too, right? If so this should be critical. [replication] Work around HDFS-4380 else we get NPEs Key: HBASE-7530 URL: https://issues.apache.org/jira/browse/HBASE-7530 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 I've been spending a lot of time trying to figure the recent test failures related to replication. One I seem to be constantly getting is this NPE: {noformat} 2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1834) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475) at 
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312) {noformat} Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via HDFS-3222 but for Hadoop 1.0 he created HDFS-4380. This seems to happen while crossing block boundaries and TestReplication uses a 20KB block size for the HLog. The intent was just to get HLogs to roll more often, and this can also be achieved with *hbase.regionserver.logroll.multiplier* with a value of 0.0003f. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs
[ https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549913#comment-13549913 ] Jean-Daniel Cryans commented on HBASE-7530: --- [~lhofhansl] Not sure when it started happening, the code has changed on the HBase side but not on the Hadoop side so we should have seen this before. It should happen with larger block sizes too, just a few orders of magnitude less likely than it does in TestReplication :) [replication] Work around HDFS-4380 else we get NPEs Key: HBASE-7530 URL: https://issues.apache.org/jira/browse/HBASE-7530 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 I've been spending a lot of time trying to figure out the recent test failures related to replication. One I seem to be constantly getting is this NPE: {noformat} 2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1834) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312) {noformat} Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via HDFS-3222 but for Hadoop 1.0 he created HDFS-4380. This seems to happen while crossing block boundaries and TestReplication uses a 20KB block size for the HLog. The intent was just to get HLogs to roll more often, and this can also be achieved with *hbase.regionserver.logroll.multiplier* with a value of 0.0003f. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader
Jean-Daniel Cryans created HBASE-7531: - Summary: [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader Key: HBASE-7531 URL: https://issues.apache.org/jira/browse/HBASE-7531 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Attachments: HBASE-7531.patch Here's an NPE I get half the time I run TestReplication: {noformat} 2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332) {noformat} The issue happens after an IOE is caught while opening the reader: the reader isn't set to null afterwards, so the rest of the code assumes it is still usable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader
[ https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-7531: -- Attachment: HBASE-7531.patch Just a simple fix, setting the reader to null if we couldn't get it. [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader --- Key: HBASE-7531 URL: https://issues.apache.org/jira/browse/HBASE-7531 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Attachments: HBASE-7531.patch Here's a NPE I get half the time I run TestReplication: {noformat} 2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332) {noformat} The issue happens after an IOE was caught while opening the reader, the issue is that it isn't set to null after that then the rest of the code assumes the reader is usable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
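The shape of the fix described above (set the reader to null if it could not be obtained) can be sketched as follows. This is not the attached patch: `ReaderHolderSketch`, `Reader`, and `Opener` are hypothetical names used only to show the pattern of clearing a stale reference when an open attempt fails.

```java
import java.io.IOException;

/** Hypothetical reader holder illustrating the pattern; not the real ReplicationSource. */
final class ReaderHolderSketch {
  /** Minimal stand-in for a log reader. */
  interface Reader {
    void seek(long pos) throws IOException;
  }

  /** Minimal stand-in for whatever opens or resets the reader. */
  interface Opener {
    Reader open() throws IOException;
  }

  private Reader reader;

  /** Tries to open a reader; on failure the previous reference is cleared. */
  boolean openReader(Opener opener) {
    try {
      reader = opener.open();
      return true;
    } catch (IOException e) {
      reader = null; // without this, callers would keep using a reader that is no longer valid
      return false;
    }
  }

  /** Callers check this before seeking instead of assuming the reader is usable. */
  boolean hasUsableReader() {
    return reader != null;
  }
}
```

With the reference cleared, code further down can test for null and retry the open instead of hitting an NPE inside seek.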
[jira] [Assigned] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader
[ https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans reassigned HBASE-7531: - Assignee: Jean-Daniel Cryans [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader --- Key: HBASE-7531 URL: https://issues.apache.org/jira/browse/HBASE-7531 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Attachments: HBASE-7531.patch Here's a NPE I get half the time I run TestReplication: {noformat} 2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332) {noformat} The issue happens after an IOE was caught while opening the reader, the issue is that it isn't set to null after that then the rest of the code assumes the reader is usable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3643) Close the filesystem handle when HRS is aborting
[ https://issues.apache.org/jira/browse/HBASE-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549917#comment-13549917 ] stack commented on HBASE-3643: -- We should at least try this for 0.96... Can punt if too much work. Close the filesystem handle when HRS is aborting Key: HBASE-3643 URL: https://issues.apache.org/jira/browse/HBASE-3643 Project: HBase Issue Type: Improvement Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.96.0 I thought of a way to fix HBASE-3515 that has a very broad impact, so I'm creating this jira to *raise awareness* and gather comments. Currently when we call HRS.abort, it's still possible to do HDFS operations like rolling logs and flushing files. It also has the impact that some threads cannot write to ZK (like the situation described in HBASE-3515) but then can still write to HDFS. Since that call is so central, I think we should {color:red} add fs.close() inside the abort method{color}. The impact of this is that everything else that happens after the close call, like closing files or appending, will fail in the most horrible ways. On the bright side, this means less disruptive changes on HDFS. Todd pointed at HBASE-2231 as related, but I think my solution is still too sloppy as we could still finish a compaction and immediately close the filesystem after that (damage's done). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7521) fix HBASE-6060 (regions stuck in opening state) in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549925#comment-13549925 ] Sergey Shelukhin commented on HBASE-7521: - Can you please elaborate on race conditions? Do you mean HBASE-5816? As far as I can see this patch preserves existing race conditions but doesn't add new ones :) Although, my experience with AM is limited, even more so in 94. We can try to rebase latest 094 patch from HBASE-6060 instead... fix HBASE-6060 (regions stuck in opening state) in 0.94 --- Key: HBASE-7521 URL: https://issues.apache.org/jira/browse/HBASE-7521 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7521-v0.patch, HBASE-7521-v1.patch Discussion in HBASE-6060 implies that the fix there does not work on 0.94. Still, we may want to fix the issue in 0.94 (via some different fix) because the regions stuck in opening for ridiculous amounts of time is not a good thing to have. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs
[ https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-7530: -- Attachment: HBASE-7530.patch The fix I proposed, I'm currently testing it in a loop. [replication] Work around HDFS-4380 else we get NPEs Key: HBASE-7530 URL: https://issues.apache.org/jira/browse/HBASE-7530 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7530.patch I've been spending a lot of time trying to figure the recent test failures related to replication. One I seem to be constantly getting is this NPE: {noformat} 2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1834) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470) at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312) {noformat} Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via HDFS-3222 but for Hadoop 1.0 he created HDFS-4380. This seems to happen while crossing block boundaries and TestReplication uses a 20KB block size for the HLog. The intent was just to get HLogs to roll more often, and this can also be achieved with *hbase.regionserver.logroll.multiplier* with a value of 0.0003f. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
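To see why a multiplier as small as 0.0003f is suggested, here is a small arithmetic sketch. It rests on two assumptions that the message does not state: that the log roll threshold is computed as block size times *hbase.regionserver.logroll.multiplier*, and that the HDFS block size is 64 MB. The class and method names are hypothetical.

```java
/** Hypothetical arithmetic sketch; not HBase code. */
final class LogRollSizeSketch {

    /** Assumed formula: roll threshold = block size * multiplier. */
    static long logRollSize(long blockSizeBytes, float multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }

    public static void main(String[] args) {
        long assumedBlockSize = 64L * 1024 * 1024; // assumption: 64 MB HDFS block
        float multiplier = 0.0003f;                 // value quoted in the message
        long threshold = logRollSize(assumedBlockSize, multiplier);
        System.out.println("roll threshold: " + threshold + " bytes");
    }
}
```

Under those assumptions the threshold comes out at roughly 20 KB, which is in line with the 20KB block size the test used to force frequent rolls.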
[jira] [Commented] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549932#comment-13549932 ] Sergey Shelukhin commented on HBASE-7528: - Do you mean only the precheck is in error, or the null being there as such? For now fixing the precheck. NPE in hbck -repair when adopting orphans if not tableinfo is found. Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7528: Attachment: HBASE-7528-v0.patch
[jira] [Updated] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7528: Assignee: Sergey Shelukhin Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7522) Tests should not be writing under /tmp/
[ https://issues.apache.org/jira/browse/HBASE-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549937#comment-13549937 ] Andrew Purtell commented on HBASE-7522: --- TestLocalHBaseCluster is certainly picking up files under /tmp/hbase-${user}. Tests should not be writing under /tmp/ --- Key: HBASE-7522 URL: https://issues.apache.org/jira/browse/HBASE-7522 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.94.5 Reporter: Enis Soztutar As per the discussion http://mail-archives.apache.org/mod_mbox/hbase-dev/201301.mbox/%3CCA%2BRK%3D_BmV%3Dvwws4VeDJVPt6hY7NKCDEafex3XTNam630pQRBbA%40mail.gmail.com%3E, tests should not be writing under /tmp/ directory. TestStoreFile is one of the offending ones. Some of them will be fixed at HBASE-6824. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader
[ https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549948#comment-13549948 ] Sergey Shelukhin commented on HBASE-7531: - +1. The cause is the dubious semantics of openReader imho (but I may just be unfamiliar with code); sleepMultiplier decision can be in the outside loop and openReader return value meaning can then be simpler.
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549954#comment-13549954 ]

Sergey Shelukhin commented on HBASE-6466:
-
I didn't see this on EC2 when I was doing perf testing, or just in exploratory test w/LTT.

Enable multi-thread for memstore flush
--
Key: HBASE-6466
URL: https://issues.apache.org/jira/browse/HBASE-6466
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
Fix For: 0.96.0
Attachments: HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch

If the KVs are large, or the HLog is closed under high-pressure putting, we found the memstore is often above the high water mark and blocks the puts. So should we enable multi-threading for memstore flush?

Some performance test data for reference:

1. Test environment:
random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 thread handlers per client for writing

2. Test results:
one cacheFlush handler: tps 7.8k/s per regionserver, flush 10.1MB/s per regionserver, with many aboveGlobalMemstoreLimit blockings
two cacheFlush handlers: tps 10.7k/s per regionserver, flush 12.46MB/s per regionserver
200 thread handlers per client, two cacheFlush handlers: tps 16.1k/s per regionserver, flush 18.6MB/s per regionserver

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549955#comment-13549955 ] Sergey Shelukhin commented on HBASE-5416: - Is this JIRA unresolved pending 0.94 commit? Just checking as it shows up in my filter :) Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: Filters, Performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: 5416-0.94-v1.txt, 5416-0.94-v2.txt, 5416-Filtered_scans_v6.patch, 5416-v13.patch, 5416-v14.patch, 5416-v15.patch, 5416-v16.patch, 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.1.patch, Filtered_scans_v5.patch, Filtered_scans_v7.patch, HBASE-5416-v10.patch, HBASE-5416-v11.patch, HBASE-5416-v12.patch, HBASE-5416-v12.patch, HBASE-5416-v7-rebased.patch, HBASE-5416-v8.patch, HBASE-5416-v9.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. 
But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that lets a filter specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter is applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
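The two-phase loading described above can be sketched in plain Java. This is a toy model under stated assumptions, not the patch: column families are modelled as simple maps from row key to value, and `TwoPhaseScanSketch`, `scan` and `joinedLoads` are hypothetical names chosen for illustration. The `flags` and `snap` names follow the example in the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical toy model of filter-first scanning; not HBase code. */
final class TwoPhaseScanSketch {

    /** How many rows were read from the large ("joined") family in the last scan. */
    int joinedLoads;

    /**
     * @param essential row key -> value of the small family the filter inspects
     * @param joined    row key -> value of the large family the filter never reads
     * @param filter    decides, from the essential value alone, whether to keep a row
     */
    List<String> scan(Map<String, String> essential,
                      Map<String, String> joined,
                      Predicate<String> filter) {
        joinedLoads = 0;
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : essential.entrySet()) {
            // Phase 1: load only what the filter needs and apply the filter.
            if (!filter.test(row.getValue())) {
                continue; // rejected: the large family is never touched
            }
            // Phase 2: the row was accepted, so load the remaining data.
            joinedLoads++;
            results.add(row.getKey() + "=" + joined.get(row.getKey()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        flags.put("row1", "x");
        flags.put("row2", "-");
        flags.put("row3", "x");
        Map<String, String> snap = new LinkedHashMap<>();
        snap.put("row1", "big1");
        snap.put("row2", "big2");
        snap.put("row3", "big3");

        TwoPhaseScanSketch scanner = new TwoPhaseScanSketch();
        System.out.println(scanner.scan(flags, snap, "x"::equals));
        System.out.println("reads from snap: " + scanner.joinedLoads);
    }
}
```

The saving comes from the rejected rows: the more selective the filter, the fewer reads hit the large family.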
[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-7268:
Attachment: HBASE-7268-v6.patch

feedback from /r/, removing lines longer than 100

correct local region location cache information can be overwritten w/stale information from an old server
-
Key: HBASE-7268
URL: https://issues.apache.org/jira/browse/HBASE-7268
Project: HBase
Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
Fix For: 0.96.0
Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch

Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though such a transition never happened (neither in nor before the sequence described below).
Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread...
Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding).
I have a patch but not sure if it works, test still fails locally for a yet unknown reason.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
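One possible way to guard a client-side cache against the kind of stale update described above is to accept a "region moved" redirect only when it comes from the server the cache currently points at. The sketch below illustrates that idea only; it is an assumption for illustration and not necessarily what the attached patches do. `LocationCacheSketch` and `updateOnMoved` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a guarded location cache; not the HBase client code. */
final class LocationCacheSketch {

    private final Map<String, String> regionToServer = new HashMap<>();

    void put(String region, String server) {
        regionToServer.put(region, server);
    }

    String get(String region) {
        return regionToServer.get(region);
    }

    /**
     * Applies a "region moved" redirect only if it was reported by the server
     * the cache currently points at; otherwise the redirect is treated as stale.
     *
     * @return true if the cache was updated
     */
    boolean updateOnMoved(String region, String reportingServer, String newServer) {
        String cached = regionToServer.get(region);
        if (cached != null && !cached.equals(reportingServer)) {
            return false; // report comes from an old server; keep the newer entry
        }
        regionToServer.put(region, newServer);
        return true;
    }

    public static void main(String[] args) {
        LocationCacheSketch cache = new LocationCacheSketch();
        cache.put("R", "C"); // fresh information, e.g. learned from meta
        // A late redirect from old server A claims that R now lives on B.
        System.out.println("stale redirect applied: " + cache.updateOnMoved("R", "A", "B"));
        System.out.println("cached location of R:   " + cache.get("R"));
    }
}
```

A check on the reporting server is the simplest guard; ordering updates by a timestamp or sequence number would be an alternative that also covers redirects arriving out of order.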
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549973#comment-13549973 ] Sergey Shelukhin commented on HBASE-7268: - The test repeatedly passes locally...
[jira] [Commented] (HBASE-7384) Introducing waitForCondition function into test cases
[ https://issues.apache.org/jira/browse/HBASE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549980#comment-13549980 ] Hadoop QA commented on HBASE-7384: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564216/hbase-7384_2.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestMultiParallel org.apache.hadoop.hbase.replication.TestReplicationWithCompression org.apache.hadoop.hbase.TestLocalHBaseCluster {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3962//console This message is automatically generated. Introducing waitForCondition function into test cases - Key: HBASE-7384 URL: https://issues.apache.org/jira/browse/HBASE-7384 Project: HBase Issue Type: Test Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Labels: test Fix For: 0.96.0 Attachments: hbase-7384_1.0.patch, hbase-7384_2.4.patch, hbase-7384.patch, Waiter.java Recently I'm working on flaky test cases and found we have many places using while loop and sleep to wait for a condition to be true. 
There are several issues with the existing approach:
1) A lot of similar code does the same thing
2) When a timeout happens, different errors are reported without explicitly indicating a timeout
3) When we want to increase the max timeout value to verify whether a test case fails because the timeout is too short, we have to recompile and redeploy the code

I propose to create a waitForCondition function as a test utility function like the following:
{code}
public interface WaitCheck {
  public boolean Check();
}

public boolean waitForCondition(int timeOutInMilliSeconds, int checkIntervalInMilliSeconds, WaitCheck s)
    throws InterruptedException {
  // Optional system property that scales every timeout without recompiling.
  int multiplier = 1;
  String multiplierProp = System.getProperty("extremeWaitMultiplier");
  if (multiplierProp != null) {
    multiplier = Integer.parseInt(multiplierProp);
    if (multiplier < 1) {
      LOG.warn(String.format("Invalid extremeWaitMultiplier property value:%s. is ignored.", multiplierProp));
      multiplier = 1;
    }
  }

  int timeElapsed = 0;
  while (timeElapsed < timeOutInMilliSeconds * multiplier) {
    if (s.Check()) {
      return true;
    }
[jira] [Updated] (HBASE-7213) Have HLog files for .META. edits only
[ https://issues.apache.org/jira/browse/HBASE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-7213: --- Attachment: 7213-2.10.patch Rebased (again). Also I fixed a bug in HMaster.java. Some of the unit test failures were legit and were caused by the bug. Have HLog files for .META. edits only - Key: HBASE-7213 URL: https://issues.apache.org/jira/browse/HBASE-7213 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.96.0 Attachments: 7213-2.10.patch, 7213-2.4.patch, 7213-2.6.patch, 7213-2.8.patch, 7213-2.9.patch, 7213-in-progress.2.2.patch, 7213-in-progress.2.patch, 7213-in-progress.patch Over on HBASE-6774, there is a discussion on separating out the edits for .META. regions from the other regions' edits w.r.t where the edits are written. This jira is to track an implementation of that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)
[ https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550350#comment-13550350 ]

Sergey Shelukhin commented on HBASE-7383:
-
bq. generateColumnsForCf() returns byte[][], but Set<byte[]> is passed as param above. Can you explain why the difference ?
Convenience of existing users/implementation. I know, not a very good reason... Do you want me to change it? Should be easy to change to either if needed.

bq. Please use SecureRandom instead.
Why?

create integration test for HBASE-5416 (improving scan performance for certain filters)
---
Key: HBASE-7383
URL: https://issues.apache.org/jira/browse/HBASE-7383
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, HBASE-7383-v1.patch

HBASE-5416 is risky and needs an integration test.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550351#comment-13550351 ] Hadoop QA commented on HBASE-7528: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564228/HBASE-7528-v0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3963//console This message is automatically generated. NPE in hbck -repair when adopting orphans if not tableinfo is found. 
Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)
[ https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7383: Attachment: HBASE-7383-v2.patch create integration test for HBASE-5416 (improving scan performance for certain filters) --- Key: HBASE-7383 URL: https://issues.apache.org/jira/browse/HBASE-7383 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, HBASE-7383-v1.patch, HBASE-7383-v2.patch HBASE-5416 is risky and needs an integration test.
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550362#comment-13550362 ] Elliott Clark commented on HBASE-6466: -- I'll circle back around and give this patch another run on a cluster next week. I'll try and get more details for you. Enable multi-thread for memstore flush -- Key: HBASE-6466 URL: https://issues.apache.org/jira/browse/HBASE-6466 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch If the KV is large or Hlog is closed with high-pressure putting, we found the memstore is often above the high water mark and blocks the putting. So should we enable multi-thread for Memstore Flush? Some performance test data for reference: 1. Test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 thread handlers per client for writing. 2. Test results: one cacheFlush handler: tps 7.8k/s per regionserver, flush 10.1MB/s per regionserver, with many aboveGlobalMemstoreLimit blocking events; two cacheFlush handlers: tps 10.7k/s per regionserver, flush 12.46MB/s per regionserver; 200 thread handlers per client and two cacheFlush handlers: tps 16.1k/s per regionserver, flush 18.6MB/s per regionserver.
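The idea discussed in HBASE-6466, several flush handlers draining one shared queue of flush requests, can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not the code from the attached patches: the class MultiFlushSketch, its methods, and the use of region names as queue entries are all hypothetical, and the "flush" here only increments a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: several flush handler threads draining one shared request queue. */
public class MultiFlushSketch {
    // Unique sentinel object telling a handler thread to exit (compared by identity).
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
    private final List<Thread> handlers = new ArrayList<>();
    private final AtomicInteger flushed = new AtomicInteger();

    public MultiFlushSketch(int handlerCount) {
        for (int i = 0; i < handlerCount; i++) {
            Thread t = new Thread(this::drain, "flush-handler-" + i);
            handlers.add(t);
            t.start();
        }
    }

    /** Queues a flush request for the given (hypothetical) region name. */
    public void requestFlush(String regionName) {
        flushQueue.add(regionName);
    }

    // Each handler blocks on the queue, so requests are spread over all handlers.
    private void drain() {
        try {
            while (true) {
                String region = flushQueue.take();
                if (region == STOP) {
                    return;
                }
                flushRegion(region);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for the real flush work; here it only counts the request.
    private void flushRegion(String region) {
        flushed.incrementAndGet();
    }

    /** Sends one stop marker per handler, waits for them, and returns the number of flushes done. */
    public int shutdownAndCount() {
        for (int i = 0; i < handlers.size(); i++) {
            flushQueue.add(STOP);
        }
        for (Thread t : handlers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return flushed.get();
    }
}
```

With a single handler the queue is drained serially, which corresponds to the "one cacheFlush handler" case in the numbers above; raising handlerCount lets flushes proceed in parallel, at the cost of more concurrent I/O.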
[jira] [Updated] (HBASE-7424) Enable the DeltaEncoding for the HFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju updated HBASE-7424: -- Description: HFileOutputFormat has a writer embedded but it is not configured to use the DeltaEncoding. This revision is to add that support to the HFileOutputFormat when it is used as the OutputFormat of either the Mapper or the Reducer in a MapReduce task. (was: HFileOutputFormat has a writer embedded but it is not configured to use the DeltaEncoding and FavoredNodes. This revision is to add that support to the HFileOutputFormat when it is used as the OutputFormat of either the Mapper or the Reducer in a MapReduce task.) Summary: Enable the DeltaEncoding for the HFileOutputFormat (was: Enable the DeltaEncoding and FavoredNodes for the HFileOutputFormat) Enable the DeltaEncoding for the HFileOutputFormat -- Key: HBASE-7424 URL: https://issues.apache.org/jira/browse/HBASE-7424 Project: HBase Issue Type: New Feature Reporter: Manukranth Kolloju Priority: Minor Labels: HFileOutputFormat HFileOutputFormat has a writer embedded but it is not configured to use the DeltaEncoding. This revision is to add that support to the HFileOutputFormat when it is used as the OutputFormat of either the Mapper or the Reducer in a MapReduce task.
[jira] [Created] (HBASE-7532) Enable the FavoredNodes for the HFileOutputFormat
Manukranth Kolloju created HBASE-7532: - Summary: Enable the FavoredNodes for the HFileOutputFormat Key: HBASE-7532 URL: https://issues.apache.org/jira/browse/HBASE-7532 Project: HBase Issue Type: New Feature Reporter: Manukranth Kolloju Priority: Minor
[jira] [Created] (HBASE-7533) Write an RPC Specification for 0.96
stack created HBASE-7533: Summary: Write an RPC Specification for 0.96 Key: HBASE-7533 URL: https://issues.apache.org/jira/browse/HBASE-7533 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.96.0 RPC format is changing for 0.96 to accommodate our protobufing all around. Here is a first cut. Please shred: https://docs.google.com/document/d/1-1RJMLXzYldmHgKP7M7ynK6euRpucD03fZ603DlZfGI/edit
[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)
[ https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550390#comment-13550390 ] Ted Yu commented on HBASE-7383: --- w.r.t. SecureRandom, take a look at : http://www.coderanch.com/t/410832/java/java/Java-Random-SecureRandom create integration test for HBASE-5416 (improving scan performance for certain filters) --- Key: HBASE-7383 URL: https://issues.apache.org/jira/browse/HBASE-7383 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, HBASE-7383-v1.patch, HBASE-7383-v2.patch HBASE-5416 is risky and needs an integration test.
[jira] [Commented] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550402#comment-13550402 ] Jonathan Hsieh commented on HBASE-7528: --- Thanks sergey. I'll commit this with one minor fix (there is a missing ' char in my description and in the patch). It still doesn't fix the problem but it does make the error message much better. NPE in hbck -repair when adopting orphans if not tableinfo is found. Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code}
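The kind of change described in this comment, replacing a bare NullPointerException with an error that names what is missing, might look roughly like the sketch below. It is not the committed patch: OrphanAdoptionSketch, its TableInfo stand-in, and the message wording are hypothetical and only illustrate checking for absent table info before dereferencing it.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: fail with a descriptive error instead of an NPE when table info is absent. */
public class OrphanAdoptionSketch {
    /** Minimal stand-in for per-table metadata; not the real HBaseFsck.TableInfo. */
    static final class TableInfo {
        final String name;

        TableInfo(String name) {
            this.name = name;
        }
    }

    private final Map<String, TableInfo> tablesInfo = new HashMap<>();

    /** Registers table info so that orphans of this table can be adopted. */
    void register(String tableName) {
        tablesInfo.put(tableName, new TableInfo(tableName));
    }

    /** Adopts an orphan directory, or throws an IOException naming the table without info. */
    String adoptOrphan(String tableName, String orphanDir) throws IOException {
        TableInfo info = tablesInfo.get(tableName);
        if (info == null) {
            // Explicit check: the caller learns which table and directory were involved.
            throw new IOException("Unable to adopt orphan " + orphanDir
                + ": no table info found for table '" + tableName + "'");
        }
        return "adopted " + orphanDir + " into " + info.name;
    }
}
```

As the comment notes, a clearer message does not repair the missing table info itself; it only tells the operator which table and directory the failure concerns.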
[jira] [Comment Edited] (HBASE-7528) NPE in hbck -repair when adopting orphans if not tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550402#comment-13550402 ] Jonathan Hsieh edited comment on HBASE-7528 at 1/10/13 8:56 PM: Thanks sergey. I'll commit this with one minor fix (there is a missing ' char in my description and in the patch). There still is a problem here but it does make the error message much better. was (Author: jmhsieh): Thanks sergey. I'll commit this with one minor fix (there is a missing ' char in my description and in the patch). It still dones' tfix the problem but it does make the error message much better. NPE in hbck -repair when adopting orphans if not tableinfo is found. Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This 
message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)
[ https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550405#comment-13550405 ] Sergey Shelukhin commented on HBASE-7383: - Well, it says "For general statistics, Random is fine. Its a typical modulo congruent function. SecureRandom is more random. Specifically, it aims to make it impossible to predict the next random number from a sequence, which is trivial to do with most modulo congruent algorithms.", so for test data generation Random would seemingly be the right choice. create integration test for HBASE-5416 (improving scan performance for certain filters) --- Key: HBASE-7383 URL: https://issues.apache.org/jira/browse/HBASE-7383 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, HBASE-7383-v1.patch, HBASE-7383-v2.patch HBASE-5416 is risky and needs an integration test.
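To illustrate the point about test data generation: java.util.Random produces the same sequence for the same seed, so a failing run can be replayed with identical data. The sketch below assumes a hypothetical helper generateColumns (it is not the generateColumnsForCf() method under review) and a hypothetical class name TestDataSketch.

```java
import java.util.Random;

/** Hypothetical sketch: reproducible test data from a seeded java.util.Random. */
public class TestDataSketch {
    /** Generates `count` pseudo-random column qualifiers of `length` bytes each for the given seed. */
    static byte[][] generateColumns(long seed, int count, int length) {
        // Same seed, same sequence: a documented property of java.util.Random.
        Random rng = new Random(seed);
        byte[][] columns = new byte[count][];
        for (int i = 0; i < count; i++) {
            columns[i] = new byte[length];
            rng.nextBytes(columns[i]);
        }
        return columns;
    }
}
```

Logging the seed at the start of a test run is usually enough to regenerate the exact data set later; unpredictability of the sequence, which SecureRandom aims for, is not something test data needs.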
[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-7528: -- Summary: Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found. (was: NPE in hbck -repair when adopting orphans if not tableinfo is found.) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found. - Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-7528: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found. - Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-7528: -- Component/s: hbck Affects Version/s: 0.96.0 0.90.6 0.92.2 0.94.3 Fix Version/s: 0.96.0 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found. - Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.90.6, 0.92.2, 0.94.3, 0.96.0 Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.96.0 Attachments: HBASE-7528-v0.patch {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550410#comment-13550410 ] Hadoop QA commented on HBASE-7268: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564234/HBASE-7268-v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide org.apache.hadoop.hbase.TestLocalHBaseCluster org.apache.hadoop.hbase.client.TestMultiParallel {color:red}-1 core zombie tests{color}. 
There are 9 zombie test(s): at org.apache.hadoop.hbase.catalog.TestCatalogTracker.testServerNotRunningIOException(TestCatalogTracker.java:250) at org.apache.hadoop.hbase.master.TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS(TestMasterFailover.java:833) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3964//console This message is automatically generated. 
correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report from RegionMoved exception processing logic "R moved from C to B", even though such transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but not sure if it works, test still fails locally for yet unknown reason.
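One general way to keep a late "region moved" report from overwriting newer cache contents is to attach a monotonically increasing number to every location and accept an update only if its number is higher than the cached one. The sketch below illustrates that idea under assumptions; it is not taken from the HBASE-7268 patches, and LocationCacheSketch, its methods, and the seqNum field are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: a location cache that ignores updates older than what it already holds. */
public class LocationCacheSketch {
    /** Cached location plus the (assumed) monotonically increasing number it was reported with. */
    static final class Location {
        final String server;
        final long seqNum;

        Location(String server, long seqNum) {
            this.server = server;
            this.seqNum = seqNum;
        }
    }

    private final ConcurrentMap<String, Location> cache = new ConcurrentHashMap<>();

    /** Returns true if the update was applied, false if it was stale and therefore ignored. */
    boolean update(String region, String server, long seqNum) {
        Location candidate = new Location(server, seqNum);
        // merge() runs atomically per key: keep whichever entry carries the higher number.
        Location kept = cache.merge(region, candidate,
            (current, incoming) -> incoming.seqNum > current.seqNum ? incoming : current);
        return kept == candidate;
    }

    /** Returns the cached server for the region, or null if nothing is cached. */
    String lookup(String region) {
        Location location = cache.get(region);
        return location == null ? null : location.server;
    }
}
```

In the scenario from the description, the moves A to B to C would carry increasing numbers, so a delayed report pointing back at B would be rejected instead of replacing the correct entry for C.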
[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)
[ https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550413#comment-13550413 ] Hadoop QA commented on HBASE-7383: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564243/HBASE-7383-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 31 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3966//console This message is automatically generated. create integration test for HBASE-5416 (improving scan performance for certain filters) --- Key: HBASE-7383 URL: https://issues.apache.org/jira/browse/HBASE-7383 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, HBASE-7383-v1.patch, HBASE-7383-v2.patch HBASE-5416 is risky and needs an integration test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous
Jean-Daniel Cryans created HBASE-7534: - Summary: [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous Key: HBASE-7534 URL: https://issues.apache.org/jira/browse/HBASE-7534 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 {{HBaseTestingUtility.createMultiRegions}} is an abomination, it uses an already existing table and hot replaces the regions in it. I've seen TestReplication failing a few times because the old first region is still assigned and tried to flush but crashed due to the fact that the region's folder is missing in HDFS: {noformat} 2013-01-04 10:04:45,500 DEBUG [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(844): Renaming flushed file at hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,500 WARN [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d because destination's parent does not exist 2013-01-04 10:04:45,503 WARN [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(847): Unable to rename hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,504 WARN [DataStreamer for file /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769] 
hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769 File does not exist. [Lease. Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1] {noformat} Eventually the test times out because both region servers on the master cluster are dead. It can be easily fixed by pre-creating the table with enough regions. FWIW a bunch of other tests are using this facility, my IDE tells me that the 3 methods are called 25 times outside of {{HBaseTestingUtility}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
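The suggested fix, creating the table with enough regions up front rather than hot-replacing regions in an existing table, needs a set of split keys. A minimal sketch of computing evenly spaced single-byte split keys follows; SplitKeysSketch and evenSplitKeys are hypothetical names, and the test's actual key layout may call for different boundaries.

```java
/** Hypothetical sketch: evenly spaced single-byte split keys for pre-creating a table. */
public class SplitKeysSketch {
    /** Returns regionCount - 1 split keys spread evenly over the byte values 0x00..0xFF. */
    static byte[][] evenSplitKeys(int regionCount) {
        if (regionCount < 2 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be between 2 and 256");
        }
        byte[][] keys = new byte[regionCount - 1][];
        for (int i = 1; i < regionCount; i++) {
            // Boundary i of regionCount, scaled to one unsigned byte.
            keys[i - 1] = new byte[] { (byte) (i * 256 / regionCount) };
        }
        return keys;
    }
}
```

The resulting array would then be handed to a table-creation call that accepts split keys (for example the createTable overload taking a byte[][] of split keys on the admin client), so that every region exists before the test writes any data.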
[jira] [Updated] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-7528: -- Attachment: hbase-7528.v1 v1 is what I committed. Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found. - Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.90.6, 0.92.2, 0.94.3, 0.96.0 Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.96.0 Attachments: HBASE-7528-v0.patch, hbase-7528.v1 {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7213) Have HLog files for .META. edits only
[ https://issues.apache.org/jira/browse/HBASE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550423#comment-13550423 ] Hadoop QA commented on HBASE-7213: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564237/7213-2.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestScannerTimeout org.apache.hadoop.hbase.client.TestMultiParallel org.apache.hadoop.hbase.TestLocalHBaseCluster {color:red}-1 core zombie tests{color}. 
There are 3 zombie test(s): at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitBeforeSettingSplittingInZKInternals(TestSplitTransactionOnCluster.java:738) at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitBeforeSettingSplittingInZK(TestSplitTransactionOnCluster.java:541) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:220) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3965//console This message is automatically generated. Have HLog files for .META. 
edits only - Key: HBASE-7213 URL: https://issues.apache.org/jira/browse/HBASE-7213 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.96.0 Attachments: 7213-2.10.patch, 7213-2.4.patch, 7213-2.6.patch, 7213-2.8.patch, 7213-2.9.patch, 7213-in-progress.2.2.patch, 7213-in-progress.2.patch, 7213-in-progress.patch Over on HBASE-6774, there is a discussion on separating out the edits for .META. regions from the other regions' edits w.r.t where the edits are written. This jira is to track an implementation of that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550431#comment-13550431 ] Ted Yu commented on HBASE-5416: --- For 0.94 patch, I saw the following on my Mac: {code} testScanner_JoinedScannersWithLimits(org.apache.hadoop.hbase.regionserver.TestHRegion) Time elapsed: 0.001 sec FAILURE! junit.framework.AssertionFailedError: expected:3 but was:1 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.hbase.regionserver.TestHRegion.testScanner_JoinedScannersWithLimits(TestHRegion.java:2976) {code} Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: Filters, Performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: 5416-0.94-v1.txt, 5416-0.94-v2.txt, 5416-Filtered_scans_v6.patch, 5416-v13.patch, 5416-v14.patch, 5416-v15.patch, 5416-v16.patch, 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.1.patch, Filtered_scans_v5.patch, Filtered_scans_v7.patch, HBASE-5416-v10.patch, HBASE-5416-v11.patch, HBASE-5416-v12.patch, HBASE-5416-v12.patch, HBASE-5416-v7-rebased.patch, HBASE-5416-v8.patch, HBASE-5416-v9.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. 
And in such case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and is quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface to allow a filter to specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded, the filter is applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
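The two-phase scan described in HBASE-5416 can be illustrated with a minimal, self-contained sketch. This is not the patch itself: the class `JoinedScanSketch`, its in-memory maps, and the `reads` counter are hypothetical stand-ins for HRegion scanners and IO cost, used only to show that the large family is read solely for rows the filter accepts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a region: family name -> (row key -> value).
class JoinedScanSketch {
    private final Map<String, Map<String, String>> families = new HashMap<>();

    // Number of per-family reads performed; a rough proxy for IO cost.
    int reads = 0;

    void put(String family, String row, String value) {
        families.computeIfAbsent(family, f -> new HashMap<>()).put(row, value);
    }

    private String read(String family, String row) {
        reads++;
        return families.getOrDefault(family, Collections.<String, String>emptyMap()).get(row);
    }

    // Phase 1: read only the family the filter needs and apply the filter.
    // Phase 2: read the remaining ("joined") family only for accepted rows.
    List<String> scan(List<String> rows, String filterFamily, String joinedFamily,
                      Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String flag = read(filterFamily, row);
            if (flag == null || !filter.test(flag)) {
                continue; // rejected: the large family is never touched for this row
            }
            out.add(row + "=" + read(joinedFamily, row));
        }
        return out;
    }
}
```

With three rows of which two pass the filter, the sketch performs three reads of the small family and two of the large one, instead of six reads in total.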
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550447#comment-13550447 ] Nick Dimiduk commented on HBASE-6201: - FYI, the guys at Wibidata have provided a [maven plugin|https://github.com/kijiproject/hbase-maven-plugin] that looks potentially interesting for the purpose of running these integration tests locally. It may need to be jury-rigged to launch a cluster out of the local sandbox rather than one provided by an external release... HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind: 1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code} 2. Add the ability to run all integration/system tests on a given cluster. Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or an improved HBaseTestingUtility, so that tests can interface with the mini cluster or the actual cluster uniformly. 3. Port candidate unit tests to the integration tests module. 
Some of the candidates are: - TestAcidGuarantees / TestAtomicOperation - TestRegionBalancing (HBASE-6053) - TestFullLogReconstruction - TestMasterFailover - TestImportExport - TestMultiVersions / TestKeepDeletes - TestFromClientSide - TestShell and src/test/ruby - TestRollingRestart - Test**OnCluster - Balancer tests These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that, instead of spinning a mini cluster. 4. Add more tests, especially, long running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool), and chaos monkey style fault tests. All suggestions welcome. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7535) Fix restore reference files
Matteo Bertozzi created HBASE-7535: -- Summary: Fix restore reference files Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch After HBASE-7419 the HFileLink regex became stricter, so that isHFileLink() performs a proper check. However, HFileLink should be able to open both reference files and hfiles, since the main idea behind it is to open anything under /table/region/family/XYZ. This patch fixes the restore problem for reference (split) files and relaxes the regex used by HFileLink(/table/region/family/xyz).open()
[jira] [Updated] (HBASE-7535) Fix restore reference files
[ https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7535: --- Attachment: HBASE-7535-v0.patch Fix restore reference files --- Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch After HBASE-7419 the HFileLink regex became stricter, so that isHFileLink() performs a proper check. However, HFileLink should be able to open both reference files and hfiles, since the main idea behind it is to open anything under /table/region/family/XYZ. This patch fixes the restore problem for reference (split) files and relaxes the regex used by HFileLink(/table/region/family/xyz).open()
[jira] [Updated] (HBASE-7535) Fix restore reference files
[ https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7535: --- Status: Patch Available (was: Open) Fix restore reference files --- Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch After HBASE-7419 the HFileLink regex became stricter, so that isHFileLink() performs a proper check. However, HFileLink should be able to open both reference files and hfiles, since the main idea behind it is to open anything under /table/region/family/XYZ. This patch fixes the restore problem for reference (split) files and relaxes the regex used by HFileLink(/table/region/family/xyz).open()
[jira] [Commented] (HBASE-7383) create integration test for HBASE-5416 (improving scan performance for certain filters)
[ https://issues.apache.org/jira/browse/HBASE-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550467#comment-13550467 ] Ted Yu commented on HBASE-7383: --- My interpretation of the article about SecureRandom is that it gives us better randomness. BTW there is a javadoc warning: [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestKVGenerator.java:83: warning - Tag @link: can't find verify(byte[], byte[]...) in org.apache.hadoop.hbase.util.test.LoadTestKVGenerator [INFO] create integration test for HBASE-5416 (improving scan performance for certain filters) --- Key: HBASE-7383 URL: https://issues.apache.org/jira/browse/HBASE-7383 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7383-v0.patch, HBASE-7383-v1.patch, HBASE-7383-v1.patch, HBASE-7383-v2.patch HBASE-5416 is risky and needs an integration test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7535) Fix restore reference files
[ https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550471#comment-13550471 ] Hadoop QA commented on HBASE-7535: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564271/HBASE-7535-v0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3967//console This message is automatically generated. Fix restore reference files --- Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch After HBASE-7419 the HFileLink regex became stricter, so that isHFileLink() performs a proper check. However, HFileLink should be able to open both reference files and hfiles, since the main idea behind it is to open anything under /table/region/family/XYZ. This patch fixes the restore problem for reference (split) files and relaxes the regex used by HFileLink(/table/region/family/xyz).open()
[jira] [Updated] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous
[ https://issues.apache.org/jira/browse/HBASE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-7534: -- Attachment: HBASE-7534.patch This patch adds a new set of keys (almost the same but the semantic is different, and I also didn't want to mess with Arrays) which is now used when creating the table. [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous Key: HBASE-7534 URL: https://issues.apache.org/jira/browse/HBASE-7534 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7534.patch {{HBaseTestingUtility.createMultiRegions}} is an abomination, it uses an already existing table and hot replaces the regions in it. I've seen TestReplication failing a few times because the old first region is still assigned and tried to flush but crashed due to the fact that the region's folder is missing in HDFS: {noformat} 2013-01-04 10:04:45,500 DEBUG [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(844): Renaming flushed file at hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,500 WARN [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d because destination's parent does not exist 2013-01-04 10:04:45,503 WARN [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(847): Unable to rename 
hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,504 WARN [DataStreamer for file /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769] hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769 File does not exist. [Lease. Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1] {noformat} Eventually the test times out because both region servers on the master cluster are dead. It can be easily fixed by pre-creating the table with enough regions. FWIW a bunch of other tests are using this facility, my IDE tells me that the 3 methods are called 25 times outside of {{HBaseTestingUtility}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
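The suggested fix, pre-creating the table with enough regions, needs a set of split keys at table creation time. The following is a minimal sketch of generating such keys, not the contents of HBASE-7534.patch: the class `SplitKeysSketch` and the choice of three-letter boundaries ("bbb" through "zzz") are assumptions made for illustration. In HBase these keys could then be passed to a create-table call that accepts split keys, such as `HBaseAdmin.createTable(HTableDescriptor, byte[][])`, rather than replacing regions in an existing table.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds split points for a table that is created
// already split, instead of hot-replacing regions afterwards.
class SplitKeysSketch {

    // Returns the boundaries "bbb", "ccc", ..., "zzz".
    // 25 boundaries yield 26 regions; the first region has an empty start key.
    static byte[][] threeLetterSplitKeys() {
        byte[][] keys = new byte['z' - 'b' + 1][];
        int i = 0;
        for (char c = 'b'; c <= 'z'; c++) {
            keys[i++] = new String(new char[] {c, c, c}).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }
}
```

Because the regions exist from the moment the table is created, no old first region is left assigned with a missing folder in HDFS.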
[jira] [Updated] (HBASE-7458) TestReplicationWithCompression fails intermittently in both PreCommit and trunk builds
[ https://issues.apache.org/jira/browse/HBASE-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-7458: Labels: tes (was: ) TestReplicationWithCompression fails intermittently in both PreCommit and trunk builds -- Key: HBASE-7458 URL: https://issues.apache.org/jira/browse/HBASE-7458 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Critical Labels: tes Fix For: 0.96.0 TestReplicationWithCompression has been failing often. Here are few examples: https://builds.apache.org/job/PreCommit-HBASE-Build/3755/testReport/ https://builds.apache.org/job/HBase-TRUNK/3672/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testDeleteTypes/ https://builds.apache.org/job/HBase-0.94/677/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/queueFailover/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous
[ https://issues.apache.org/jira/browse/HBASE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550500#comment-13550500 ] Jean-Daniel Cryans commented on HBASE-7534: --- bq. Go commit. See if it fixes the fails. FWIW the current failures are unrelated to this, from what I can tell the machine where Jenkins runs is either slow or something is slowing us down. For example in build 717's log for TestReplication.queueFailover: {noformat} 2013-01-09 06:14:47,771 DEBUG [RegionServer:1;vesta.apache.org,41495,1357711464011-EventThread.replicationSource,2] regionserver.ReplicationSource(638): Replicating 3 2013-01-09 06:14:49,730 INFO [Thread-1887] replication.TestReplication(779): Only got 9720 rows instead of 17576 current i=-16 2013-01-09 06:14:55,176 INFO [Thread-1887] replication.TestReplication(779): Only got 9720 rows instead of 17576 current i=-15 2013-01-09 06:14:56,789 DEBUG [RegionServer:1;vesta.apache.org,41495,1357711464011-EventThread.replicationSource,2] regionserver.ReplicationSource(651): Replicated in total: 1837 {noformat} You can see that it took 9 seconds to replicate a bunch of rows and no progress is made. It runs way faster than that on my machine. [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous Key: HBASE-7534 URL: https://issues.apache.org/jira/browse/HBASE-7534 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7534.patch {{HBaseTestingUtility.createMultiRegions}} is an abomination, it uses an already existing table and hot replaces the regions in it. 
I've seen TestReplication failing a few times because the old first region is still assigned and tried to flush but crashed due to the fact that the region's folder is missing in HDFS: {noformat} 2013-01-04 10:04:45,500 DEBUG [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(844): Renaming flushed file at hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,500 WARN [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d because destination's parent does not exist 2013-01-04 10:04:45,503 WARN [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(847): Unable to rename hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,504 WARN [DataStreamer for file /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769] hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769 File does not exist. [Lease. Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1] {noformat} Eventually the test times out because both region servers on the master cluster are dead. 
It can be easily fixed by pre-creating the table with enough regions. FWIW a bunch of other tests are using this facility, my IDE tells me that the 3 methods are called 25 times outside of {{HBaseTestingUtility}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7399) Health check chore for HMaster
[ https://issues.apache.org/jira/browse/HBASE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550505#comment-13550505 ] Nick Dimiduk commented on HBASE-7399: - {code} + private boolean isHealthCheckerConfigured() { +String healthScriptLocation = this.conf.get(HConstants.HEALTH_SCRIPT_LOC); +return org.apache.commons.lang.StringUtils.isNotBlank(healthScriptLocation); + } {code} Nit: {{isNotBlank}} could/should be a static import. Health check chore for HMaster -- Key: HBASE-7399 URL: https://issues.apache.org/jira/browse/HBASE-7399 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 0.96.0, 0.94.4 Attachments: HBASE-7399-0.94.patch, HBASE-7399-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7534) [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous
[ https://issues.apache.org/jira/browse/HBASE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550512#comment-13550512 ] Jean-Daniel Cryans commented on HBASE-7534: --- Actually I was able to find one case where the test timed out on Jenkins: https://builds.apache.org/job/HBase-0.94/649/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/queueFailover/ Look for failed to rename. [replication] TestReplication.queueFailover can fail because HBaseTestingUtility.createMultiRegions is dangerous Key: HBASE-7534 URL: https://issues.apache.org/jira/browse/HBASE-7534 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7534.patch {{HBaseTestingUtility.createMultiRegions}} is an abomination, it uses an already existing table and hot replaces the regions in it. I've seen TestReplication failing a few times because the old first region is still assigned and tried to flush but crashed due to the fact that the region's folder is missing in HDFS: {noformat} 2013-01-04 10:04:45,500 DEBUG [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] regionserver.Store(844): Renaming flushed file at hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,500 WARN [IPC Server handler 8 on 57099] namenode.FSDirectory(422): DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to /user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d because destination's parent does not exist 2013-01-04 10:04:45,503 WARN [RegionServer:1;172.21.3.117,57114,1357322589018.cacheFlusher] 
regionserver.Store(847): Unable to rename hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/.tmp/b938b33268064312abfc250d2eeca61d to hdfs://localhost:57099/user/jdcryans/hbase/test/62c85f8a6e3d0e32b2fb21326537f5a6/f/b938b33268064312abfc250d2eeca61d 2013-01-04 10:04:45,504 WARN [DataStreamer for file /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769] hdfs.DFSClient$DFSOutputStream$DataStreamer(2873): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/jdcryans/hbase/.logs/172.21.3.117,57113,1357322588994/172.21.3.117%2C57113%2C1357322588994.1357322683769 File does not exist. [Lease. Holder: DFSClient_hb_rs_172.21.3.117,57113,1357322588994, pendingcreates: 1] {noformat} Eventually the test times out because both region servers on the master cluster are dead. It can be easily fixed by pre-creating the table with enough regions. FWIW a bunch of other tests are using this facility, my IDE tells me that the 3 methods are called 25 times outside of {{HBaseTestingUtility}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader
[ https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550515#comment-13550515 ] Jean-Daniel Cryans commented on HBASE-7531: --- I was able to find one test failure caused by this: https://builds.apache.org/job/HBase-0.94/656/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testVerifyRepJob/ The replication thread dies so truncating can't complete. bq. The cause is the dubious semantics of openReader imho Yeah I should probably fold in that reader somehow into ReplicationHLogReaderManager. [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader --- Key: HBASE-7531 URL: https://issues.apache.org/jira/browse/HBASE-7531 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Attachments: HBASE-7531.patch Here's an NPE I get about half the time I run TestReplication: {noformat} 2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332) {noformat} The issue happens after an IOE is caught while opening the reader: the reader isn't set to null afterwards, so the rest of the code assumes it is usable.
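The failure mode in HBASE-7531 (a reader reference that survives a failed open and is later used as if it were valid) can be shown with a small sketch. This is not the code from HBASE-7531.patch: `ReaderHolderSketch`, `Reader`, and `Opener` are hypothetical types standing in for ReplicationSource and its log reader, and the sketch only illustrates the general pattern of clearing the reference on failure and checking it before use.

```java
import java.io.IOException;

// Hypothetical holder showing why a failed open must clear the reader reference.
class ReaderHolderSketch {
    interface Reader { void seek(long pos); }
    interface Opener { Reader open() throws IOException; }

    private Reader reader;

    // Returns true if a usable reader is available after the call.
    boolean openReader(Opener opener) {
        try {
            reader = opener.open();
        } catch (IOException e) {
            // Without this line, a reader left over from an earlier open would
            // stay in place and later calls would assume it is usable.
            reader = null;
        }
        return reader != null;
    }

    // Seeks only when a reader is present; returns false instead of failing.
    boolean seek(long pos) {
        if (reader == null) {
            return false;
        }
        reader.seek(pos);
        return true;
    }
}
```

Callers can then treat a `false` result as "retry the open later" rather than hitting a NullPointerException deep inside the reader.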
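The failure mode described in HBASE-7531 can be sketched in a few lines. This is a minimal illustration, not the actual patch: `ReaderHolder`, `LogReader` and `ReaderFactory` are hypothetical names invented here and are not HBase classes. It assumes the fix amounts to dropping the reader reference whenever opening fails, so that later calls check for null instead of touching a stale reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReaderHolder {
    /** Hypothetical stand-in for a log reader; not the HBase class. */
    public interface LogReader {
        void seek(long position) throws IOException;
    }

    /** Hypothetical factory so the sketch can simulate a failing open. */
    public interface ReaderFactory {
        LogReader open(String path) throws IOException;
    }

    private final ReaderFactory factory;
    private LogReader reader; // null means "no usable reader"

    public ReaderHolder(ReaderFactory factory) {
        this.factory = factory;
    }

    /** Tries to open a reader; returns true only if a usable reader is now held. */
    public boolean openReader(String path) {
        try {
            reader = factory.open(path);
            return true;
        } catch (IOException e) {
            // Drop the old reference so nothing downstream uses a reader
            // that belongs to a previous file or a half-finished open.
            reader = null;
            return false;
        }
    }

    /** Seeks only when a reader is held; returns false when there is none. */
    public boolean seek(long position) {
        if (reader == null) {
            return false;
        }
        try {
            reader.seek(position);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean hasReader() {
        return reader != null;
    }
}
```

With this shape, a caller that ignores the result of a failed `openReader()` gets `false` from `seek()` rather than an NPE deep inside the reader.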
[jira] [Commented] (HBASE-7528) Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found.
[ https://issues.apache.org/jira/browse/HBASE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550549#comment-13550549 ] Hudson commented on HBASE-7528: --- Integrated in HBase-TRUNK #3725 (See [https://builds.apache.org/job/HBase-TRUNK/3725/]) HBASE-7528 Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found (Sergey Shelukhin) (Revision 1431637) Result = FAILURE jmhsieh : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java Unhelpful NPE in hbck -repair when adopting orphans if no tableinfo is found. - Key: HBASE-7528 URL: https://issues.apache.org/jira/browse/HBASE-7528 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.90.6, 0.92.2, 0.94.3, 0.96.0 Reporter: Jonathan Hsieh Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.96.0 Attachments: HBASE-7528-v0.patch, hbase-7528.v1 {code} 13/01/09 17:34:54 DEBUG util.HBaseFsck: Attempting to adopt orphan: { meta = null, hdfs = hdfs://c1514.hal.cloudera.com:56020/hbase-cdh4.2/pe-2-table/28fbe62eee2ffd8ea2611335ed78f8ce, deployed = } Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1871) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:482) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:455) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:576) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:353) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:431) at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3614) at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3436) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3430) {code}
[jira] [Commented] (HBASE-7535) Fix restore reference files
[ https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550564#comment-13550564 ] Ted Yu commented on HBASE-7535: --- Some logs were removed in the patch: {code} -LOG.info("getReferredToFile(): p=" + p + " g1=" + m.group(1) + " g2=" + m.group(2)); + {code} Would they be useful in debugging? Maybe change them to debug level. {code} + LOG.info("restore file as link-link=" + hfileName + " in=" + familyDir); {code} 'link-link' means an HFileLink created from an HFileLink. Maybe call it 'link-from-link' or something similar? In the test: {code} +HTableDescriptor htd = createTableDescriptor("table"); {code} Please create a constant for the table name so that it can be referred to later: {code} +Path basePath = new Path(new Path("table", "region"), "cf"); {code} Fix restore reference files --- Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch After HBASE-7419 the HFileLink regex became stricter, to have a proper isHFileLink() check, but HFileLink should open both references and hfiles, since the main idea behind it is to open anything in /table/region/family/XYZ. This patch fixes the reference (split files) restore problem and opens up the hfilelink regex for HFileLink(/table/region/family/xyz).open()
[jira] [Commented] (HBASE-7453) HBASE-7423 snapshot followup
[ https://issues.apache.org/jira/browse/HBASE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550571#comment-13550571 ] Ted Yu commented on HBASE-7453: --- +1 from me. HBASE-7423 snapshot followup Key: HBASE-7453 URL: https://issues.apache.org/jira/browse/HBASE-7453 Project: HBase Issue Type: Sub-task Components: Client, master, regionserver, snapshots, Zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: hbase-6055, hbase-7290 Attachments: HBASE-7453-v0.patch, HBASE-7453-v1.patch HBASE-7423 changed the arguments of one method used by the restore code
[jira] [Assigned] (HBASE-7536) Add test that confirms that multiple concurrent snapshot requests are rejected.
[ https://issues.apache.org/jira/browse/HBASE-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-7536: - Assignee: Jonathan Hsieh Add test that confirms that multiple concurrent snapshot requests are rejected. --- Key: HBASE-7536 URL: https://issues.apache.org/jira/browse/HBASE-7536 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Currently the rule is that we can only have one online snapshot running at a time. This test tries to prove this.
[jira] [Updated] (HBASE-7535) Fix restore reference files
[ https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7535: --- Attachment: HBASE-7535-v1.patch Re-added the getReferredToFile() log as debug, and removed the log.info() in restoreStoreFile(), since one call earlier we already have a log.trace() about the file that we are going to restore. The table in the table descriptor and the table in the new Path(table, region, cf) are not related: the second one is a fake path to keep StoreFile.getReferredToFile() happy, and we don't care about the real path in the test. Changed the names to be more explicit about that. Fix restore reference files --- Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch, HBASE-7535-v1.patch After HBASE-7419 the HFileLink regex became stricter, to have a proper isHFileLink() check, but HFileLink should open both references and hfiles, since the main idea behind it is to open anything in /table/region/family/XYZ. This patch fixes the reference (split files) restore problem and opens up the hfilelink regex for HFileLink(/table/region/family/xyz).open()
[jira] [Updated] (HBASE-7365) Safer table creation and deletion using .tmp dir
[ https://issues.apache.org/jira/browse/HBASE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7365: --- Attachment: HBASE-7365-v2.patch Safer table creation and deletion using .tmp dir Key: HBASE-7365 URL: https://issues.apache.org/jira/browse/HBASE-7365 Project: HBase Issue Type: Improvement Components: master Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.96.0 Attachments: HBASE-7365-v0.patch, HBASE-7365-v1.patch, HBASE-7365-v2.patch Currently tables are created in the root directory, and the removal works on the root directory. Change the code to use a /hbase/.tmp directory to make the creation and removal a bit safer. Table Creation steps * Create the table descriptor (table folder, in /hbase/.tmp/) * Create the table regions (always in temp) * Move the table from temp to the root folder * Add the regions to meta * Trigger assignment * Set enable flag in ZooKeeper Table Deletion steps * Wait for regions in transition * Remove regions from meta (use bulk delete) * Move the table to /hbase/.tmp * Remove the table from the descriptor cache * Remove table from ZooKeeper * Archive the table The main changes in the current code are: * Writing to /hbase/.tmp and then renaming * Using bulk delete in DeletionTableHandler
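The "write to .tmp, then rename" idea in the creation steps above can be illustrated on a local filesystem. This is only a sketch under stated assumptions: it uses `java.nio.file` rather than the HDFS API the patch works against, and `TmpDirTableCreation`, `createTable` and the `descriptor` file name are hypothetical names chosen for the example. The meta, assignment and ZooKeeper steps are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class TmpDirTableCreation {
    /**
     * Builds the whole table layout under root/.tmp and only then moves it
     * into place, so a crash halfway leaves nothing half-built under root/table.
     */
    public static Path createTable(Path root, String table, List<String> regions) {
        try {
            // 1. Create the table folder and a placeholder descriptor in .tmp
            Path tmpTable = root.resolve(".tmp").resolve(table);
            Files.createDirectories(tmpTable);
            Files.write(tmpTable.resolve("descriptor"),
                    table.getBytes(StandardCharsets.UTF_8));

            // 2. Create the region directories, still in .tmp
            for (String region : regions) {
                Files.createDirectories(tmpTable.resolve(region));
            }

            // 3. Move the finished table from .tmp to the root folder in one step
            return Files.move(tmpTable, root.resolve(table),
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the ordering is that anything visible directly under the root folder is complete; leftovers from a failed creation stay under `.tmp`, where they are easy to identify and clean up.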
[jira] [Assigned] (HBASE-7471) Enable Cleaners required for Snapshots by default
[ https://issues.apache.org/jira/browse/HBASE-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-7471: - Assignee: Ted Yu Enable Cleaners required for Snapshots by default - Key: HBASE-7471 URL: https://issues.apache.org/jira/browse/HBASE-7471 Project: HBase Issue Type: Sub-task Components: Client, master, regionserver, snapshots, Zookeeper Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: hbase-6055, 0.96.0 Attachments: 7471.txt Currently, snapshots require admins to add configuration to their hbase-site.xml to have snapshot functionality available. It is at the moment off by default. {code} <property> <name>hbase.snapshot.enabled</name> <value>true</value> </property> {code} Maybe we should just enable snapshots by default. Discuss.
[jira] [Updated] (HBASE-7471) Enable Cleaners required for Snapshots by default
[ https://issues.apache.org/jira/browse/HBASE-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7471: -- Attachment: 7471.txt Enable Cleaners required for Snapshots by default - Key: HBASE-7471 URL: https://issues.apache.org/jira/browse/HBASE-7471 Project: HBase Issue Type: Sub-task Components: Client, master, regionserver, snapshots, Zookeeper Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: hbase-6055, 0.96.0 Attachments: 7471.txt Currently, snapshots require admins to add configuration to their hbase-site.xml to have snapshot functionality available. It is at the moment off by default. {code} <property> <name>hbase.snapshot.enabled</name> <value>true</value> </property> {code} Maybe we should just enable snapshots by default. Discuss.
[jira] [Commented] (HBASE-7530) [replication] Work around HDFS-4380 else we get NPEs
[ https://issues.apache.org/jira/browse/HBASE-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550583#comment-13550583 ] stack commented on HBASE-7530: -- I don't get what this change does. Previously we had explicit sizing. This does explicit sizing too, right, by rolling at some multiple of the current size? [replication] Work around HDFS-4380 else we get NPEs Key: HBASE-7530 URL: https://issues.apache.org/jira/browse/HBASE-7530 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7530.patch I've been spending a lot of time trying to figure out the recent test failures related to replication. One I seem to be constantly getting is this NPE: {noformat} 2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312) {noformat} Talking to [~tlipcon], he said it was likely fixed in Hadoop 2.0 via HDFS-3222, but for Hadoop 1.0 he created HDFS-4380. This seems to happen while crossing block boundaries, and TestReplication uses a 20KB block size for the HLog. The intent was just to get HLogs to roll more often, and this can also be achieved by setting *hbase.regionserver.logroll.multiplier* to 0.0003f.
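As a rough check of why 0.0003f is a sensible replacement for the 20KB block size, here is the arithmetic as a small sketch. It assumes the roll threshold is simply the HLog block size multiplied by hbase.regionserver.logroll.multiplier, and that the block size is 64 MB; both are assumptions made for this illustration, and `LogRollSize` is a hypothetical helper, not HBase code.

```java
public class LogRollSize {
    /** Roll threshold in bytes: block size times multiplier, truncated to a long. */
    public static long rollSize(long blockSizeBytes, float multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        long threshold = rollSize(blockSize, 0.0003f);
        // 67108864 * 0.0003 is about 20132 bytes, i.e. a little under 20 KB
        System.out.println(threshold + " bytes (~" + (threshold / 1024.0) + " KB)");
    }
}
```

Under those assumptions the log rolls after roughly 20 KB of edits, which is the same effect the test wanted, while the file itself keeps a normal block size and so rarely crosses a block boundary.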
[jira] [Commented] (HBASE-7531) [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader
[ https://issues.apache.org/jira/browse/HBASE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550584#comment-13550584 ] stack commented on HBASE-7531: -- +1 [replication] NPE in SequenceFileLogReader because ReplicationSource doesn't nullify the reader --- Key: HBASE-7531 URL: https://issues.apache.org/jira/browse/HBASE-7531 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Attachments: HBASE-7531.patch Here's a NPE I get half the time I run TestReplication: {noformat} 2012-12-20 08:59:17,259 ERROR [RegionServer:1;192.168.10.135,49168,1356011734418-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:65533/user/jdcryans/hbase/.logs/192.168.10.135,49168,1356011734418/192.168.10.135%2C49168%2C1356011734418.1356011956626 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.seek(SequenceFileLogReader.java:261) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:103) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:414) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:332) {noformat} The issue happens after an IOE is caught while opening the reader: the reader isn't set to null afterwards, so the rest of the code assumes it is still usable.
[jira] [Created] (HBASE-7537) .regioninfo not created by createHRegion()
Matteo Bertozzi created HBASE-7537: -- Summary: .regioninfo not created by createHRegion() Key: HBASE-7537 URL: https://issues.apache.org/jira/browse/HBASE-7537 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi After HBASE-5683 the .regioninfo is no longer written to disk during table creation, so if we fail before adding entries to .META. we end up with regions on disk that have no information, and hbck is not able to recover from this situation. The .regioninfo is written in checkRegioninfoOnFilesystem(), which was called by initialize() during table creation and region opening. With HBASE-5683 we skip the call to initialize() during region creation, to avoid initializing the memstore and co.
[jira] [Commented] (HBASE-7480) Explicit message for not allowed snapshot on meta tables
[ https://issues.apache.org/jira/browse/HBASE-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550587#comment-13550587 ] Ted Yu commented on HBASE-7480: --- +1 from me. Explicit message for not allowed snapshot on meta tables Key: HBASE-7480 URL: https://issues.apache.org/jira/browse/HBASE-7480 Project: HBase Issue Type: Sub-task Components: Client, master, regionserver, snapshots, Zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: hbase-6055, 0.96.0 Attachments: HBASE-7480-v0.patch Taking a snapshot of -ROOT- or .META. now results in something like this: {code} Illegal first character 46 at 0. User-space table names can only start with 'word characters': i.e. [a-zA-Z_0-9] {code} Change the message to something more human-readable that tells the user that meta tables are not supported.
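The kind of check HBASE-7480 asks for could look roughly like the sketch below: test for the catalog table names up front and fail with a message that says what is wrong, before the generic table-name validation produces the "Illegal first character" error. `SnapshotTableCheck`, `checkSnapshotAllowed` and the message wording are hypothetical and not taken from the patch.

```java
public class SnapshotTableCheck {
    /**
     * Rejects snapshot requests on the catalog tables with an explicit message.
     * Only the two names quoted in the issue (-ROOT- and .META.) are treated as catalog tables.
     */
    public static void checkSnapshotAllowed(String tableName) {
        if ("-ROOT-".equals(tableName) || ".META.".equals(tableName)) {
            throw new IllegalArgumentException(
                    "Snapshots of catalog table '" + tableName + "' are not supported");
        }
    }

    public static void main(String[] args) {
        checkSnapshotAllowed("usertable"); // passes silently
        try {
            checkSnapshotAllowed(".META.");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```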
[jira] [Commented] (HBASE-7535) Fix restore reference files
[ https://issues.apache.org/jira/browse/HBASE-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550588#comment-13550588 ] Ted Yu commented on HBASE-7535: --- +1, if tests pass. Fix restore reference files --- Key: HBASE-7535 URL: https://issues.apache.org/jira/browse/HBASE-7535 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Attachments: HBASE-7535-v0.patch, HBASE-7535-v1.patch After HBASE-7419 the HFileLink regex became stricter, to have a proper isHFileLink() check, but HFileLink should open both references and hfiles, since the main idea behind it is to open anything in /table/region/family/XYZ. This patch fixes the reference (split files) restore problem and opens up the hfilelink regex for HFileLink(/table/region/family/xyz).open()