[jira] [Resolved] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.
[ https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5482.
Resolution: Fixed
Hadoop Flags: Reviewed

In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.
Key: HBASE-5482
URL: https://issues.apache.org/jira/browse/HBASE-5482
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.7
Attachments: 5482-v2.txt, HBASE-5482_1.patch, HBASE-5482_2.patch

There are two possible problems:
- When we populate regionsToMove while iterating over the server info in descending order, the same region can be added twice. In the first loop we randomize the regions, whereas when neededRegions != 0 we just take the region at the index and add it again. This can leave the same region in the regionsToMove list twice.
- When the problem in the first point happens, a plan in regionsToMove can have the same source and destination, and the same region can be picked every 5 minutes.
{code}
for (Map.Entry<HServerInfo, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
  BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
  int idx = balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
  if (idx >= server.getValue().size()) break;
  HRegionInfo region = server.getValue().get(idx);
  if (region.isMetaRegion()) continue; // Don't move meta regions.
  regionsToMove.add(new RegionPlan(region, server.getKey(), null));
  if (--neededRegions == 0) {
    // No more regions needed, done shedding
    break;
  }
}
{code}
If meta and root are on the two most loaded region servers (3 RS in total), we skip the regions on those region servers and take the region from the least loaded RS.
Then, in the next loop, we iterate from the least loaded server and set that same server as the destination. As a result, balancing happens every 5 minutes with the same server as both source and destination.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
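The two symptoms described in HBASE-5482 (a region listed twice, and a plan whose source equals its destination) can be guarded against with a small post-filter. The following is only a sketch under stated assumptions: PlanSketch and PlanFilterSketch are hypothetical stand-ins, not the HBase RegionPlan class and not the committed patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a balancer plan; not the HBase RegionPlan class.
final class PlanSketch {
    final String region;
    final String source;
    final String destination;

    PlanSketch(String region, String source, String destination) {
        this.region = region;
        this.source = source;
        this.destination = destination;
    }
}

final class PlanFilterSketch {
    // Keeps the first useful plan seen for each region and drops plans that
    // would "move" a region onto the server it already lives on.
    static List<PlanSketch> sanitize(List<PlanSketch> plans) {
        Map<String, PlanSketch> byRegion = new LinkedHashMap<>();
        for (PlanSketch p : plans) {
            if (p.source.equals(p.destination)) {
                continue; // no-op move: source and destination are the same
            }
            byRegion.putIfAbsent(p.region, p); // ignore a second plan for the same region
        }
        return new ArrayList<>(byRegion.values());
    }
}
```

A filter like this treats the symptom only; the issue itself points at the index-based selection in the loop as the place where the duplicate is created.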
[jira] [Resolved] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler
[ https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5490.
Resolution: Fixed
Fix Version/s: (was: 0.90.7) 0.90.6
Assignee: ramkrishna.s.vasudevan

This is already committed to 0.90.6, so the fix version is changed to 0.90.6.

Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler
Key: HBASE-5490
URL: https://issues.apache.org/jira/browse/HBASE-5490
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: 5490-v2.txt, HBASE-5490.patch

The newly added state RS_ZK_REGION_FAILED_OPEN was causing the rolling restart to fail, so the new enum constant is moved to the end of the list.
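Why the position of a new enum constant can matter during a rolling restart can be illustrated as follows. This is a sketch that assumes the old and new nodes exchange the enum by ordinal, which the message above does not state explicitly; every constant name except RS_ZK_REGION_FAILED_OPEN is made up for illustration.

```java
// Enum as known to a node still running the old code (illustrative names).
enum OldEvents { REGION_CLOSED, REGION_OPENED, SERVER_SHUTDOWN }

// New code with the constant inserted in the middle of the list.
enum InsertedMidList { REGION_CLOSED, REGION_OPENED, RS_ZK_REGION_FAILED_OPEN, SERVER_SHUTDOWN }

// New code with the constant appended at the end of the list.
enum AppendedAtEnd { REGION_CLOSED, REGION_OPENED, SERVER_SHUTDOWN, RS_ZK_REGION_FAILED_OPEN }

final class EnumOrdinalSketch {
    // Decodes an ordinal written by a newer node using the older node's enum.
    static String decodeWithOld(int ordinal) {
        OldEvents[] values = OldEvents.values();
        return ordinal < values.length ? values[ordinal].name() : "UNKNOWN";
    }

    public static void main(String[] args) {
        // Mid-list insertion shifts every later constant, so the old node misreads it.
        System.out.println(decodeWithOld(InsertedMidList.SERVER_SHUTDOWN.ordinal()));
        // Appending keeps all pre-existing ordinals stable.
        System.out.println(decodeWithOld(AppendedAtEnd.SERVER_SHUTDOWN.ordinal()));
    }
}
```

With the constant appended, an old node can at worst fail to recognise the new value; it never confuses two existing values.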
[jira] [Resolved] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.
[ https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5321.
Resolution: Fixed

Committed to 0.90.

this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.
Key: HBASE-5321
URL: https://issues.apache.org/jira/browse/HBASE-5321
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-5321.patch

In HBASE-5160 we do not wait for TM to assign the regions after the first RS comes online. Once that assignment is done, the variable this.allRegionServersOffline needs to be reset, which is not done in 0.90.
[jira] [Resolved] (HBASE-4893) HConnectionImplementation is closed but not deleted
[ https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-4893.
Resolution: Fixed
Assignee: Mubarak Seyed

Resolving the issue.

HConnectionImplementation is closed but not deleted
Key: HBASE-4893
URL: https://issues.apache.org/jira/browse/HBASE-4893
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.1
Environment: Linux 2.6, HBase-0.90.1
Reporter: Mubarak Seyed
Assignee: Mubarak Seyed
Labels: noob
Fix For: 0.90.6
Attachments: HBASE-4893.v1.patch, HBASE-4893.v2.patch

In abort() of HConnectionManager$HConnectionImplementation, the instance of HConnectionImplementation is marked with this.closed=true. A client application has no way to check whether the HBase client connection is still open and good (this.closed=false). We need a method such as isClosed() to validate the state of a connection.
{code}
public boolean isClosed() {
  return this.closed;
}
{code}
Once the connection is closed, it should be deleted. Instead, the client application still gets the same connection from HConnectionManager.getConnection(Configuration) and tries to make an RPC call to the RS. Since the connection is already closed, HConnectionImplementation.getRegionServerWithRetries throws RetriesExhaustedException with the error message
{code}
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '----xxx', but failed after 10 attempts.
Exceptions:
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1008)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
{code}
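The behaviour requested in HBASE-4893 (a closed connection should not be handed out again) might look roughly like the sketch below. ConnSketch and ConnCacheSketch are hypothetical names for illustration; this is not the HConnectionManager code or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached client connection.
final class ConnSketch {
    private volatile boolean closed;

    void abort() { closed = true; }      // mirrors "abort() marks this.closed=true"

    boolean isClosed() { return closed; } // the accessor the issue asks for
}

final class ConnCacheSketch {
    private final Map<String, ConnSketch> cache = new HashMap<>();

    // Returns a usable connection for the key, replacing a cached one that was closed.
    synchronized ConnSketch get(String key) {
        ConnSketch c = cache.get(key);
        if (c == null || c.isClosed()) {
            c = new ConnSketch();
            cache.put(key, c);
        }
        return c;
    }
}
```

The design choice here is to evict lazily, at lookup time, so that a caller never receives a connection that would fail every retry.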
[jira] [Resolved] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.
[ https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5235.
Resolution: Fixed

Committed to 0.92, trunk and 0.90.

HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.
Key: HBASE-5235
URL: https://issues.apache.org/jira/browse/HBASE-5235
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.6, 0.92.1
Attachments: HBASE-5235_0.90.patch, HBASE-5235_0.90_1.patch, HBASE-5235_0.90_2.patch, HBASE-5235_trunk.patch

Please find the analysis below. Correct me if I am wrong.
{code}
2012-01-15 05:14:02,374 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-9 Got while writing log entry to log
java.io.IOException: All datanodes 10.18.40.200:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3373)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2811)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3026)
{code}
Here we have an exception in one of the writer threads. On any exception we try to hold it in an atomic variable:
{code}
private void writerThreadError(Throwable t) {
  thrown.compareAndSet(null, t);
}
{code}
In the finally block of splitLog we try to close the streams.
{code}
for (WriterThread t : writerThreads) {
  try {
    t.join();
  } catch (InterruptedException ie) {
    throw new IOException(ie);
  }
  checkForErrors();
}
LOG.info("Split writers finished");
return closeStreams();
{code}
Inside checkForErrors:
{code}
private void checkForErrors() throws IOException {
  Throwable thrown = this.thrown.get();
  if (thrown == null) return;
  if (thrown instanceof IOException) {
    throw (IOException) thrown;
  } else {
    throw new RuntimeException(thrown);
  }
}
{code}
So once checkForErrors throws the exception, closeStreams() is never reached and the DFSStreamer threads are not closed.
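One way to make sure the streams are closed even when a writer thread has recorded an error is to do the closing in a finally block around the error check. The sketch below assumes that shape; SplitCloseSketch and finish() are hypothetical names, and the committed patch may be structured differently.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class SplitCloseSketch {
    private final AtomicReference<Throwable> thrown = new AtomicReference<>();

    // Records only the first error reported by any writer thread.
    void writerThreadError(Throwable t) {
        thrown.compareAndSet(null, t);
    }

    private void checkForErrors() throws IOException {
        Throwable t = thrown.get();
        if (t == null) return;
        if (t instanceof IOException) throw (IOException) t;
        throw new RuntimeException(t);
    }

    // Rethrows a recorded writer error, but only after every stream was closed.
    void finish(List<? extends Closeable> streams) throws IOException {
        try {
            checkForErrors();
        } finally {
            for (Closeable c : streams) {
                try {
                    c.close();
                } catch (IOException e) {
                    // Keep closing the remaining streams; the first recorded error wins.
                    thrown.compareAndSet(null, e);
                }
            }
        }
        checkForErrors(); // surfaces an error that happened only while closing
    }
}
```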
[jira] [Resolved] (HBASE-5269) IllegalMonitorStateException while retryin HLog split in 0.90 branch.
[ https://issues.apache.org/jira/browse/HBASE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5269.
Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.90. Thanks for the review, Stack and Ted.

IllegalMonitorStateException while retryin HLog split in 0.90 branch.
Key: HBASE-5269
URL: https://issues.apache.org/jira/browse/HBASE-5269
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-5269.patch

This bug was introduced as part of the HBASE-5137 fix. The splitLogLock is released in the finally block inside the do-while loop, so when the loop executes a second time, the unlock of the splitLogLock throws IllegalMonitorStateException.
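The failure mode in HBASE-5269 can be reproduced in isolation. The sketch below assumes the lock behaves like java.util.concurrent.locks.ReentrantLock and uses a for loop in place of the do-while; SplitLockSketch and its methods are hypothetical names, not the MasterFileSystem code.

```java
import java.util.concurrent.locks.ReentrantLock;

final class SplitLockSketch {
    // Buggy shape: lock once outside the loop, unlock in a finally inside it.
    static boolean buggyRetry(int attempts) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        try {
            for (int i = 0; i < attempts; i++) {
                try {
                    // ... one split attempt would run here ...
                } finally {
                    lock.unlock(); // on the second iteration the lock is no longer held
                }
            }
            return true;
        } catch (IllegalMonitorStateException e) {
            return false;
        }
    }

    // Fixed shape: acquire and release are paired around the whole retry loop.
    static boolean fixedRetry(int attempts) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        try {
            for (int i = 0; i < attempts; i++) {
                // ... one split attempt would run here ...
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

A single attempt never shows the problem, which is why it appears only when the split is retried.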
[jira] [Resolved] (HBASE-5225) Backport HBASE-3845 -data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5225.
Resolution: Fixed

Committed to 0.90.

Backport HBASE-3845 -data loss because lastSeqWritten can miss memstore edits
Key: HBASE-5225
URL: https://issues.apache.org/jira/browse/HBASE-5225
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-3845-90.patch, HBASE-3845_0.90_1.patch

Critical defect. The patch from HBASE-3845 was not integrated into 0.90.
[jira] [Resolved] (HBASE-5207) Apply HBASE-5155 to trunk
[ https://issues.apache.org/jira/browse/HBASE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5207.
Resolution: Duplicate

Same as HBASE-5206.

Apply HBASE-5155 to trunk
Key: HBASE-5207
URL: https://issues.apache.org/jira/browse/HBASE-5207
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan

The issue HBASE-5155 has been fixed on the branch (0.90). The same fix has to be applied on trunk as well.
[jira] [Resolved] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
[ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5155.
Resolution: Fixed

Committed to branch 0.90.

ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
Key: HBASE-5155
URL: https://issues.apache.org/jira/browse/HBASE-5155
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.90.6
Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch

ServerShutDownHandler and the disable/delete table handler race. This is not an issue due to TM.
- A regionserver goes down. In our cluster the regionserver holds a lot of regions.
- A region R1 has two daughters D1 and D2.
- The ServerShutdownHandler gets called, scans META and gets all the user regions.
- In parallel, a table is disabled. (No problem in this step.)
- Delete table is done.
- The table and its regions are deleted, including R1, D1 and D2. (So META is cleaned.)
- Now ServerShutdownHandler starts to process the dead region (processDeadRegion):
{code}
if (hri.isOffline() && hri.isSplit()) {
  LOG.debug("Offlined and split region " + hri.getRegionNameAsString() + "; checking daughter presence");
  fixupDaughters(result, assignmentManager, catalogTracker);
{code}
As part of fixupDaughters, since the daughters D1 and D2 are missing for R1,
{code}
if (isDaughterMissing(catalogTracker, daughter)) {
  LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
  MetaEditor.addDaughter(catalogTracker, daughter, null);
  // TODO: Log WARN if the regiondir does not exist in the fs. If its not
  // there then something wonky about the split -- things will keep going
  // but could be missing references to parent region.
  // And assign it.
  assignmentManager.assign(daughter, true);
{code}
we call assign on the daughters.
After this we continue with the code below.
{code}
if (processDeadRegion(e.getKey(), e.getValue(), this.services.getAssignmentManager(), this.server.getCatalogTracker())) {
  this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
When the SSH scanned META it had R1, D1 and D2. So, as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by
{code}
this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
This leads to a ZooKeeper bad version error that kills the master. The important part is that the regions that were deleted are recreated, which I think is more critical.
[jira] [Resolved] (HBASE-5192) Backport HBASE-4236 Don't lock the stream while serializing the response
[ https://issues.apache.org/jira/browse/HBASE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5192.
Resolution: Fixed
Hadoop Flags: Reviewed

Thanks for the review, Ted. Committed to 0.90.

Backport HBASE-4236 Don't lock the stream while serializing the response
Key: HBASE-5192
URL: https://issues.apache.org/jira/browse/HBASE-5192
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-4236_0.90.patch

Backporting to 0.90.6.
[jira] [Resolved] (HBASE-5160) Backport HBASE-4397 - -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
[ https://issues.apache.org/jira/browse/HBASE-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5160.
Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.90. Thanks for the review, Ted.

Backport HBASE-4397 - -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time
Key: HBASE-5160
URL: https://issues.apache.org/jira/browse/HBASE-5160
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-5160-AssignmentManager.patch, HBASE-5160_2.patch

Backporting to 0.90.6 considering the importance of the issue.
[jira] [Resolved] (HBASE-5159) Backport HBASE-4079 - HTableUtil - helper class for loading data
[ https://issues.apache.org/jira/browse/HBASE-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5159.
Resolution: Fixed
Fix Version/s: 0.90.6
Hadoop Flags: Reviewed

Backport HBASE-4079 - HTableUtil - helper class for loading data
Key: HBASE-5159
URL: https://issues.apache.org/jira/browse/HBASE-5159
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-4079.patch

Backporting to 0.90.6 considering the usefulness of the feature.
[jira] [Resolved] (HBASE-5184) Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness
[ https://issues.apache.org/jira/browse/HBASE-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5184.
Resolution: Fixed
Hadoop Flags: Reviewed

Backport HBASE-5152 - Region is on service before completing initialization when doing rollback of split, it will affect read correctness
Key: HBASE-5184
URL: https://issues.apache.org/jira/browse/HBASE-5184
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-5152_0.90.patch

Important issue to be merged into 0.90.
[jira] [Resolved] (HBASE-5168) Backport HBASE-5100 - Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5168.
Resolution: Fixed
Hadoop Flags: Reviewed

Backport HBASE-5100 - Rollback of split could cause closed region to be opened again
Key: HBASE-5168
URL: https://issues.apache.org/jira/browse/HBASE-5168
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-5100_0.90.patch

Considering the importance of the defect, merging it to 0.90.6.
[jira] [Resolved] (HBASE-5158) Backport HBASE-4878 - Master crash when splitting hlog may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5158.
Resolution: Fixed
Hadoop Flags: Reviewed

Backport HBASE-4878 - Master crash when splitting hlog may cause data loss
Key: HBASE-5158
URL: https://issues.apache.org/jira/browse/HBASE-5158
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-4878_branch90_1.patch

Backporting to 0.90.6 considering the importance of the issue.
[jira] [Resolved] (HBASE-5178) Backport HBASE-4101 - Regionserver Deadlock
[ https://issues.apache.org/jira/browse/HBASE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5178.
Resolution: Fixed

Backport HBASE-4101 - Regionserver Deadlock
Key: HBASE-5178
URL: https://issues.apache.org/jira/browse/HBASE-5178
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-4101_0.90_1.patch

Critical issue not merged to 0.90.
[jira] [Resolved] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5137.
Resolution: Fixed
Fix Version/s: (was: 0.92.1) 0.92.0

MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
Key: HBASE-5137
URL: https://issues.apache.org/jira/browse/HBASE-5137
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.6
Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch

I am not sure if this bug was already raised in JIRA. In our test cluster we had a scenario where the RS had gone down and ServerShutDownHandler started with splitLog. But as HDFS was down, the waitOnSafeMode check threw an IOException.
{code}
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
{code}
We catch the exception
{code}
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
{code}
So the HLog split itself did not happen. We encountered about 4 regions, recently split on the crashed RS, that were lost. Can we abort the Master in such scenarios? Please suggest.
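The behaviour proposed in HBASE-5137 (abort rather than only log when the safe-mode wait or the split fails) could be sketched as follows. Step, Aborter and splitOrAbort are hypothetical names introduced for illustration; they are not HBase interfaces, and the committed patch may differ.

```java
import java.io.IOException;

final class SplitAbortSketch {
    // Hypothetical callback for a step that may fail with an IOException.
    interface Step { void run() throws IOException; }

    // Hypothetical callback standing in for "abort the master".
    interface Aborter { void abort(String why, Throwable cause); }

    // Runs the safe-mode wait and then the split; a failure in either aborts
    // instead of being swallowed after logging.
    static boolean splitOrAbort(Step waitOnSafeMode, Step splitLog, Aborter master) {
        try {
            waitOnSafeMode.run();
            splitLog.run();
            return true;
        } catch (IOException e) {
            master.abort("Failed splitting logs", e);
            return false;
        }
    }
}
```

The point of the sketch is only the control flow: once the wait throws, the split is skipped, so continuing as if recovery had succeeded would silently drop the edits in the unsplit log.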
[jira] [Resolved] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin
[ https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5073.
Resolution: Fixed

Committed to branch, hence resolving.

Registered listeners not getting removed leading to memory leak in HBaseAdmin
Key: HBASE-5073
URL: https://issues.apache.org/jira/browse/HBASE-5073
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.6
Attachments: HBASE-5073.patch

HBaseAdmin APIs like tableExists(), flush, split and closeRegion use the catalog tracker. Every time, the root node tracker and the meta node tracker are started and a listener is registered. But after the operations are performed, the listeners are not removed. Hence, if the admin APIs are used continuously, this may lead to a memory leak.
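The leak pattern in HBASE-5073, and one way to avoid it, can be shown with a minimal model. WatcherSketch, TrackerSketch and AdminCallSketch are hypothetical stand-ins, not the HBase ZooKeeper watcher or catalog tracker classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical long-lived object that holds references to its listeners.
final class WatcherSketch {
    private final List<Object> listeners = new CopyOnWriteArrayList<>();

    void register(Object listener) { listeners.add(listener); }

    void unregister(Object listener) { listeners.remove(listener); }

    int listenerCount() { return listeners.size(); }
}

// Hypothetical short-lived tracker created for a single admin call.
final class TrackerSketch {
    private final WatcherSketch watcher;

    TrackerSketch(WatcherSketch watcher) { this.watcher = watcher; }

    void start() { watcher.register(this); }

    void stop() { watcher.unregister(this); }
}

final class AdminCallSketch {
    // Starts a tracker for the call and always stops it afterwards, so the
    // watcher does not accumulate one listener per call.
    static boolean tableExists(WatcherSketch watcher) {
        TrackerSketch tracker = new TrackerSketch(watcher);
        tracker.start();
        try {
            return true; // placeholder for the real catalog lookup
        } finally {
            tracker.stop();
        }
    }
}
```

Without the stop() in the finally block, the listener count grows by one on every call and the trackers can never be garbage collected while the watcher lives.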
[jira] [Resolved] (HBASE-4840) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-4840.
Resolution: Duplicate

Duplicate of HBASE-4841.

If I call split fast enough, while inserting, rows disappear.
Key: HBASE-4840
URL: https://issues.apache.org/jira/browse/HBASE-4840
Project: HBase
Issue Type: Bug
Reporter: Alex Newman

I'll attach a unit test for this. Basically, if you call split while inserting data, you can get to the point where the cluster becomes unstable or rows disappear.
[jira] [Resolved] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-4585.
Resolution: Fixed
Hadoop Flags: Reviewed

Avoid seek operation when current kv is deleted
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.94.0
Attachments: hbase-4585-89.patch, hbase-4585-apache-trunk.patch

When the current kv is found to be deleted during matching in the ScanQueryMatcher, the matcher currently returns skip and continues to seek. Instead, if the current kv is deleted because of a family delete or a column delete, the matcher should seek to the next column. If the current kv is deleted because of a version delete, the matcher should just return skip.
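The decision rule described in HBASE-4585 can be written down as a small mapping. The enum and method names below are assumptions chosen for illustration (including the simplification that a kv which is not deleted is included); they are not taken from the attached patches.

```java
final class DeleteMatchSketch {
    // Why the current kv counts as deleted (hypothetical names).
    enum DeleteKind { FAMILY_DELETED, COLUMN_DELETED, VERSION_DELETED, NOT_DELETED }

    // What the matcher tells the scanner to do next (hypothetical names).
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    static MatchCode onCurrentKv(DeleteKind kind) {
        switch (kind) {
            case FAMILY_DELETED:
            case COLUMN_DELETED:
                // Every remaining version of this column is covered by the
                // delete, so jump straight to the next column.
                return MatchCode.SEEK_NEXT_COL;
            case VERSION_DELETED:
                // Only this one version is deleted; older versions may still be live.
                return MatchCode.SKIP;
            default:
                return MatchCode.INCLUDE; // simplification: no other filtering is modelled
        }
    }
}
```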
[jira] [Resolved] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-4540.
Resolution: Fixed
Fix Version/s: 0.90.5

Resolved both in 0.92 and 0.90.5.

OpenedRegionHandler is not enforcing atomicity of the operation it is performing
Key: HBASE-4540
URL: https://issues.apache.org/jira/browse/HBASE-4540
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.5
Attachments: HBASE-4540_1.patch, HBASE-4540_90.patch, HBASE-4540_90_1.patch

- OpenedRegionHandler has not yet deleted the znode of the region R1 opened by RS1.
- RS1 goes down.
- ServerShutdownHandler assigns the region R1 to RS2.
- The znode of R1 is moved to OFFLINE state by the master, or to OPENING state by RS2 if RS2 has started opening the region.
- Now the first OpenedRegionHandler tries to delete the znode, thinking it is in OPENED state, but fails.
- Though it fails, it removes the node from RIT and adds RS1 as the owner of R1 in the master's memory.
- Now when RS2 completes opening the region, the master is not able to open the region because the region has already been deleted from RIT.
{code}
Master
2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647
2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9. state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847}
2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED
2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state

After the region is opened in RS2
2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late
2011-10-05 20:50:48,290 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states
2011-10-05 20:50:53,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s)
2011-10-05 20:50:54,397 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states
{code}
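The atomicity problem in HBASE-4540 comes down to updating the master's in-memory state even though the conditional znode delete failed. A minimal single-threaded sketch of the safer ordering is shown below; OpenedHandlerSketch and its maps are hypothetical stand-ins for ZooKeeper and the master's bookkeeping, not the actual handler or the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

final class OpenedHandlerSketch {
    final Map<String, String> znodeState = new HashMap<>();          // region -> state (stand-in for ZK)
    final Map<String, String> regionsInTransition = new HashMap<>(); // region -> server opening it
    final Map<String, String> assignments = new HashMap<>();         // region -> owning server

    // Deletes the node only if it is currently in the expected state.
    boolean deleteIfInState(String region, String expectedState) {
        return znodeState.remove(region, expectedState);
    }

    // Marks the region online on 'server' only when the OPENED znode was really deleted.
    boolean handleOpened(String region, String server) {
        if (!deleteIfInState(region, "OPENED")) {
            // Someone else moved the node (for example to OFFLINE or OPENING):
            // leave RIT and the assignment map untouched.
            return false;
        }
        regionsInTransition.remove(region);
        assignments.put(region, server);
        return true;
    }
}
```

In the scenario from the message, the stale handler for RS1 would get false here, so R1 stays in RIT and the later OPENING and OPENED transitions from RS2 can still be processed.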
[jira] [Resolved] (HBASE-4539) OpenedRegionHandler racing with itself in ServerShutDownhandler flow leading to HMaster abort
[ https://issues.apache.org/jira/browse/HBASE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-4539.
Resolution: Fixed

Fixed as part of HBASE-4540.

OpenedRegionHandler racing with itself in ServerShutDownhandler flow leading to HMaster abort
Key: HBASE-4539
URL: https://issues.apache.org/jira/browse/HBASE-4539
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

Steps to reproduce
- Region R1 is being opened in RS1.
- After transitioning the znode to OPENED, RS1 goes down.
- Before the OpenedRegionHandler executor deletes the znode, ServerShutDownHandler assigns the region to RS2; RS2 transitions the node to OPENED and the OpenedRegionHandler executor for that transition deletes the znode.
- Now, when the first OpenedRegionHandler tries to delete the znode, it throws a NoNode exception and causes the HMaster to abort.