[jira] [Assigned] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao reassigned HBASE-4028:
Assignee: gaojinchao

Hmaster crashes caused by splitting log.

Key: HBASE-4028
URL: https://issues.apache.org/jira/browse/HBASE-4028
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.3
Reporter: gaojinchao
Assignee: gaojinchao
Fix For: 0.90.4

In my performance cluster (0.90.3), the HMaster memory grew from 100 MB to 4 GB when one region server crashed. I added some debug prints in doneWriting() and found that the value of totalBuffered goes negative:

10:29:52,119 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used -565832
hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used -565832 release size 25168

void doneWriting(RegionEntryBuffer buffer) {
  synchronized (this) {
    LOG.warn("gjc1: relase currentlyWriting " + biggestBufferKey + buffer.encodedRegionName);
    boolean removed = currentlyWriting.remove(buffer.encodedRegionName);
    assert removed;
  }
  long size = buffer.heapSize();
  synchronized (dataAvailable) {
    totalBuffered -= size;
    LOG.warn("gjc:release Used " + totalBuffered);
    // We may unblock writers
    dataAvailable.notifyAll();
  }
  LOG.warn("gjc:release Used " + totalBuffered + " release size " + size);
}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4028:
Attachment: Screenshot-2.png
[jira] [Updated] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4028:
Attachment: hbase-root-master-157-5-100-8.rar
[jira] [Updated] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4028:
Attachment: HBASE-4028-0.90V1.patch
[jira] [Commented] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054385#comment-13054385 ] mingjian commented on HBASE-4028:
gao, did you fix this problem by moving "totalBuffered += incrHeap;" into synchronized (dataAvailable)?
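mingjian's suggestion can be sketched in isolation. This is a minimal, hypothetical model (the class name BufferAccounting and its method names are illustrative, not the actual HLogSplitter code): an unsynchronized `totalBuffered += incrHeap` can race with the `-= size` in doneWriting, losing updates on the long and driving the counter negative; moving the increment under the same dataAvailable monitor makes the accounting consistent.

```java
// Hypothetical sketch of the proposed fix: both the increment and the
// decrement of totalBuffered run under the same monitor (dataAvailable),
// so concurrent += / -= on the long cannot lose updates.
public class BufferAccounting {
    private final Object dataAvailable = new Object();
    private long totalBuffered = 0;

    void incrementUsage(long incrHeap) {
        synchronized (dataAvailable) {   // the 0.90.3 code did this outside the lock
            totalBuffered += incrHeap;
        }
    }

    void doneWriting(long size) {
        synchronized (dataAvailable) {
            totalBuffered -= size;
            dataAvailable.notifyAll();   // we may unblock writers
        }
    }

    long total() {
        synchronized (dataAvailable) { return totalBuffered; }
    }
}
```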
[jira] [Assigned] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4020:
Assignee: Vandana Ayyalasomayajula

testWritesWhileGetting unit test needs to be fixed.

Key: HBASE-4020
URL: https://issues.apache.org/jira/browse/HBASE-4020
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.90.3
Environment: OS: RHEL 5.4
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
Fix For: 0.92.0
Attachments: TestHRegion.patch

The unit test testWritesWhileGetting in org.apache.hadoop.hbase.regionserver.TestHRegion needs to be corrected. It currently uses testWritesWhileScanning as the table name and method name when initializing the HRegion; it should be testWritesWhileGetting. Because of this, the test fails when the initHRegion method fails to create a new HRegion for the test.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes
[ https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054478#comment-13054478 ] jirapos...@reviews.apache.org commented on HBASE-3939: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/951/ --- (Updated 2011-06-24 14:42:47.298437) Review request for hbase and Todd Lipcon. Changes --- Removed clientVersion check where clientVersion is less than 3. Those clients would encounter the following exception connecting to zookeeper: java.lang.IllegalArgumentException: Not a host:port pair: ciq.com,6,1308866059399 Summary --- A few fixes from Hadoop IPC that we should probably cross-port into our copy: * HADOOP-7227: remove the protocol version check at call time * HADOOP-7146: fix a socket leak in server * HADOOP-7121: fix behavior when response serialization throws an exception * HADOOP-7346: send back nicer error response when client is using an out of date IPC version This addresses bug HBASE-3939. 
https://issues.apache.org/jira/browse/HBASE-3939 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 1137262 /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 1137262 /src/main/java/org/apache/hadoop/hbase/coprocessor/BaseEndpointCoprocessor.java 1137262 /src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java 1137280 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 1139326 /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/ProtocolSignature.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/Status.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java 1134732 /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1134732 /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1134732 /src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java 1137280 Diff: https://reviews.apache.org/r/951/diff Testing --- Test suite passed. 
Thanks, Ted Some crossports of Hadoop IPC fixes --- Key: HBASE-3939 URL: https://issues.apache.org/jira/browse/HBASE-3939 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Ted Yu Fix For: 0.92.0 Attachments: 3939-v2.txt, 3939-v3.txt, 3939.txt A few fixes from Hadoop IPC that we should probably cross-port into our copy: - HADOOP-7227: remove the protocol version check at call time - HADOOP-7146: fix a socket leak in server - HADOOP-7121: fix behavior when response serialization throws an exception - HADOOP-7346: send back nicer error response when client is using an out of date IPC version -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3229) Table creation, though using async call to master, can actually run for a while and cause RPC timeout
[ https://issues.apache.org/jira/browse/HBASE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054499#comment-13054499 ] Ted Yu commented on HBASE-3229:
w.r.t. Kannan's comments: in TRUNK, the following method of HMaster is async - see the third parameter:
{code}
public void createTable(HTableDescriptor desc, byte [][] splitKeys) throws IOException {
  createTable(desc, splitKeys, false);
}
{code}
It is the only method exposed through HMasterInterface. Patch v5 from HBASE-3904 makes HBaseAdmin.createTable() wait for all regions to be online.

Table creation, though using async call to master, can actually run for a while and cause RPC timeout

Key: HBASE-3229
URL: https://issues.apache.org/jira/browse/HBASE-3229
Project: HBase
Issue Type: Bug
Components: client, master
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Priority: Critical
Fix For: 0.92.0

Our create table methods in HBaseAdmin are synchronous from the client's POV. However, underneath, we're using an async create and then looping, waiting for table availability. Because the create is async and we loop instead of blocking on RPC, we don't expect RPC timeouts. However, when creating a table with lots of initial regions, the async create can actually take a long time (more than 30 seconds in this case), which causes the client to time out and gives the impression something failed. We should make the create truly async so that this can't happen. And rather than doing one-off, inline assignment as it is today, we should reuse the fancy enable/disable code stack just added to make this faster and more optimal.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
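Jonathan Gray's "truly async" direction can be sketched as a client-side pattern: one fire-and-forget create RPC followed by short availability polls, so no single RPC can outlive the timeout no matter how long region assignment takes. All names here (MasterStub, createTableAsync, isTableAvailable, createAndWait) are hypothetical stand-ins, not the real HBaseAdmin/HMasterInterface API.

```java
// Sketch of the polling pattern under invented names; not the HBase API.
interface MasterStub {
    void createTableAsync(String table, byte[][] splitKeys); // returns at once
    boolean isTableAvailable(String table);                  // cheap check
}

class CreateTableClient {
    static void createAndWait(MasterStub master, String table,
                              byte[][] splits, long timeoutMillis,
                              long pollMillis) throws InterruptedException {
        master.createTableAsync(table, splits);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        // Each poll is a short RPC, so no single call can hit the RPC
        // timeout even if assigning many initial regions takes minutes.
        while (!master.isTableAvailable(table)) {
            if (System.currentTimeMillis() > deadline) {
                throw new RuntimeException("table " + table + " not available within timeout");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```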
[jira] [Updated] (HBASE-4010) HMaster.createTable could be heavily optimized
[ https://issues.apache.org/jira/browse/HBASE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4010: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) HMaster.createTable could be heavily optimized -- Key: HBASE-4010 URL: https://issues.apache.org/jira/browse/HBASE-4010 Project: HBase Issue Type: Improvement Affects Versions: 0.90.3 Reporter: Jean-Daniel Cryans Assignee: Ted Yu Fix For: 0.92.0 Attachments: 4010-0.90.txt, 4010-v2.txt, 4010-v3.txt, 4010-v5.txt Looking at the createTable method in HMaster (the one that's private), we seem to be very inefficient: - We set the enabled flag for the table for every region (should be done only once). - Every time we create a new region we create a new HLog and then close it (reuse one instead or see if it's really necessary). - We do one RPC to .META. per region (we should batch put). This should provide drastic speedups even for those creating tables with just 50 regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
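Of the three inefficiencies listed, the last (one RPC to .META. per region versus a batched put) is the easiest to illustrate. A toy sketch, with MetaWriter as an invented stand-in for the real catalog-table interface: build every region row first, then write them all in a single round trip.

```java
import java.util.ArrayList;
import java.util.List;

// Invented stand-in for the catalog-table writer; one call = one round trip.
interface MetaWriter {
    void put(List<String> rows);
}

class BatchedMetaUpdate {
    // Accumulate one row per new region, then issue exactly one batched put,
    // instead of one RPC per region as the old createTable path did.
    static int addRegions(MetaWriter meta, List<String> regionRows) {
        List<String> batch = new ArrayList<>(regionRows); // build all rows first
        meta.put(batch);                                  // single round trip
        return batch.size();
    }
}
```

For a table created with 50 initial regions, this turns 50 catalog RPCs into one.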
[jira] [Assigned] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4024:
Assignee: Ted Yu

Major compaction may not be triggered, even though region server log says it is triggered

Key: HBASE-4024
URL: https://issues.apache.org/jira/browse/HBASE-4024
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Suraj Varma
Assignee: Ted Yu
Priority: Trivial
Labels: newbie
Fix For: 0.92.0

The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not:

// major compact on user action or age (caveat: we have too many files)
boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact;

The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This results in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
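The restructuring the report proposes can be sketched with simplified types (the real method works on List<StoreFile> and HBase's configured thresholds; the ages-as-longs representation and constructor here are invented for illustration): fold the maxFilesToCompact guard into isMajorCompaction() itself, so the method can never report "triggered" and then be overridden by a later check.

```java
import java.util.List;

// Simplified sketch: the file-count guard runs first inside
// isMajorCompaction(), so the decision and any "Major compaction
// triggered" log line can never disagree.
class CompactionPolicy {
    final int maxFilesToCompact;

    CompactionPolicy(int maxFilesToCompact) {
        this.maxFilesToCompact = maxFilesToCompact;
    }

    // fileAges stands in for the candidate StoreFiles; a file older than
    // maxAgeMillis would trigger a major compaction by age.
    boolean isMajorCompaction(List<Long> fileAges, long maxAgeMillis) {
        // Guard first: with too many files, never declare (or log) a major.
        if (fileAges.size() >= maxFilesToCompact) {
            return false;
        }
        for (long age : fileAges) {
            if (age > maxAgeMillis) {
                return true; // this is where "Major compaction triggered" would log
            }
        }
        return false;
    }
}
```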
[jira] [Created] (HBASE-4029) Inappropriate checkin of Logging Mode in HRegionServer
Inappropriate checkin of Logging Mode in HRegionServer

Key: HBASE-4029
URL: https://issues.apache.org/jira/browse/HBASE-4029
Project: HBase
Issue Type: Bug
Reporter: Akash Ashok

There is a condition check for DEBUG-mode logging in HRegionServer.java. Because of this, the region server never closes the META region while stopping HBase, and thus never stops, if DEBUG logging is not enabled.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
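The bug class described here, real work gated behind a log-level check, is easy to model. A hypothetical sketch (ShutdownStep is not the actual HRegionServer code): the buggy shape skips closing META whenever DEBUG logging is off, while the fixed shape keeps only the log line conditional.

```java
// Illustrative model of the bug class, not the HRegionServer shutdown path.
class ShutdownStep {
    boolean metaClosed = false;

    // Buggy shape: the close itself only happens when debug logging is on.
    void stopBuggy(boolean debugEnabled) {
        if (debugEnabled) {
            // LOG.debug("Closing META");
            metaClosed = true;
        }
    }

    // Fixed shape: only the log statement is conditional; the close always runs.
    void stopFixed(boolean debugEnabled) {
        if (debugEnabled) {
            // LOG.debug("Closing META");
        }
        metaClosed = true;
    }
}
```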
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024:
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024:
Attachment: 4024.txt

Added filesToCompact.size() check in isMajorCompaction(). TestCompaction and TestCompactSelection pass.
[jira] [Updated] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4029:
Fix Version/s: 0.92.0
Summary: Inappropriate checking of Logging Mode in HRegionServer (was: Inappropriate checkin of Logging Mode in HRegionServer)
[jira] [Commented] (HBASE-3891) TaskMonitor is used wrong in some places
[ https://issues.apache.org/jira/browse/HBASE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054543#comment-13054543 ] Ted Yu commented on HBASE-3891:
HRegion.compact() keeps a reference to the proxy returned by TaskMonitor.get().createStatus(). If MonitoredTaskImpl@51bfa303 corresponds to this proxy, I don't know why weakProxy.get() returned null.

TaskMonitor is used wrong in some places

Key: HBASE-3891
URL: https://issues.apache.org/jira/browse/HBASE-3891
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars George
Fix For: 0.92.0

I have a long running log replay in progress but none of the updates show. This is caused by reusing the MonitoredTask references incorrectly, and it manifests itself like this in the logs:
{noformat}
2011-05-16 15:22:18,127 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@51bfa303 appears to have been leaked
2011-05-16 15:22:18,128 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: cleanup.
{noformat}
The cleanup sets the completion timestamp and causes the task to be purged from the list. After that, the UI for example does not show any further running tasks, although from the logs I can see (with my log additions):
{noformat}
2011-05-16 15:29:52,296 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Compaction complete: 103.1m in 18542ms
2011-05-16 15:29:52,296 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Running coprocessor post-compact hooks
2011-05-16 15:29:52,296 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Compaction complete
2011-05-16 15:29:52,297 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: markComplete: Compaction complete
{noformat}
They are silently ignored as the TaskMonitor has dropped their reference. We need to figure out why a supposedly completed task monitor was reused.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
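Ted's puzzlement about weakProxy.get() returning null rests on the WeakReference contract, which a tiny sketch can pin down (names invented; strongHolder stands in for the reference HRegion.compact() holds): as long as any strong reference to the referent survives, get() must not return null; only after the last strong reference is dropped can the collector clear it.

```java
import java.lang.ref.WeakReference;

// Minimal demonstration of the WeakReference contract under discussion.
class WeakRefDemo {
    // Stands in for the strong reference HRegion.compact() keeps to the
    // status proxy; while it exists, the weak reference cannot be cleared.
    static Object strongHolder = new Object();
    static WeakReference<Object> weakRef = new WeakReference<>(strongHolder);

    static boolean proxyStillReachable() {
        return weakRef.get() != null;
    }
}
```

If get() returns null while a strong reference is provably still held, something else (such as two different proxy objects being confused for one another) must be going on.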
[jira] [Created] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
LoadIncrementalHFiles fails with FileNotFoundException

Key: HBASE-4030
URL: https://issues.apache.org/jira/browse/HBASE-4030
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Adam Phelps

Original Message
Subject: Re: LoadIncrementalHFiles bug when regionserver fails to access file?
Date: Thu, 23 Jun 2011 17:00:04 -0700
From: Ted Yu yuzhih...@gmail.com
Reply-To: cdh-u...@cloudera.org
To: u...@hbase.apache.org
CC: CDH Users cdh-u...@cloudera.org

This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encountered any exception, which should determine what to do next. Please file a JIRA. Thanks Adam.

On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps a...@opendns.com wrote:

(As a note, this is with CDH3u0, which is based on HBase 0.90.1.)

We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens, the node that made the call will see a FileNotFoundException such as this:

2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020]
2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)

Over on the regionserver that was loading this, we see that it attempted the load and hit a 60 second timeout:

2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
...
2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readShort(DataInputStream.java:295)

We suspect this particular problem is a resource contention issue on our side. However, the loading process proceeds to rename the file despite the failure:

2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533

And then LoadIncrementalHFiles tries to load the hfile again:

2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server handler 147 on 60020, call
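Ted's diagnosis, a try-finally that closes the reader but never records whether the validation succeeded, suggests the following shape for a fix. This is a hedged sketch with invented names (BulkLoadStep, validateThenRename), not the actual Store bulk-load code: track success explicitly and only rename the HFile when validation completed.

```java
// Invented-name sketch of "check the reader outcome before renaming".
class BulkLoadStep {
    interface Reader {
        void validate() throws Exception; // stands in for the HFile.Reader work
        void close();
    }

    static boolean validateThenRename(Reader reader, Runnable rename) {
        boolean ok = false;
        try {
            reader.validate();
            ok = true;           // only set once validation finished cleanly
        } catch (Exception e) {
            // log and fall through; crucially, do NOT proceed to the rename
        } finally {
            reader.close();      // always release the reader, as the old code did
        }
        if (ok) {
            rename.run();        // move the HFile only on success
        }
        return ok;
    }
}
```

With this shape, a timed-out validation leaves the source file in place, so a retry does not hit FileNotFoundException.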
[jira] [Commented] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054544#comment-13054544 ] Jean-Daniel Cryans commented on HBASE-3852:
I'm not sure it fixes the problem, since the issue here is about users not coming back to close the scanner even though there's still more data. We patched in something at SU that we've been running for more than a month, but it is specific to our use case. Anyway, feel free to take inspiration from these two commits:
https://github.com/stumbleupon/hbase/commit/7bcc63ee22f7e0218fb7225f387ffc0bd2279f59#src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
https://github.com/stumbleupon/hbase/commit/3d08cc8ad5abfec11ee0340d5a75a2b308854e17#src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java

ThriftServer leaks scanners

Key: HBASE-3852
URL: https://issues.apache.org/jira/browse/HBASE-3852
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Assignee: Ted Yu
Priority: Critical
Fix For: 0.92.0
Attachments: 3852.txt

The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory, and if any pre-fetching was done, it will also start accumulating Results (with all their data).

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
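One generic mitigation for this leak (an assumption on my part; the SU commits J-D links take a use-case-specific approach, and this sketch is not their patch) is to timestamp each scanner on use and periodically sweep entries the client never closed:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// TTL-swept scanner map sketch: idle entries are evicted so an abandoned
// ResultScanner cannot pin pre-fetched Results forever. Timestamps are
// passed in explicitly to keep the sketch deterministic and testable.
class ScannerMap<S> {
    static class Entry<S> {
        final S scanner;
        long lastUsed;
        Entry(S scanner, long now) { this.scanner = scanner; this.lastUsed = now; }
    }

    private final Map<Integer, Entry<S>> map = new HashMap<>();
    private final long ttlMillis;
    private int nextId = 0;

    ScannerMap(long ttlMillis) { this.ttlMillis = ttlMillis; }

    synchronized int add(S scanner, long now) {
        map.put(nextId, new Entry<>(scanner, now));
        return nextId++;
    }

    synchronized S get(int id, long now) {
        Entry<S> e = map.get(id);
        if (e == null) return null;
        e.lastUsed = now;       // every use refreshes the TTL
        return e.scanner;
    }

    // Called periodically by a background thread: drop scanners idle
    // longer than the TTL and return how many were evicted.
    synchronized int sweep(long now) {
        int removed = 0;
        for (Iterator<Entry<S>> it = map.values().iterator(); it.hasNext();) {
            if (now - it.next().lastUsed > ttlMillis) { it.remove(); removed++; }
        }
        return removed;
    }

    synchronized int size() { return map.size(); }
}
```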
[jira] [Commented] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054553#comment-13054553 ] Jean-Daniel Cryans commented on HBASE-4029:
Would you mind contributing a fix?
[jira] [Assigned] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash Ashok reassigned HBASE-4029:
Assignee: Akash Ashok
[jira] [Commented] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054557#comment-13054557 ] Akash Ashok commented on HBASE-4029: Sure. I am working on it. Inappropriate checking of Logging Mode in HRegionServer --- Key: HBASE-4029 URL: https://issues.apache.org/jira/browse/HBASE-4029 Project: HBase Issue Type: Bug Reporter: Akash Ashok Assignee: Akash Ashok Labels: regionserver Fix For: 0.92.0 There is a condition check for Debug mode logging in HRegionServer.java. Because of this, the region server never closes the META region while stopping hbase and thus never stops, if DEBUG mode is not enabled in logging. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
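The bug class this issue describes (functional work accidentally trapped inside a log-level guard) can be shown with a dependency-free sketch. LogGuardDemo is purely illustrative, not the actual HRegionServer code:

```java
// Illustration of the reported bug shape: a side effect guarded by a
// log-level check, so behavior silently changes when DEBUG is off.
class LogGuardDemo {
    static boolean debugEnabled = false; // stands in for LOG.isDebugEnabled()
    static boolean metaClosed;

    // Buggy shape: the close only happens when DEBUG logging is on.
    static void stopBuggy() {
        metaClosed = false;
        if (debugEnabled) {
            // LOG.debug("closing META region");
            metaClosed = true; // side effect trapped inside the guard
        }
    }

    // Fixed shape: only the log statement stays behind the guard.
    static void stopFixed() {
        metaClosed = false;
        if (debugEnabled) {
            // LOG.debug("closing META region");
        }
        metaClosed = true; // always runs, regardless of log level
    }
}
```

The rule of thumb: an `isDebugEnabled()` block should contain nothing but logging, so stripping it out never changes program behavior.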
[jira] [Commented] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054558#comment-13054558 ] Jean-Daniel Cryans commented on HBASE-4024: --- Regarding the last edit in the patch, I don't think you should remove that whitespace, and what would be the reason to keep the check on the number of files there if it's moved into isMajorCompaction? Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054562#comment-13054562 ] Ted Yu commented on HBASE-4024: --- I can revert the whitespace change. The check on the number of files was kept for the case when forcemajor is true. Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
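The change being discussed (fold the file-count cap into the major-compaction decision so the "triggered" log line and the outcome cannot disagree, while keeping the cap for the forcemajor path) can be modeled without any HBase classes. CompactSelectDemo is a hypothetical reduction of Store.compactSelection, not the actual patch:

```java
// Hypothetical model of the selection logic: the bug is that the "triggered"
// log fires before the file-count cap is applied, so the log can lie.
class CompactSelectDemo {
    static final int MAX_FILES = 10; // stands in for this.maxFilesToCompact

    // stands in for Store.isMajorCompaction(filesToCompact)
    static boolean isMajorAge(boolean oldEnough) {
        if (oldEnough) {
            // LOG.debug("Major compaction triggered ...") would fire here,
            // even though the cap below may still veto the compaction
            return true;
        }
        return false;
    }

    // Buggy shape: cap applied after the (already logged) decision.
    static boolean selectBuggy(boolean force, boolean oldEnough, int numFiles) {
        return (force || isMajorAge(oldEnough)) && numFiles < MAX_FILES;
    }

    // Fixed shape: the age check sees the cap, so the log and the outcome
    // agree; the outer cap survives only for the force == true path.
    static boolean selectFixed(boolean force, boolean oldEnough, int numFiles) {
        boolean ageMajor = oldEnough && numFiles < MAX_FILES; // check moved inside
        return (force || ageMajor) && numFiles < MAX_FILES;
    }
}
```

The two shapes return the same boolean for every input; the fix only changes where the decision (and its log line) is made, which is why it reads as a logging-honesty fix rather than a behavior change.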
[jira] [Commented] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054565#comment-13054565 ] Ted Yu commented on HBASE-3852: --- I was thinking about a timeout-driven expiration policy. SU's implementation looks nice. We can introduce a new parameter (sigh) to control whether ScannerCleaner is instantiated at startup. @J-D: do you happen to know how many scanners were closed by ScannerCleaner in the past month? ThriftServer leaks scanners --- Key: HBASE-3852 URL: https://issues.apache.org/jira/browse/HBASE-3852 Project: HBase Issue Type: Bug Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 3852.txt The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory and if any pre-fetching was done, it will also start accumulating Results (with all their data). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
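The timeout-driven expiration discussed above can be sketched without any HBase or Thrift dependency: track a last-access timestamp per scanner id alongside the scannerMap entry, and periodically drop entries idle past the timeout. All names here (ScannerExpirer, etc.) are hypothetical, not SU's actual ScannerCleaner:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: expire scanners whose last access is older than a
// timeout, so abandoned scanners stop accumulating prefetched Results.
class ScannerExpirer {
    // scanner id -> last access time in millis
    // (stands in for ThriftServer's scannerMap of id -> ResultScanner)
    final Map<Integer, Long> lastAccess = new ConcurrentHashMap<>();

    // Called on open and on every scannerGet to mark the scanner as live.
    void touch(int scannerId, long nowMillis) {
        lastAccess.put(scannerId, nowMillis);
    }

    // Remove every scanner idle longer than timeoutMillis; returns how many
    // were expired. A real cleaner would also call ResultScanner.close().
    int expireIdle(long nowMillis, long timeoutMillis) {
        int expired = 0;
        Iterator<Map.Entry<Integer, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeoutMillis) {
                it.remove();
                expired++;
            }
        }
        return expired;
    }
}
```

In a real server, expireIdle would run on a scheduled background thread, and the timeout would come from the new configuration parameter Ted mentions.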
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: HBASE-4020.txt testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt, TestHRegion.patch The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: (was: 4024.txt) Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: 4024.txt Removed the whitespace adjustment. Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: (was: 4024.txt) Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: 4024.txt Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descript
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subbu M Iyer updated HBASE-4025: Attachment: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors 
java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
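The fix direction for this issue is to skip the known non-table directories under the HBase root when collecting table descriptors. A minimal, self-contained sketch of that filtering, using plain strings in place of HDFS paths (the class name and helper are illustrative, not the attached patch's actual code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: filter out system directories (.logs, .corrupt, etc.)
// before trying to read a .tableinfo from each child of the HBase root dir.
class TableDirFilter {
    // The system folders named in the issue title; a new system folder would
    // need to be added here, which is the "band-aid" concern raised below.
    static final List<String> NON_TABLE_DIRS =
            Arrays.asList(".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-");

    // Given the child directory names of /hbase, keep only user-table dirs.
    static List<String> tableDirs(List<String> rootDirChildren) {
        return rootDirChildren.stream()
                .filter(name -> !NON_TABLE_DIRS.contains(name))
                .collect(Collectors.toList());
    }
}
```

In the real code this filter would be applied where FSTableDescriptors.getAll iterates the root directory, so getTableInfoModtime is never asked for a status on /hbase/.corrupt.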
[jira] [Updated] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4030: -- Fix Version/s: 0.90.4 LoadIncrementalHFiles fails with FileNotFoundException -- Key: HBASE-4030 URL: https://issues.apache.org/jira/browse/HBASE-4030 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Adam Phelps Fix For: 0.90.4 Original Message Subject: Re: LoadIncrementalHFiles bug when regionserver fails to access file? Date: Thu, 23 Jun 2011 17:00:04 -0700 From: Ted Yu yuzhih...@gmail.com Reply-To: cdh-u...@cloudera.org To: u...@hbase.apache.org CC: CDH Users cdh-u...@cloudera.org This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encounters any exception, which should determine what to do next. Please file a JIRA. Thanks Adam. On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps a...@opendns.com wrote: (As a note, this is with CDH3u0 which is based on HBase 0.90.1) We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens the node that made the call will see a FileNotFoundException such as this: 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read.
ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020] 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602) 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1593) Over on the regionserver that was loading this we see that it attempted to load and hit a 60 second timeout: 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/0/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8. ... 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read.
ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) We suspect this particular problem is a resource contention issue on our side. However, the loading process proceeds to rename the file despite the failure: 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533 And then the LoadIncrementalHFiles tries to load the hfile again: 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054590#comment-13054590 ] Subbu M Iyer commented on HBASE-4025: - Maybe we should create all tables under /hbase/tables/<table name> instead of /hbase/<table name>, so that we can avoid future cases where we create other folders under /hbase (such as .logs, .corrupt et al.) that do not contain table descriptors. So this pattern will look something like: /hbase/.logs /hbase/.corrupt /hbase/.oldlogs /hbase/.META. /hbase/-ROOT- /hbase/<future non-user system folders> /hbase/UserTables/<user table folder>/.tableinfo and when we need to retrieve all the table descriptors we simply iterate over the /hbase/UserTables folder rather than /hbase and ignore all system folders. The other option would be: /hbase/System/.logs, .oldlogs, .corrupt et al. /hbase/UserTables/<user tables> This way we can avoid adding a band-aid fix to this read-table-descriptor logic every time we have a new system folder. Thoughts? Server startup fails during startup due to failure in loading all table descriptors.
We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at 
org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: HBASE-4020.txt testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: (was: TestHRegion.patch) testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: (was: HBASE-4020.txt) testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054599#comment-13054599 ] Ted Yu commented on HBASE-4025: --- A third option :-) /hbase/System/.logs, .oldlogs, .corrupt et al. /hbase/<user tables> Users/developers are used to the current hdfs structure. This would introduce relatively small impact to existing user tables. Let's see what other developers think. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Phelps updated HBASE-4030: --- Description: -- We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens the node that made the call will see a FileNotFoundException such as this: 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020] 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602) 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593) -- Over on the regionserver that was loading this we see that it attempted to load and hit a 60 second timeout: 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8. ... 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) -- We suspect this particular problem is a resource contention issue on our side. However, the loading process proceeds to rename the file despite the failure: 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533 -- And then the LoadIncrementalHFiles tries to load the hfile again: 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8. 2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server handler 147 on 60020, call bulkLoadHFile(hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256, [B@4224508b, [B@5e23f799) from 67.215.90.51:51856: error: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 -- This eventually leads to the load command failing. 
was: Original Message Subject: Re: LoadIncrementalHFiles bug when regionserver fails to access file? Date: Thu, 23 Jun 2011 17:00:04 -0700 From: Ted Yu yuzhih...@gmail.com Reply-To: cdh-u...@cloudera.org To: u...@hbase.apache.org CC: CDH Users cdh-u...@cloudera.org This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encounters any exception, which should determine what to do next. Please file a JIRA. Thanks Adam. On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps a...@opendns.com wrote: (As a note, this is with CDH3u0 which is based on HBase 0.90.1) We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens the node that made the call will see a FileNotFoundException such as this: 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6
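Ted Yu's diagnosis above — the reader handling sits in a try-finally with no check for whether the read succeeded, so the rename proceeds even when validation failed — suggests gating the rename on a success flag. A minimal sketch under that assumption; Validator and safeToRename are hypothetical stand-ins, not HBase API:

```java
public class BulkLoadCheck {
    // Hypothetical stand-in for opening and validating an HFile.
    interface Validator { void validate(String path) throws Exception; }

    // Returns true only when validation succeeded; the caller should rename
    // the bulk-load file only in that case. The reported bug renamed it
    // unconditionally because the try-finally never recorded a failure.
    static boolean safeToRename(String path, Validator v) {
        try {
            v.validate(path);
            return true;
        } catch (Exception e) {
            // e.g. SocketTimeoutException while reading the HFile
            return false;
        }
    }

    public static void main(String[] args) {
        Validator failing = p -> { throw new java.io.IOException("timeout"); };
        Validator passing = p -> { };
        System.out.println(safeToRename("/hfiles/f1", failing)); // false: do not rename
        System.out.println(safeToRename("/hfiles/f1", passing)); // true: rename is safe
    }
}
```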
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054610#comment-13054610 ] Ted Yu commented on HBASE-4025: --- Overall +1 I am running test suite. Minor comment, maybe there is a better place for hbaseNonTableDirs but I don't have strong opinion. In HConstants, all constants are in upper case. How about renaming it to HBASE_NON_USER_TABLE_DIRS ? Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4025: - Assignee: Subbu M Iyer Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at 
org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054616#comment-13054616 ] Jean-Daniel Cryans commented on HBASE-3852: --- bq. do you happen to know how many scanners were closed by ScannerCleaner in the past month ? I used to print that out but it was really spammy. Probably tens of thousands. ThriftServer leaks scanners --- Key: HBASE-3852 URL: https://issues.apache.org/jira/browse/HBASE-3852 Project: HBase Issue Type: Bug Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 3852.txt The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory and if any pre-fetching was done, it will also start accumulating Results (with all their data). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
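The leak described above — entries in scannerMap live until the client explicitly closes the scanner — is what a periodic cleaner such as the ScannerCleaner mentioned by Jean-Daniel addresses. A minimal sketch of the idea with hypothetical names; a real cleaner would also call ResultScanner.close() rather than just dropping the map entry:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScannerSweeper {
    // Hypothetical registry pairing a scanner id with its last-use time (ms).
    static final Map<Integer, Long> lastUsed = new ConcurrentHashMap<>();

    // Record activity on a scanner so the sweeper knows it is still live.
    static void touch(int scannerId, long nowMs) {
        lastUsed.put(scannerId, nowMs);
    }

    // Drop every scanner idle longer than timeoutMs; returns how many were
    // reclaimed. Run periodically from a background thread.
    static int sweep(long nowMs, long timeoutMs) {
        int closed = 0;
        for (Iterator<Map.Entry<Integer, Long>> it = lastUsed.entrySet().iterator(); it.hasNext();) {
            if (nowMs - it.next().getValue() > timeoutMs) {
                it.remove(); // real code: also close the underlying ResultScanner
                closed++;
            }
        }
        return closed;
    }
}
```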
[jira] [Commented] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054618#comment-13054618 ] Jean-Daniel Cryans commented on HBASE-4024: --- I'm starting to think that the check should be refactored out of the ifs and put right at the beginning, and then maybe print a nice message on why it's skipping? Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this with a "Major compaction triggered ..." message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: 4024-v2.txt Allow me to reformat a portion of existing code in v2. Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024-v2.txt, 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this with a "Major compaction triggered ..." message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
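The suggested fix — fold the filesToCompact.size() cap into the major-compaction decision itself, so the log message and the actual decision cannot disagree — can be sketched as below. shouldMajorCompact is a hypothetical simplification of the Store.compactSelection() logic, not the real method:

```java
public class CompactionCheck {
    // Simplified, hypothetical decision mirroring the condition quoted above:
    //   (forcemajor || isMajorCompaction(files)) && files.size() < maxFilesToCompact
    // Checking the file-count cap first means the "Major compaction triggered"
    // log line would only ever be emitted when the compaction will actually run.
    static boolean shouldMajorCompact(boolean forceMajor, boolean ageBasedMajor,
                                      int numFiles, int maxFilesToCompact) {
        if (numFiles >= maxFilesToCompact) {
            // Too many files: skip major compaction (and skip the log message).
            return false;
        }
        // Only now evaluate (and, in real code, log) the major-compaction reasons.
        return forceMajor || ageBasedMajor;
    }
}
```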
[jira] [Reopened] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-451: -- TestFSTableDescriptors.testHTableDescriptors has been broken since revision 1138120 (HBASE-451 Remove HTableDescriptor from HRegionInfo -- part 2, some cleanup) Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054639#comment-13054639 ] Jonathan Gray commented on HBASE-4027: -- In the new HFile v2 over in HBASE-3857 the block cache interface changes from ByteBuffer to HeapSize. So you can now put anything you want into the cache that implements HeapSize (there is a new HFileBlock that is used in HFile v2). One big question is whether you're going to make copies out of the direct byte buffers on each read of that block, or if you're going to change KeyValue to use the ByteBuffer interface (or some other) instead of the byte[] directly. With a DBB you can't get access to an underlying byte[]. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054660#comment-13054660 ] Li Pi commented on HBASE-4027: -- This would be really useful. I think even making copies out of the direct byte buffers would confer a substantial performance advantage over the FS cache. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
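Jonathan Gray's point above is that a direct buffer has no accessible backing byte[], so a cached block either gets copied onto the heap on each read or the KeyValue code must be changed to work against ByteBuffer. The copy-out path looks roughly like this (a sketch using plain java.nio, not HBase code):

```java
import java.nio.ByteBuffer;

public class OffHeapBlock {
    // Copy a cached block out of a direct (off-heap) buffer into a heap
    // byte[]. duplicate() gives an independent position/limit so concurrent
    // readers of the same cached block don't interfere with each other.
    static byte[] copyOut(ByteBuffer direct) {
        ByteBuffer dup = direct.duplicate();
        dup.rewind();
        byte[] onHeap = new byte[dup.remaining()];
        dup.get(onHeap);
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(4);
        block.put(new byte[] {1, 2, 3, 4});
        byte[] copy = copyOut(block);
        // Direct buffers expose no backing array, hence the copy.
        System.out.println(block.hasArray()); // false
        System.out.println(copy.length);      // 4
    }
}
```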
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054685#comment-13054685 ] Subbu M Iyer commented on HBASE-4025: - Agree regarding the naming of variable to HBASE_NON_USER_TABLE_DIRS and as far as where it should go, I don't have a strong opinion either. As far as other issue is concerned, we can go either way as long as we have unique way of identifying all the user tables. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054690#comment-13054690 ] Ted Yu commented on HBASE-451: -- Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce passes. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
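The check Ted Yu describes — in HRegion.createHRegion(), create the table descriptor on the filesystem only when it is missing — can be sketched like this. The sketch uses java.nio.file in place of HDFS, and the .tableinfo file name is an assumption for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DescriptorEnsure {
    // Hypothetical analogue of FSUtils.createTableDescriptor() guarded by an
    // existence check: write the serialized descriptor only when the table
    // directory does not already contain one. Returns true if it was created.
    static boolean ensureDescriptor(Path tableDir, byte[] descriptorBytes) throws IOException {
        Path info = tableDir.resolve(".tableinfo"); // assumed descriptor file name
        if (Files.exists(info)) {
            return false; // descriptor already present, nothing to do
        }
        Files.createDirectories(tableDir);
        Files.write(info, descriptorBytes);
        return true;
    }
}
```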
[jira] [Issue Comment Edited] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054690#comment-13054690 ] Ted Yu edited comment on HBASE-451 at 6/24/11 9:36 PM: --- Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce and TestFSTableDescriptors both pass. was (Author: yuzhih...@gmail.com): Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce passes. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. 
Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-451: - Attachment: 451-addendum.txt Here is my addendum. There could be a cleaner way of detecting that table descriptor doesn't exist on HDFS. For the moment, I rely on TableExistsException. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054690#comment-13054690 ] Ted Yu edited comment on HBASE-451 at 6/24/11 9:46 PM: --- Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce and TestFSTableDescriptors both pass. TestDistributedLogSplitting and TestSplitTransactionOnCluster pass on my laptop as well. was (Author: yuzhih...@gmail.com): Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce and TestFSTableDescriptors both pass. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). 
That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054712#comment-13054712 ] Ted Yu commented on HBASE-451: -- TestDistributedLogSplitting hung on Linux. The second time I ran it on my laptop, I got: {code} Failed tests: testThreeRSAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) {code} The first exception in the output file was: {code} 2011-06-24 15:01:37,115 WARN [PostOpenDeployTasks:1028785192] handler.OpenRegionHandler$PostOpenDeployTasksThread(221): Exception running postOpenDeployTasks; region=1028785192 java.io.IOException: No server for -ROOT- at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaLocation(MetaEditor.java:149) at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1405) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:218) {code} No such exception appears in the output on Linux, though. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. 
Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054714#comment-13054714 ] Gary Helmling commented on HBASE-451: - @Ted, good digging. I'm not very familiar with these changes, but it looks to me like the changes so far have tried to pull HTableDescriptor handling out of HRegion. So adding it back into HRegion.createHRegion() may be a step back. I think I'd opt for trying to fix the tests to call FSUtils.createTableDescriptor() instead. Either in TestTableMapReduce.init or MultiRegionTable.preHBaseClusterSetup(), prior to creating the table regions. I think either of those would work. I wonder how many other HBaseTestCase subclasses may have problems as well. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4020: -- Attachment: 4020-ted.txt I would apply this patch. testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: 4020-ted.txt, HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing an HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054722#comment-13054722 ] Ted Yu commented on HBASE-451: -- Thanks Gary for the reminder. I will upload addendum version 2 which creates table descriptor in MultiRegionTable.preHBaseClusterSetup(). TestTableMapReduce passes: {code} Running org.apache.hadoop.hbase.mapred.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 51.72 sec Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 83.983 sec {code} Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-451: - Attachment: (was: 451-addendum.txt) Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-451: - Attachment: 451-addendum-v2.txt Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054729#comment-13054729 ] Gary Helmling commented on HBASE-451: - @Ted, +1 from me on addendum v2, since the tests now pass. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054737#comment-13054737 ] Jason Rutherglen commented on HBASE-4027: - {quote}One big question is whether you're going to make copies out of the direct byte buffers on each read of that block, or if you're going to change KeyValue to use the ByteBuffer interface (or some other) instead of the byte[] directly{quote} Right, the {{HFile.Scanner.getKeyValue()}} method is calling {{block.array()}}. We'd need to track down all {{byte[]}} references, and convert them to {{ByteBuffer}}. That's more of a separate Jira. I think converting a direct ByteBuffer to byte[] will generate a fair amount of garbage, though of a different (smaller and more numerous) kind than the blocks. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
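The copy-out cost Jason mentions can be sketched with plain java.nio. A direct buffer has no backing array (so a block.array() call would fail), which forces a fresh on-heap byte[] per read. The helper below is an illustration of that pattern, not HBase code:

```java
import java.nio.ByteBuffer;

public class DirectBufferCopy {
    // Copy a slice of a direct (off-heap) buffer into a fresh on-heap
    // byte[]. Each call allocates a new array, which is the "smaller and
    // more numerous" garbage referred to in the comment; calling array()
    // on a direct buffer is not possible (hasArray() is false).
    static byte[] copyOut(ByteBuffer direct, int offset, int length) {
        byte[] onHeap = new byte[length];    // new allocation per read
        ByteBuffer dup = direct.duplicate(); // independent position/limit
        dup.position(offset);
        dup.get(onHeap, 0, length);
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(16);
        for (int i = 0; i < 16; i++) block.put((byte) i);
        System.out.println(block.hasArray()); // false: no backing byte[]
        byte[] copy = copyOut(block, 4, 4);
        System.out.println(copy[0] + "," + copy[3]); // 4,7
    }
}
```

Using duplicate() keeps the shared cache buffer's position untouched, so concurrent readers don't interfere; the price is one short-lived array per block access.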
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054740#comment-13054740 ] Gary Helmling commented on HBASE-4025: -- Since we already have rules for valid user table names, why not just apply those in the directory listing? User tables are not allowed to start with '.' or '-', so ignore directory entries beginning with those. Special case '.META.' and '-ROOT-', since that's what we do most places for those 2 tables anyway. We already generally are following a convention of system directories starting with '.', so this seems sufficient to me. No need to move anything around. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054742#comment-13054742 ] Gary Helmling commented on HBASE-4025: -- (repeating above but with formatting fixed)... Since we already have rules for valid user table names, why not just apply those in the directory listing? User tables are not allowed to start with '.' or '\-', so ignore directory entries beginning with those. Special case '.META.' and '\-ROOT\-', since that's what we do most places for those 2 tables anyway. We already generally are following a convention of system directories starting with '.', so this seems sufficient to me. No need to move anything around. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
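Gary's filtering rule is simple enough to state as a predicate. The helper below is an illustrative sketch of that rule (the method name is made up, not the HBase API): skip any directory whose name starts with '.' or '-', which covers .logs, .oldlogs, .corrupt, .META. and -ROOT- without moving anything around.

```java
public class UserTableFilter {
    // User table names may not begin with '.' or '-', so any directory
    // entry starting with those characters is a system directory
    // (.logs, .oldlogs, .corrupt) or a catalog table (.META., -ROOT-)
    // and should be skipped when loading table descriptors.
    static boolean isUserTableDir(String name) {
        if (name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        return first != '.' && first != '-';
    }

    public static void main(String[] args) {
        String[] entries = {".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-", "usertable"};
        for (String e : entries) {
            System.out.println(e + " -> " + isUserTableDir(e));
        }
    }
}
```

In practice this predicate would back a Hadoop PathFilter passed to the directory listing, so getAll() never even stats the non-table paths that caused the FileNotFoundException above.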
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054747#comment-13054747 ] Jason Rutherglen commented on HBASE-4027: - {quote}This would be really useful. I think even making copies out of the direct byte buffers would confer a substantial performance advantage over the FS cache.{quote} The filesystem cache doesn't help because HBase needs quick access to uncompressed blocks for scanning. For what duration does an uncompressed block need to be cached? In either case, accessing compressed blocks from the FS cache will be faster than hitting the disk or network. I am guessing one can maintain a small'ish block cache, ensure HDFS blocks are local, provide extra space for the FS cache, and gain in read throughput. Snappy should decompress fast enough for this to be more viable than maintaining a large-ish block cache. The problem [today] with a small'ish block cache is the GC is driven mad. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054756#comment-13054756 ] Ted Yu commented on HBASE-451: -- On Linux, TestFSTableDescriptors fails at the following assertion: {code} assertEquals(count * 2, htds.cachehits); {code} The error was: {code} java.lang.AssertionError: expected:<20> but was:<30> {code} Stack should know how to fix the above. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054759#comment-13054759 ] Ted Yu commented on HBASE-4020: --- Integrated to TRUNK. Thanks for the patch Vandana. testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: 4020-ted.txt, HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing an HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-3810) Registering a Coprocessor at HTableDescriptor should be less strict
[ https://issues.apache.org/jira/browse/HBASE-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingjie Lai reassigned HBASE-3810: -- Assignee: Mingjie Lai Registering a Coprocessor at HTableDescriptor should be less strict --- Key: HBASE-3810 URL: https://issues.apache.org/jira/browse/HBASE-3810 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: Joerg Schad Assignee: Mingjie Lai Priority: Minor Original Estimate: 2h Remaining Estimate: 2h Registering a Coprocessor in the following way will fail as the Coprocessor$1 keyword is case sensitive (instead COPROCESSOR$1 works fine). Removing this restriction would improve usability. HTableDescriptor desc = new HTableDescriptor(tName); desc.setValue("Coprocessor$1", path.toString() + ":" + full_class_name + ":" + Coprocessor.Priority.USER); -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
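The requested relaxation amounts to matching the attribute key case-insensitively. The sketch below illustrates one way to do that with a regex; it is an assumption about how the check could look, not the actual HBase matching code:

```java
import java.util.regex.Pattern;

public class CoprocessorKeyMatch {
    // Today the table attribute key must be exactly "COPROCESSOR$1";
    // an embedded (?i) flag makes the match case-insensitive, so
    // "Coprocessor$1" would be accepted as well. Illustrative sketch only.
    static final Pattern CP_KEY = Pattern.compile("(?i)coprocessor\\$\\d+");

    static boolean isCoprocessorKey(String key) {
        return CP_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCoprocessorKey("Coprocessor$1")); // true
        System.out.println(isCoprocessorKey("COPROCESSOR$2")); // true
        System.out.println(isCoprocessorKey("MAX_FILESIZE"));  // false
    }
}
```

Any code that later parses the ordinal out of the key would need the same case-insensitive treatment, so the pattern is best kept in one shared constant.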
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054772#comment-13054772 ] Ted Yu commented on HBASE-451: -- I committed addendum v2 into TRUNK. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4031) An imbalance result calculated by LoadBalancer
An imbalance result calculated by LoadBalancer -- Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 I found the problem while the cluster couldn't balance (around 2011-05-24 11:28). One node's region count is double that of the other nodes, and it didn't move regions anymore: Address Start Code Load 158-1-101-202:20030 1306205409671 requests=0, regions=2593, usedHeap=114, maxHeap=8165 158-1-101-222:20030 1306205940117 requests=0, regions=5841, usedHeap=80, maxHeap=8165 158-1-101-52:20030 1306205417261 requests=0, regions=2622, usedHeap=76, maxHeap=8165 158-1-101-82:20030 1306205415714 requests=0, regions=2633, usedHeap=69, maxHeap=8165 Total: servers: 4 requests=0, regions=13689 HBASE-3985 (Same Region could be picked out twice in LoadBalancer) was found during my analysis of this problem, but I'm afraid it's not the main cause. There's one active master, one standby master, and four regionservers in our cluster. At 10:57:41, the standby HMaster 222 became the active one. 2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover The 4 regionservers were registered with 222 one by one; only one regionserver registered somewhat late. 2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388, userLoad=true 2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453, userLoad=true 2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411, userLoad=true 2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0, userLoad=false 13134 regions needed to move after rebuildUserRegions (13689 regions in the cluster at the time). 
2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 13134 regions in transition All 13134 regions were opened; regions opened per server: 158-1-101-222,20020,1306205940117 Count: 834 158-1-101-82,20020,1306205415714 Count: 4093 158-1-101-202,20020,1306205409671 Count: 4118 158-1-101-52,20020,1306205417261 Count: 4089 The most recent balancer calculation results: 2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers 5012 is an impossible number here, for it is larger than the average of 3424.5 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
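The reporter's sanity check can be made concrete with a back-of-the-envelope helper using only the region totals from the report. This is an illustrative calculation, not the LoadBalancer's actual code:

```java
public class BalanceCheck {
    // Each server should end up near total/servers regions, so the number
    // of regions a single underloaded server can legitimately receive is
    // bounded by that average minus its current load. 5012 moves onto one
    // server exceeds any such bound, which is the reported anomaly.
    static int maxRegionsToReceive(int totalRegions, int numServers, int currentLoad) {
        int average = (int) Math.ceil((double) totalRegions / numServers);
        return Math.max(0, average - currentLoad);
    }

    public static void main(String[] args) {
        // 13689 regions over 4 region servers, receiver assumed empty:
        System.out.println(BalanceCheck.maxRegionsToReceive(13689, 4, 0));
        // An already-overloaded server should receive nothing:
        System.out.println(BalanceCheck.maxRegionsToReceive(13689, 4, 5841));
    }
}
```

Any move count the balancer emits above this ceiling (here roughly 3400 regions) signals double-counting of regions, consistent with the HBASE-3985 suspicion.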
[jira] [Commented] (HBASE-3516) Coprocessors: add test cases for loading coprocessor jars from hdfs and local fs.
[ https://issues.apache.org/jira/browse/HBASE-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054795#comment-13054795 ] jirapos...@reviews.apache.org commented on HBASE-3516: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/963/ --- Review request for hbase. Summary --- There is no test case for testing coprocessor class loading from HDFS or the local file system. Add test cases for cp class loading. It does: - compile a cp implementation on the fly by the Java compiler API - build a jar file from the compiled classes - copy the jar to the local file system or HDFS so it can be loaded for a region This addresses bug HBase-3516. https://issues.apache.org/jira/browse/HBase-3516 Diffs - src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 0a1fb2a src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java PRE-CREATION Diff: https://reviews.apache.org/r/963/diff Testing --- Thanks, Mingjie Coprocessors: add test cases for loading coprocessor jars from hdfs and local fs. -- Key: HBASE-3516 URL: https://issues.apache.org/jira/browse/HBASE-3516 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.90.0 Reporter: Mingjie Lai Assignee: Mingjie Lai Loading coprocessor classes from jar files (on HDFS or the local fs) is supported by the CP framework right now. We used to have a test case to cover this scenario which used a base-64 encoded string in the test case to represent a compiled jar file. This hardcoded way was not acceptable as a valid test case, so we removed it eventually. We need a better way to redo this case. Option 1) modify the maven file in order to compile a test cp class into a jar, put it on hdfs and local fs, and run the cp class loading test; option 2) use the Java 6.0 Compiler API to compile the test case at runtime and create the jar file? Need more time to investigate which one is better. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4031) An imbalance result calculated by LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4031: Attachment: HMaster222.rar

An imbalance result calculated by LoadBalancer
--
Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HMaster222.rar

I found the problem when the cluster could not balance (around 2011-05-24 11:28). One node carried roughly double the region count of the other nodes, and no more regions were moved:

Address Start Code Load
158-1-101-202:20030 1306205409671 requests=0, regions=2593, usedHeap=114, maxHeap=8165
158-1-101-222:20030 1306205940117 requests=0, regions=5841, usedHeap=80, maxHeap=8165
158-1-101-52:20030 1306205417261 requests=0, regions=2622, usedHeap=76, maxHeap=8165
158-1-101-82:20030 1306205415714 requests=0, regions=2633, usedHeap=69, maxHeap=8165
Total: servers: 4 requests=0, regions=13689

HBASE-3985 (Same region could be picked out twice in LoadBalancer) was found during my analysis of this problem, but I'm afraid it is not the main cause. There is one active master, one standby master, and four regionservers in our cluster. At 10:57:41 the standby HMaster 222 became the active one:

2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover

The 4 regionservers were registered on 222 one by one; only one regionserver registered noticeably late:

2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388, userLoad=true
2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453, userLoad=true
2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411, userLoad=true
2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0, userLoad=false

13134 regions needed to move after rebuildUserRegions (13689 regions in the cluster at the time):

2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 13134 regions in transition

All 13134 regions were opened; regions opened per server:
158-1-101-222,20020,1306205940117 Count: 834
158-1-101-82,20020,1306205415714 Count: 4093
158-1-101-202,20020,1306205409671 Count: 4118
158-1-101-52,20020,1306205417261 Count: 4089

The most recent balancer calculation:

2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers

5012 is an impossible number here, for it is larger than the average number 3424.5.
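The mismatch can be checked with simple arithmetic. Assuming a mean-based scheme like the 0.90 balancer's (each overloaded server sheds only its excess above the ceiling of the mean load), the opened-region counts logged above bound the legitimate move count. This is a minimal sketch of that bound, not the actual LoadBalancer code:

```java
public class BalanceCheck {
    // Upper bound on regions a mean-based balancer should move:
    // each overloaded server sheds only its excess above ceil(mean).
    static int expectedMoves(int[] loads) {
        int total = 0;
        for (int l : loads) {
            total += l;
        }
        int ceil = (int) Math.ceil((double) total / loads.length);
        int moves = 0;
        for (int l : loads) {
            if (l > ceil) {
                moves += l - ceil;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Opened-region counts per server from the failover log:
        // 222 -> 834, 82 -> 4093, 202 -> 4118, 52 -> 4089
        int moves = expectedMoves(new int[]{834, 4093, 4118, 4089});
        System.out.println(moves); // 2448, far below the 5012 the balancer reported
    }
}
```

The reported 5012 is close to double this bound, which is at least consistent with regions being picked twice (HBASE-3985), though as the reporter notes that may not be the only cause.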
[jira] [Updated] (HBASE-4031) An imbalance result calculated by LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4031: Attachment: HRegionServer222.rar

An imbalance result calculated by LoadBalancer
--
Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HMaster222.rar, HRegionServer222.rar
[jira] [Commented] (HBASE-4031) An imbalance result calculated by LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054797#comment-13054797 ] Jieshan Bean commented on HBASE-4031:

Since the original log files are too big, I attached only some fragments of the full logs.

An imbalance result calculated by LoadBalancer
--
Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HMaster222.rar, HRegionServer222.rar
[jira] [Created] (HBASE-4033) The shutdown RegionServer could be added to AssignmentManager.servers again
The shutdown RegionServer could be added to AssignmentManager.servers again
---
Key: HBASE-4033 URL: https://issues.apache.org/jira/browse/HBASE-4033 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4

The following steps can easily reproduce the problem:
1. Have thousands of regions in the cluster.
2. Stop the cluster.
3. Start the cluster, kill one regionserver while the regions are opening, and restart it after 10 seconds.

The shut-down regionserver will appear in the AssignmentManager.servers list again. For example:

Issue 1:
2011-06-23 14:14:30,775 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: 167-6-1-12,20020,1308803390123=2220, 167-6-1-13,20020,1308803391742=2374, 167-6-1-11,20020,1308803386333=2205, 167-6-1-13,20020,1308803514394=2183
Two regionservers (one of which had aborted) have the same hostname but different startcodes:
167-6-1-13,20020,1308803391742=2374
167-6-1-13,20020,1308803514394=2183

Issue 2:
(1) The RS 167-6-1-11,20020,1308105402003 finished shutdown at 10:46:37,774:
10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
(2) Overwriting happened; it seems the RS still existed in the set AssignmentManager#regions:
10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
(3) A region was assigned to this dead RS again at 10:50:20,671:
10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,0805861380030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
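Issue 1 follows from the startcode being part of the server identity: a restarted instance registers under a new startcode, so a map keyed on the full (host, port, startcode) tuple keeps the dead entry alongside the new one. A minimal sketch of the symptom and one possible remedy; the Server record and registration logic are hypothetical stand-ins, not HBase's actual ServerName/AssignmentManager code:

```java
import java.util.HashMap;
import java.util.Map;

public class ServerListSketch {
    // Hypothetical stand-in for a (host, port, startcode) server identity.
    record Server(String host, int port, long startcode) {}

    public static void main(String[] args) {
        Map<Server, Integer> servers = new HashMap<>();
        servers.put(new Server("167-6-1-13", 20020, 1308803391742L), 2374);

        Server restarted = new Server("167-6-1-13", 20020, 1308803514394L);
        // Naive registration: the new startcode makes a distinct key,
        // so the dead instance is NOT replaced -- both entries coexist,
        // exactly as in the balancer's server-information dump.
        servers.put(restarted, 2183);
        System.out.println(servers.size()); // 2: stale entry survives

        // One possible remedy: evict entries with the same host:port and
        // an older startcode when the new instance registers.
        servers.keySet().removeIf(s -> s.host().equals(restarted.host())
                && s.port() == restarted.port()
                && s.startcode() < restarted.startcode());
        System.out.println(servers.size()); // 1: only the live instance remains
    }
}
```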
[jira] [Updated] (HBASE-4033) The shutdown RegionServer could be added to AssignmentManager.servers again
[ https://issues.apache.org/jira/browse/HBASE-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4033: Attachment: A_hbase-root-master-167-6-1-11.rar

The attached logs relate to issue 2 in the description.

The shutdown RegionServer could be added to AssignmentManager.servers again
---
Key: HBASE-4033 URL: https://issues.apache.org/jira/browse/HBASE-4033 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: A_hbase-root-master-167-6-1-11.rar
[jira] [Commented] (HBASE-4032) HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
[ https://issues.apache.org/jira/browse/HBASE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054802#comment-13054802 ] stack commented on HBASE-4032:

Let me fix this. Thanks for fingering it Andy.

HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
---
Key: HBASE-4032 URL: https://issues.apache.org/jira/browse/HBASE-4032 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: stack Priority: Blocker Fix For: 0.92.0

After HBASE-451, HRegionInfo#getTableDesc has been modified to always return {{null}}. One immediate effect is broken unit tests. That aside, it is not in the spirit of deprecation to actually break the method before the deprecation cycle has completed; it's a bug.
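The deprecation-cycle point can be illustrated with a toy accessor (class and field names here are made up, not HBase's actual HRegionInfo code): a @Deprecated method should keep delegating to working state until it is actually removed, rather than being gutted to return null:

```java
// Hypothetical sketch of the deprecation pattern the issue asks for:
// the old accessor stays functional for the whole deprecation cycle.
public class RegionInfoSketch {
    static class TableDesc {
        final String name;
        TableDesc(String name) { this.name = name; }
    }

    private final TableDesc desc = new TableDesc("t1");

    /**
     * @deprecated use a replacement accessor; kept working until removal.
     */
    @Deprecated
    TableDesc getTableDesc() {
        return desc; // still delegate to live state -- do not return null
    }

    public static void main(String[] args) {
        // Existing callers keep working (with a compile-time warning)
        // instead of hitting NullPointerExceptions.
        System.out.println(new RegionInfoSketch().getTableDesc().name);
    }
}
```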
[jira] [Assigned] (HBASE-4032) HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
[ https://issues.apache.org/jira/browse/HBASE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reassigned HBASE-4032: Assignee: stack

HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
---
Key: HBASE-4032 URL: https://issues.apache.org/jira/browse/HBASE-4032 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: stack Priority: Blocker Fix For: 0.92.0