[jira] [Assigned] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao reassigned HBASE-4028:
Assignee: gaojinchao

Hmaster crashes caused by splitting log.

Key: HBASE-4028
URL: https://issues.apache.org/jira/browse/HBASE-4028
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.3
Reporter: gaojinchao
Assignee: gaojinchao
Fix For: 0.90.4

In my performance cluster (0.90.3), the HMaster memory grew from 100 MB to 4 GB when one region server crashed. I added some debug prints in doneWriting() and found that the value of totalBuffered goes negative:

10:29:52,119 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used -565832
hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used -565832 release size 25168

void doneWriting(RegionEntryBuffer buffer) {
  synchronized (this) {
    LOG.warn("gjc1: relase currentlyWriting " + biggestBufferKey + buffer.encodedRegionName);
    boolean removed = currentlyWriting.remove(buffer.encodedRegionName);
    assert removed;
  }
  long size = buffer.heapSize();
  synchronized (dataAvailable) {
    totalBuffered -= size;
    LOG.warn("gjc:release Used " + totalBuffered);
    // We may unblock writers
    dataAvailable.notifyAll();
  }
  LOG.warn("gjc:release Used " + totalBuffered + " release size " + size);
}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4028:
Attachment: Screenshot-2.png
[jira] [Updated] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4028:
Attachment: hbase-root-master-157-5-100-8.rar
[jira] [Updated] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4028:
Attachment: HBASE-4028-0.90V1.patch
[jira] [Commented] (HBASE-4028) Hmaster crashes caused by splitting log.
[ https://issues.apache.org/jira/browse/HBASE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054385#comment-13054385 ] mingjian commented on HBASE-4028:
gao, did you fix this problem by moving "totalBuffered += incrHeap;" into synchronized (dataAvailable)?
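mingjian's suggestion can be sketched in isolation. This is a minimal, hypothetical model (the class name BufferAccounting and its method names are illustrative, not the actual HLogSplitter code): an unsynchronized `totalBuffered += incrHeap` can race with the `-= size` in doneWriting, losing updates on the long and driving the counter negative; moving the increment under the same dataAvailable monitor makes the accounting consistent.

```java
// Hypothetical sketch of the proposed fix: both the increment and the
// decrement of totalBuffered run under the same monitor (dataAvailable),
// so concurrent += / -= on the long cannot lose updates.
public class BufferAccounting {
    private final Object dataAvailable = new Object();
    private long totalBuffered = 0;

    void incrementUsage(long incrHeap) {
        synchronized (dataAvailable) {   // the 0.90.3 code did this outside the lock
            totalBuffered += incrHeap;
        }
    }

    void doneWriting(long size) {
        synchronized (dataAvailable) {
            totalBuffered -= size;
            dataAvailable.notifyAll();   // we may unblock writers
        }
    }

    long total() {
        synchronized (dataAvailable) { return totalBuffered; }
    }
}
```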
[jira] [Assigned] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4020:
Assignee: Vandana Ayyalasomayajula

testWritesWhileGetting unit test needs to be fixed.

Key: HBASE-4020
URL: https://issues.apache.org/jira/browse/HBASE-4020
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.90.3
Environment: OS: RHEL 5.4
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
Fix For: 0.92.0
Attachments: TestHRegion.patch

The unit test testWritesWhileGetting in org.apache.hadoop.hbase.regionserver.TestHRegion needs to be corrected. It currently uses testWritesWhileScanning as the table name and method name when initializing the HRegion; it should be testWritesWhileGetting. Because of this, the test fails when the initHRegion method fails to create a new HRegion for the test.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes
[ https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054478#comment-13054478 ] jirapos...@reviews.apache.org commented on HBASE-3939: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/951/ --- (Updated 2011-06-24 14:42:47.298437) Review request for hbase and Todd Lipcon. Changes --- Removed clientVersion check where clientVersion is less than 3. Those clients would encounter the following exception connecting to zookeeper: java.lang.IllegalArgumentException: Not a host:port pair: ciq.com,6,1308866059399 Summary --- A few fixes from Hadoop IPC that we should probably cross-port into our copy: * HADOOP-7227: remove the protocol version check at call time * HADOOP-7146: fix a socket leak in server * HADOOP-7121: fix behavior when response serialization throws an exception * HADOOP-7346: send back nicer error response when client is using an out of date IPC version This addresses bug HBASE-3939. 
https://issues.apache.org/jira/browse/HBASE-3939 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 1137262 /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 1137262 /src/main/java/org/apache/hadoop/hbase/coprocessor/BaseEndpointCoprocessor.java 1137262 /src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java 1137280 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 1139326 /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/ProtocolSignature.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java 1134732 /src/main/java/org/apache/hadoop/hbase/ipc/Status.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java 1134732 /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1134732 /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1134732 /src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java 1137280 Diff: https://reviews.apache.org/r/951/diff Testing --- Test suite passed. 
Thanks, Ted Some crossports of Hadoop IPC fixes --- Key: HBASE-3939 URL: https://issues.apache.org/jira/browse/HBASE-3939 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Ted Yu Fix For: 0.92.0 Attachments: 3939-v2.txt, 3939-v3.txt, 3939.txt A few fixes from Hadoop IPC that we should probably cross-port into our copy: - HADOOP-7227: remove the protocol version check at call time - HADOOP-7146: fix a socket leak in server - HADOOP-7121: fix behavior when response serialization throws an exception - HADOOP-7346: send back nicer error response when client is using an out of date IPC version -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3229) Table creation, though using async call to master, can actually run for a while and cause RPC timeout
[ https://issues.apache.org/jira/browse/HBASE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054499#comment-13054499 ] Ted Yu commented on HBASE-3229:
w.r.t. Kannan's comments: in TRUNK, the following method of HMaster is async - see the third parameter:
{code}
public void createTable(HTableDescriptor desc, byte [][] splitKeys) throws IOException {
  createTable(desc, splitKeys, false);
}
{code}
It is the only method exposed through HMasterInterface. Patch v5 from HBASE-3904 makes HBaseAdmin.createTable() wait for all regions to be online.

Table creation, though using async call to master, can actually run for a while and cause RPC timeout

Key: HBASE-3229
URL: https://issues.apache.org/jira/browse/HBASE-3229
Project: HBase
Issue Type: Bug
Components: client, master
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Priority: Critical
Fix For: 0.92.0

Our create table methods in HBaseAdmin are synchronous from the client's POV. However, underneath, we're using an async create and then looping, waiting for table availability. Because the create is async and we loop instead of blocking on RPC, we don't expect RPC timeouts. However, when creating a table with lots of initial regions, the async create can actually take a long time (more than 30 seconds in this case), which causes the client to time out and gives the impression something failed. We should make the create truly async so that this can't happen. And rather than doing one-off, inline assignment as it is today, we should reuse the fancy enable/disable code stack just added to make this faster and more optimal.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
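Jonathan Gray's "truly async" direction can be sketched as a client-side pattern: one fire-and-forget create RPC followed by short availability polls, so no single RPC can outlive the timeout no matter how long region assignment takes. All names here (MasterStub, createTableAsync, isTableAvailable, createAndWait) are hypothetical stand-ins, not the real HBaseAdmin/HMasterInterface API.

```java
// Sketch of the polling pattern under invented names; not the HBase API.
interface MasterStub {
    void createTableAsync(String table, byte[][] splitKeys); // returns at once
    boolean isTableAvailable(String table);                  // cheap check
}

class CreateTableClient {
    static void createAndWait(MasterStub master, String table,
                              byte[][] splits, long timeoutMillis,
                              long pollMillis) throws InterruptedException {
        master.createTableAsync(table, splits);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        // Each poll is a short RPC, so no single call can hit the RPC
        // timeout even if assigning many initial regions takes minutes.
        while (!master.isTableAvailable(table)) {
            if (System.currentTimeMillis() > deadline) {
                throw new RuntimeException("table " + table + " not available within timeout");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```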
[jira] [Updated] (HBASE-4010) HMaster.createTable could be heavily optimized
[ https://issues.apache.org/jira/browse/HBASE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4010: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) HMaster.createTable could be heavily optimized -- Key: HBASE-4010 URL: https://issues.apache.org/jira/browse/HBASE-4010 Project: HBase Issue Type: Improvement Affects Versions: 0.90.3 Reporter: Jean-Daniel Cryans Assignee: Ted Yu Fix For: 0.92.0 Attachments: 4010-0.90.txt, 4010-v2.txt, 4010-v3.txt, 4010-v5.txt Looking at the createTable method in HMaster (the one that's private), we seem to be very inefficient: - We set the enabled flag for the table for every region (should be done only once). - Every time we create a new region we create a new HLog and then close it (reuse one instead or see if it's really necessary). - We do one RPC to .META. per region (we should batch put). This should provide drastic speedups even for those creating tables with just 50 regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
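Of the three inefficiencies listed, the last (one RPC to .META. per region versus a batched put) is the easiest to illustrate. A toy sketch, with MetaWriter as an invented stand-in for the real catalog-table interface: build every region row first, then write them all in a single round trip.

```java
import java.util.ArrayList;
import java.util.List;

// Invented stand-in for the catalog-table writer; one call = one round trip.
interface MetaWriter {
    void put(List<String> rows);
}

class BatchedMetaUpdate {
    // Accumulate one row per new region, then issue exactly one batched put,
    // instead of one RPC per region as the old createTable path did.
    static int addRegions(MetaWriter meta, List<String> regionRows) {
        List<String> batch = new ArrayList<>(regionRows); // build all rows first
        meta.put(batch);                                  // single round trip
        return batch.size();
    }
}
```

For a table created with 50 initial regions, this turns 50 catalog RPCs into one.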
[jira] [Assigned] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4024:
Assignee: Ted Yu

Major compaction may not be triggered, even though region server log says it is triggered

Key: HBASE-4024
URL: https://issues.apache.org/jira/browse/HBASE-4024
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Suraj Varma
Assignee: Ted Yu
Priority: Trivial
Labels: newbie
Fix For: 0.92.0

The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not:

// major compact on user action or age (caveat: we have too many files)
boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact;

The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This results in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
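The restructuring the report proposes can be sketched with simplified types (the real method works on List<StoreFile> and HBase's configured thresholds; the ages-as-longs representation and constructor here are invented for illustration): fold the maxFilesToCompact guard into isMajorCompaction() itself, so the method can never report "triggered" and then be overridden by a later check.

```java
import java.util.List;

// Simplified sketch: the file-count guard runs first inside
// isMajorCompaction(), so the decision and any "Major compaction
// triggered" log line can never disagree.
class CompactionPolicy {
    final int maxFilesToCompact;

    CompactionPolicy(int maxFilesToCompact) {
        this.maxFilesToCompact = maxFilesToCompact;
    }

    // fileAges stands in for the candidate StoreFiles; a file older than
    // maxAgeMillis would trigger a major compaction by age.
    boolean isMajorCompaction(List<Long> fileAges, long maxAgeMillis) {
        // Guard first: with too many files, never declare (or log) a major.
        if (fileAges.size() >= maxFilesToCompact) {
            return false;
        }
        for (long age : fileAges) {
            if (age > maxAgeMillis) {
                return true; // this is where "Major compaction triggered" would log
            }
        }
        return false;
    }
}
```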
[jira] [Created] (HBASE-4029) Inappropriate checkin of Logging Mode in HRegionServer
Inappropriate checkin of Logging Mode in HRegionServer

Key: HBASE-4029
URL: https://issues.apache.org/jira/browse/HBASE-4029
Project: HBase
Issue Type: Bug
Reporter: Akash Ashok

There is a condition check for DEBUG-mode logging in HRegionServer.java. Because of this, the region server never closes the META region while stopping HBase, and thus never stops, if DEBUG logging is not enabled.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
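The bug class described here, real work gated behind a log-level check, is easy to model. A hypothetical sketch (ShutdownStep is not the actual HRegionServer code): the buggy shape skips closing META whenever DEBUG logging is off, while the fixed shape keeps only the log line conditional.

```java
// Illustrative model of the bug class, not the HRegionServer shutdown path.
class ShutdownStep {
    boolean metaClosed = false;

    // Buggy shape: the close itself only happens when debug logging is on.
    void stopBuggy(boolean debugEnabled) {
        if (debugEnabled) {
            // LOG.debug("Closing META");
            metaClosed = true;
        }
    }

    // Fixed shape: only the log statement is conditional; the close always runs.
    void stopFixed(boolean debugEnabled) {
        if (debugEnabled) {
            // LOG.debug("Closing META");
        }
        metaClosed = true;
    }
}
```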
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024:
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024:
Attachment: 4024.txt

Added filesToCompact.size() check in isMajorCompaction(). TestCompaction and TestCompactSelection pass.
[jira] [Updated] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4029:
Fix Version/s: 0.92.0
Summary: Inappropriate checking of Logging Mode in HRegionServer (was: Inappropriate checkin of Logging Mode in HRegionServer)
[jira] [Commented] (HBASE-3891) TaskMonitor is used wrong in some places
[ https://issues.apache.org/jira/browse/HBASE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054543#comment-13054543 ] Ted Yu commented on HBASE-3891:
HRegion.compact() keeps a reference to the proxy returned by TaskMonitor.get().createStatus(). If MonitoredTaskImpl@51bfa303 corresponds to this proxy, I don't know why weakProxy.get() returned null.

TaskMonitor is used wrong in some places

Key: HBASE-3891
URL: https://issues.apache.org/jira/browse/HBASE-3891
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars George
Fix For: 0.92.0

I have a long running log replay in progress but none of the updates show. This is caused by reusing the MonitoredTask references incorrectly, and it manifests itself like this in the logs:
{noformat}
2011-05-16 15:22:18,127 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@51bfa303 appears to have been leaked
2011-05-16 15:22:18,128 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: cleanup.
{noformat}
The cleanup sets the completion timestamp and causes the task to be purged from the list. After that, the UI for example does not show any further running tasks, although from the logs I can see (with my log additions):
{noformat}
2011-05-16 15:29:52,296 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Compaction complete: 103.1m in 18542ms
2011-05-16 15:29:52,296 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Running coprocessor post-compact hooks
2011-05-16 15:29:52,296 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Compaction complete
2011-05-16 15:29:52,297 DEBUG org.apache.hadoop.hbase.monitoring.MonitoredTask: markComplete: Compaction complete
{noformat}
They are silently ignored as the TaskMonitor has dropped their reference. We need to figure out why a supposedly completed task monitor was reused.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
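Ted's puzzlement about weakProxy.get() returning null rests on the WeakReference contract, which a tiny sketch can pin down (names invented; strongHolder stands in for the reference HRegion.compact() holds): as long as any strong reference to the referent survives, get() must not return null; only after the last strong reference is dropped can the collector clear it.

```java
import java.lang.ref.WeakReference;

// Minimal demonstration of the WeakReference contract under discussion.
class WeakRefDemo {
    // Stands in for the strong reference HRegion.compact() keeps to the
    // status proxy; while it exists, the weak reference cannot be cleared.
    static Object strongHolder = new Object();
    static WeakReference<Object> weakRef = new WeakReference<>(strongHolder);

    static boolean proxyStillReachable() {
        return weakRef.get() != null;
    }
}
```

If get() returns null while a strong reference is provably still held, something else (such as two different proxy objects being confused for one another) must be going on.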
[jira] [Created] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
LoadIncrementalHFiles fails with FileNotFoundException

Key: HBASE-4030
URL: https://issues.apache.org/jira/browse/HBASE-4030
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Adam Phelps

Original Message
Subject: Re: LoadIncrementalHFiles bug when regionserver fails to access file?
Date: Thu, 23 Jun 2011 17:00:04 -0700
From: Ted Yu yuzhih...@gmail.com
Reply-To: cdh-u...@cloudera.org
To: u...@hbase.apache.org
CC: CDH Users cdh-u...@cloudera.org

This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encountered any exception, which should determine what to do next. Please file a JIRA. Thanks Adam.

On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps a...@opendns.com wrote:

(As a note, this is with CDH3u0, which is based on HBase 0.90.1.)

We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens, the node that made the call will see a FileNotFoundException such as this:

2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020]
2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)

Over on the regionserver that was loading this, we see that it attempted the load and hit a 60 second timeout:

2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
...
2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readShort(DataInputStream.java:295)

We suspect this particular problem is a resource contention issue on our side. However, the loading process proceeds to rename the file despite the failure:

2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533

And then LoadIncrementalHFiles tries to load the hfile again:

2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server handler 147 on 60020, call
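Ted's diagnosis, a try-finally that closes the reader but never records whether the validation succeeded, suggests the following shape for a fix. This is a hedged sketch with invented names (BulkLoadStep, validateThenRename), not the actual Store bulk-load code: track success explicitly and only rename the HFile when validation completed.

```java
// Invented-name sketch of "check the reader outcome before renaming".
class BulkLoadStep {
    interface Reader {
        void validate() throws Exception; // stands in for the HFile.Reader work
        void close();
    }

    static boolean validateThenRename(Reader reader, Runnable rename) {
        boolean ok = false;
        try {
            reader.validate();
            ok = true;           // only set once validation finished cleanly
        } catch (Exception e) {
            // log and fall through; crucially, do NOT proceed to the rename
        } finally {
            reader.close();      // always release the reader, as the old code did
        }
        if (ok) {
            rename.run();        // move the HFile only on success
        }
        return ok;
    }
}
```

With this shape, a timed-out validation leaves the source file in place, so a retry does not hit FileNotFoundException.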
[jira] [Commented] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054544#comment-13054544 ] Jean-Daniel Cryans commented on HBASE-3852:
I'm not sure it fixes the problem, since the issue here is about users not coming back to close the scanner even though there's still more data. We patched in something at SU that we've been running for more than a month, but it is specific to our use case. Anyway, feel free to take inspiration from these two commits:
https://github.com/stumbleupon/hbase/commit/7bcc63ee22f7e0218fb7225f387ffc0bd2279f59#src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
https://github.com/stumbleupon/hbase/commit/3d08cc8ad5abfec11ee0340d5a75a2b308854e17#src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java

ThriftServer leaks scanners

Key: HBASE-3852
URL: https://issues.apache.org/jira/browse/HBASE-3852
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Assignee: Ted Yu
Priority: Critical
Fix For: 0.92.0
Attachments: 3852.txt

The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory, and if any pre-fetching was done, it will also start accumulating Results (with all their data).

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
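One generic mitigation for this leak (an assumption on my part; the SU commits J-D links take a use-case-specific approach, and this sketch is not their patch) is to timestamp each scanner on use and periodically sweep entries the client never closed:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// TTL-swept scanner map sketch: idle entries are evicted so an abandoned
// ResultScanner cannot pin pre-fetched Results forever. Timestamps are
// passed in explicitly to keep the sketch deterministic and testable.
class ScannerMap<S> {
    static class Entry<S> {
        final S scanner;
        long lastUsed;
        Entry(S scanner, long now) { this.scanner = scanner; this.lastUsed = now; }
    }

    private final Map<Integer, Entry<S>> map = new HashMap<>();
    private final long ttlMillis;
    private int nextId = 0;

    ScannerMap(long ttlMillis) { this.ttlMillis = ttlMillis; }

    synchronized int add(S scanner, long now) {
        map.put(nextId, new Entry<>(scanner, now));
        return nextId++;
    }

    synchronized S get(int id, long now) {
        Entry<S> e = map.get(id);
        if (e == null) return null;
        e.lastUsed = now;       // every use refreshes the TTL
        return e.scanner;
    }

    // Called periodically by a background thread: drop scanners idle
    // longer than the TTL and return how many were evicted.
    synchronized int sweep(long now) {
        int removed = 0;
        for (Iterator<Entry<S>> it = map.values().iterator(); it.hasNext();) {
            if (now - it.next().lastUsed > ttlMillis) { it.remove(); removed++; }
        }
        return removed;
    }

    synchronized int size() { return map.size(); }
}
```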
[jira] [Commented] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054553#comment-13054553 ] Jean-Daniel Cryans commented on HBASE-4029:
Would you mind contributing a fix?
[jira] [Assigned] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash Ashok reassigned HBASE-4029:
Assignee: Akash Ashok
[jira] [Commented] (HBASE-4029) Inappropriate checking of Logging Mode in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054557#comment-13054557 ] Akash Ashok commented on HBASE-4029: Sure. I am working on it. Inappropriate checking of Logging Mode in HRegionServer --- Key: HBASE-4029 URL: https://issues.apache.org/jira/browse/HBASE-4029 Project: HBase Issue Type: Bug Reporter: Akash Ashok Assignee: Akash Ashok Labels: regionserver Fix For: 0.92.0 There is a condition check for Debug mode logging in HRegionServer.java. Because of this, the region server never closes the META region while stopping hbase and thus never stops, if DEBUG mode is not enabled in logging. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
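The bug class this issue describes (functional work accidentally trapped inside a log-level guard) can be shown with a dependency-free sketch. LogGuardDemo is purely illustrative, not the actual HRegionServer code:

```java
// Illustration of the reported bug shape: a side effect guarded by a
// log-level check, so behavior silently changes when DEBUG is off.
class LogGuardDemo {
    static boolean debugEnabled = false; // stands in for LOG.isDebugEnabled()
    static boolean metaClosed;

    // Buggy shape: the close only happens when DEBUG logging is on.
    static void stopBuggy() {
        metaClosed = false;
        if (debugEnabled) {
            // LOG.debug("closing META region");
            metaClosed = true; // side effect trapped inside the guard
        }
    }

    // Fixed shape: only the log statement stays behind the guard.
    static void stopFixed() {
        metaClosed = false;
        if (debugEnabled) {
            // LOG.debug("closing META region");
        }
        metaClosed = true; // always runs, regardless of log level
    }
}
```

The rule of thumb: an `isDebugEnabled()` block should contain nothing but logging, so stripping it out never changes program behavior.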
[jira] [Commented] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054558#comment-13054558 ] Jean-Daniel Cryans commented on HBASE-4024: --- Regarding the last edit in the patch, I don't think you should remove that whitespace, and what would be the reason to keep the check on the number of files there if it's moved into isMajorCompaction? Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054562#comment-13054562 ] Ted Yu commented on HBASE-4024: --- I can revert the whitespace change. The check on the number of files was kept for the case when forcemajor is true. Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
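The change being discussed (fold the file-count cap into the major-compaction decision so the "triggered" log line and the outcome cannot disagree, while keeping the cap for the forcemajor path) can be modeled without any HBase classes. CompactSelectDemo is a hypothetical reduction of Store.compactSelection, not the actual patch:

```java
// Hypothetical model of the selection logic: the bug is that the "triggered"
// log fires before the file-count cap is applied, so the log can lie.
class CompactSelectDemo {
    static final int MAX_FILES = 10; // stands in for this.maxFilesToCompact

    // stands in for Store.isMajorCompaction(filesToCompact)
    static boolean isMajorAge(boolean oldEnough) {
        if (oldEnough) {
            // LOG.debug("Major compaction triggered ...") would fire here,
            // even though the cap below may still veto the compaction
            return true;
        }
        return false;
    }

    // Buggy shape: cap applied after the (already logged) decision.
    static boolean selectBuggy(boolean force, boolean oldEnough, int numFiles) {
        return (force || isMajorAge(oldEnough)) && numFiles < MAX_FILES;
    }

    // Fixed shape: the age check sees the cap, so the log and the outcome
    // agree; the outer cap survives only for the force == true path.
    static boolean selectFixed(boolean force, boolean oldEnough, int numFiles) {
        boolean ageMajor = oldEnough && numFiles < MAX_FILES; // check moved inside
        return (force || ageMajor) && numFiles < MAX_FILES;
    }
}
```

The two shapes return the same boolean for every input; the fix only changes where the decision (and its log line) is made, which is why it reads as a logging-honesty fix rather than a behavior change.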
[jira] [Commented] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054565#comment-13054565 ] Ted Yu commented on HBASE-3852: --- I was thinking about a timeout-driven expiration policy. SU's implementation looks nice. We can introduce a new parameter (sigh) to control whether ScannerCleaner is instantiated at startup. @J-D: do you happen to know how many scanners were closed by ScannerCleaner in the past month? ThriftServer leaks scanners --- Key: HBASE-3852 URL: https://issues.apache.org/jira/browse/HBASE-3852 Project: HBase Issue Type: Bug Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 3852.txt The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory and if any pre-fetching was done, it will also start accumulating Results (with all their data). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
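The timeout-driven expiration discussed above can be sketched without any HBase or Thrift dependency: track a last-access timestamp per scanner id alongside the scannerMap entry, and periodically drop entries idle past the timeout. All names here (ScannerExpirer, etc.) are hypothetical, not SU's actual ScannerCleaner:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: expire scanners whose last access is older than a
// timeout, so abandoned scanners stop accumulating prefetched Results.
class ScannerExpirer {
    // scanner id -> last access time in millis
    // (stands in for ThriftServer's scannerMap of id -> ResultScanner)
    final Map<Integer, Long> lastAccess = new ConcurrentHashMap<>();

    // Called on open and on every scannerGet to mark the scanner as live.
    void touch(int scannerId, long nowMillis) {
        lastAccess.put(scannerId, nowMillis);
    }

    // Remove every scanner idle longer than timeoutMillis; returns how many
    // were expired. A real cleaner would also call ResultScanner.close().
    int expireIdle(long nowMillis, long timeoutMillis) {
        int expired = 0;
        Iterator<Map.Entry<Integer, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeoutMillis) {
                it.remove();
                expired++;
            }
        }
        return expired;
    }
}
```

In a real server, expireIdle would run on a scheduled background thread, and the timeout would come from the new configuration parameter Ted mentions.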
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: HBASE-4020.txt testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt, TestHRegion.patch The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: (was: 4024.txt) Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: 4024.txt Removed the whitespace adjustment. Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: (was: 4024.txt) Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: 4024.txt Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this as a "Major compaction triggered ..." log message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descript
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subbu M Iyer updated HBASE-4025: Attachment: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors 
java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
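The fix direction for this issue is to skip the known non-table directories under the HBase root when collecting table descriptors. A minimal, self-contained sketch of that filtering, using plain strings in place of HDFS paths (the class name and helper are illustrative, not the attached patch's actual code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: filter out system directories (.logs, .corrupt, etc.)
// before trying to read a .tableinfo from each child of the HBase root dir.
class TableDirFilter {
    // The system folders named in the issue title; a new system folder would
    // need to be added here, which is the "band-aid" concern raised below.
    static final List<String> NON_TABLE_DIRS =
            Arrays.asList(".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-");

    // Given the child directory names of /hbase, keep only user-table dirs.
    static List<String> tableDirs(List<String> rootDirChildren) {
        return rootDirChildren.stream()
                .filter(name -> !NON_TABLE_DIRS.contains(name))
                .collect(Collectors.toList());
    }
}
```

In the real code this filter would be applied where FSTableDescriptors.getAll iterates the root directory, so getTableInfoModtime is never asked for a status on /hbase/.corrupt.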
[jira] [Updated] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4030: -- Fix Version/s: 0.90.4 LoadIncrementalHFiles fails with FileNotFoundException -- Key: HBASE-4030 URL: https://issues.apache.org/jira/browse/HBASE-4030 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Adam Phelps Fix For: 0.90.4 Original Message Subject: Re: LoadIncrementalHFiles bug when regionserver fails to access file? Date: Thu, 23 Jun 2011 17:00:04 -0700 From: Ted Yu yuzhih...@gmail.com Reply-To: cdh-u...@cloudera.org To: u...@hbase.apache.org CC: CDH Users cdh-u...@cloudera.org This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encounters any exception, which should determine what to do next. Please file a JIRA. Thanks Adam. On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps a...@opendns.com wrote: (As a note, this is with CDH3u0 which is based on HBase 0.90.1) We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens the node that made the call will see a FileNotFoundException such as this: 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read.
ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020] 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602) 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1593) Over on the regionserver that was loading this we see that it attempted to load and hit a 60 second timeout: 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/0/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8. ... 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read.
ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) We suspect this particular problem is a resource contention issue on our side. However, the loading process proceeds to rename the file despite the failure: 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533 And then the LoadIncrementalHFiles tries to load the hfile again: 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054590#comment-13054590 ] Subbu M Iyer commented on HBASE-4025: - Maybe we should create all tables under /hbase/tables/<table name> instead of /hbase/<table name>, so that we can avoid future cases where we create other folders under /hbase (such as .logs, .corrupt et al.) that do not contain table descriptors. So this pattern will look something like: /hbase/.logs /hbase/.corrupt /hbase/.oldlogs /hbase/.META. /hbase/-ROOT- /hbase/<future non-user system folders> /hbase/UserTables/<user table folder>/.tableinfo and when we need to retrieve all the table descriptors we simply iterate over the /hbase/UserTables folder rather than /hbase and ignore all system folders. The other option would be: /hbase/System/.logs, .oldlogs, .corrupt et al. /hbase/UserTables/<user tables> This way we can avoid adding a band-aid fix to this read-table-descriptor logic every time we have a new system folder. Thoughts? Server startup fails during startup due to failure in loading all table descriptors.
We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at 
org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: HBASE-4020.txt testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: (was: TestHRegion.patch) testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-4020: Attachment: (was: HBASE-4020.txt) testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing a HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054599#comment-13054599 ] Ted Yu commented on HBASE-4025: --- A third option :-) /hbase/System/.logs, .oldlogs, .corrupt et al. /hbase/<user tables> Users/developers are used to the current hdfs structure. This would introduce relatively small impact to existing user tables. Let's see what other developers think. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Phelps updated HBASE-4030: --- Description: -- We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens the node that made the call will see a FileNotFoundException such as this: 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020] 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602) 2011-06-23 15:47:34.379573500 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593) -- Over on the regionserver that was loading this we see that it attempted to load and hit a 60 second timeout: 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8. ... 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readShort(DataInputStream.java:295) -- We suspect this particular problem is a resource contention issue on our side. However, the loading process proceeds to rename the file despite the failure: 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533 -- And then the LoadIncrementalHFiles tries to load the hfile again: 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category::com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8. 2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server handler 147 on 60020, call bulkLoadHFile(hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256, [B@4224508b, [B@5e23f799) from 67.215.90.51:51856: error: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 -- This eventually leads to the load command failing. 
was: Original Message Subject: Re: LoadIncrementalHFiles bug when regionserver fails to access file? Date: Thu, 23 Jun 2011 17:00:04 -0700 From: Ted Yu yuzhih...@gmail.com Reply-To: cdh-u...@cloudera.org To: u...@hbase.apache.org CC: CDH Users cdh-u...@cloudera.org This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encounters any exception, which should determine what to do next. Please file a JIRA. Thanks Adam. On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps a...@opendns.com wrote: (As a note, this is with CDH3u0 which is based on HBase 0.90.1) We've been seeing intermittent failures of calls to LoadIncrementalHFiles. When this happens the node that made the call will see a FileNotFoundException such as this: 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6
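Ted Yu's diagnosis above — the reader handling sits in a try-finally with no check for whether the read succeeded, so the rename proceeds even when validation failed — suggests gating the rename on a success flag. A minimal sketch under that assumption; Validator and safeToRename are hypothetical stand-ins, not HBase API:

```java
public class BulkLoadCheck {
    // Hypothetical stand-in for opening and validating an HFile.
    interface Validator { void validate(String path) throws Exception; }

    // Returns true only when validation succeeded; the caller should rename
    // the bulk-load file only in that case. The reported bug renamed it
    // unconditionally because the try-finally never recorded a failure.
    static boolean safeToRename(String path, Validator v) {
        try {
            v.validate(path);
            return true;
        } catch (Exception e) {
            // e.g. SocketTimeoutException while reading the HFile
            return false;
        }
    }

    public static void main(String[] args) {
        Validator failing = p -> { throw new java.io.IOException("timeout"); };
        Validator passing = p -> { };
        System.out.println(safeToRename("/hfiles/f1", failing)); // false: do not rename
        System.out.println(safeToRename("/hfiles/f1", passing)); // true: rename is safe
    }
}
```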
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054610#comment-13054610 ] Ted Yu commented on HBASE-4025: --- Overall +1 I am running test suite. Minor comment, maybe there is a better place for hbaseNonTableDirs but I don't have strong opinion. In HConstants, all constants are in upper case. How about renaming it to HBASE_NON_USER_TABLE_DIRS ? Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4025: - Assignee: Subbu M Iyer Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at 
org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054616#comment-13054616 ] Jean-Daniel Cryans commented on HBASE-3852: --- bq. do you happen to know how many scanners were closed by ScannerCleaner in the past month ? I used to print that out but it was really spammy. Probably tens of thousands. ThriftServer leaks scanners --- Key: HBASE-3852 URL: https://issues.apache.org/jira/browse/HBASE-3852 Project: HBase Issue Type: Bug Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 3852.txt The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory and if any pre-fetching was done, it will also start accumulating Results (with all their data). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
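The leak described above — entries in scannerMap live until the client explicitly closes the scanner — is what a periodic cleaner such as the ScannerCleaner mentioned by Jean-Daniel addresses. A minimal sketch of the idea with hypothetical names; a real cleaner would also call ResultScanner.close() rather than just dropping the map entry:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScannerSweeper {
    // Hypothetical registry pairing a scanner id with its last-use time (ms).
    static final Map<Integer, Long> lastUsed = new ConcurrentHashMap<>();

    // Record activity on a scanner so the sweeper knows it is still live.
    static void touch(int scannerId, long nowMs) {
        lastUsed.put(scannerId, nowMs);
    }

    // Drop every scanner idle longer than timeoutMs; returns how many were
    // reclaimed. Run periodically from a background thread.
    static int sweep(long nowMs, long timeoutMs) {
        int closed = 0;
        for (Iterator<Map.Entry<Integer, Long>> it = lastUsed.entrySet().iterator(); it.hasNext();) {
            if (nowMs - it.next().getValue() > timeoutMs) {
                it.remove(); // real code: also close the underlying ResultScanner
                closed++;
            }
        }
        return closed;
    }
}
```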
[jira] [Commented] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054618#comment-13054618 ] Jean-Daniel Cryans commented on HBASE-4024: --- I'm starting to think that the check should be refactored out of the ifs and put right at the beginning, and then maybe print a nice message on why it's skipping? Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this with a "Major compaction triggered ..." message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4024) Major compaction may not be triggered, even though region server log says it is triggered
[ https://issues.apache.org/jira/browse/HBASE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4024: -- Attachment: 4024-v2.txt Allow me to reformat a portion of existing code in v2. Major compaction may not be triggered, even though region server log says it is triggered - Key: HBASE-4024 URL: https://issues.apache.org/jira/browse/HBASE-4024 Project: HBase Issue Type: Bug Components: regionserver Reporter: Suraj Varma Assignee: Ted Yu Priority: Trivial Labels: newbie Fix For: 0.92.0 Attachments: 4024-v2.txt, 4024.txt The trunk version of regionserver/Store.java, method List<StoreFile> compactSelection(List<StoreFile> candidates), has this code to determine whether major compaction should be done or not: // major compact on user action or age (caveat: we have too many files) boolean majorcompaction = (forcemajor || isMajorCompaction(filesToCompact)) && filesToCompact.size() < this.maxFilesToCompact; The isMajorCompaction(filesToCompact) method internally determines whether or not major compaction is required (and logs this with a "Major compaction triggered ..." message). However, after the call, the compactSelection method subsequently applies the filesToCompact.size() < this.maxFilesToCompact check, which can turn off major compaction. This would result in a "Major compaction triggered" log message without actually triggering a major compaction. The filesToCompact.size() check should probably be moved inside the isMajorCompaction(filesToCompact) method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
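The suggested fix — fold the filesToCompact.size() cap into the major-compaction decision itself, so the log message and the actual decision cannot disagree — can be sketched as below. shouldMajorCompact is a hypothetical simplification of the Store.compactSelection() logic, not the real method:

```java
public class CompactionCheck {
    // Simplified, hypothetical decision mirroring the condition quoted above:
    //   (forcemajor || isMajorCompaction(files)) && files.size() < maxFilesToCompact
    // Checking the file-count cap first means the "Major compaction triggered"
    // log line would only ever be emitted when the compaction will actually run.
    static boolean shouldMajorCompact(boolean forceMajor, boolean ageBasedMajor,
                                      int numFiles, int maxFilesToCompact) {
        if (numFiles >= maxFilesToCompact) {
            // Too many files: skip major compaction (and skip the log message).
            return false;
        }
        // Only now evaluate (and, in real code, log) the major-compaction reasons.
        return forceMajor || ageBasedMajor;
    }
}
```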
[jira] [Reopened] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-451: -- TestFSTableDescriptors.testHTableDescriptors has been broken since revision 1138120 (HBASE-451 Remove HTableDescriptor from HRegionInfo -- part 2, some cleanup) Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054639#comment-13054639 ] Jonathan Gray commented on HBASE-4027: -- In the new HFile v2 over in HBASE-3857 the block cache interface changes from ByteBuffer to HeapSize. So you can now put anything you want into the cache that implements HeapSize (there is a new HFileBlock that is used in HFile v2). One big question is whether you're going to make copies out of the direct byte buffers on each read of that block, or if you're going to change KeyValue to use the ByteBuffer interface (or some other) instead of the byte[] directly. With a DBB you can't get access to an underlying byte[]. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054660#comment-13054660 ] Li Pi commented on HBASE-4027: -- This would be really useful. I think even making copies out of the direct byte buffers would confer a substantial performance advantage over the FS cache. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
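Jonathan Gray's point above is that a direct buffer has no accessible backing byte[], so a cached block either gets copied onto the heap on each read or the KeyValue code must be changed to work against ByteBuffer. The copy-out path looks roughly like this (a sketch using plain java.nio, not HBase code):

```java
import java.nio.ByteBuffer;

public class OffHeapBlock {
    // Copy a cached block out of a direct (off-heap) buffer into a heap
    // byte[]. duplicate() gives an independent position/limit so concurrent
    // readers of the same cached block don't interfere with each other.
    static byte[] copyOut(ByteBuffer direct) {
        ByteBuffer dup = direct.duplicate();
        dup.rewind();
        byte[] onHeap = new byte[dup.remaining()];
        dup.get(onHeap);
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(4);
        block.put(new byte[] {1, 2, 3, 4});
        byte[] copy = copyOut(block);
        // Direct buffers expose no backing array, hence the copy.
        System.out.println(block.hasArray()); // false
        System.out.println(copy.length);      // 4
    }
}
```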
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054685#comment-13054685 ] Subbu M Iyer commented on HBASE-4025: - Agree regarding the naming of variable to HBASE_NON_USER_TABLE_DIRS and as far as where it should go, I don't have a strong opinion either. As far as other issue is concerned, we can go either way as long as we have unique way of identifying all the user tables. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054690#comment-13054690 ] Ted Yu commented on HBASE-451: -- Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce passes. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
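The check Ted Yu describes — in HRegion.createHRegion(), create the table descriptor on the filesystem only when it is missing — can be sketched like this. The sketch uses java.nio.file in place of HDFS, and the .tableinfo file name is an assumption for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DescriptorEnsure {
    // Hypothetical analogue of FSUtils.createTableDescriptor() guarded by an
    // existence check: write the serialized descriptor only when the table
    // directory does not already contain one. Returns true if it was created.
    static boolean ensureDescriptor(Path tableDir, byte[] descriptorBytes) throws IOException {
        Path info = tableDir.resolve(".tableinfo"); // assumed descriptor file name
        if (Files.exists(info)) {
            return false; // descriptor already present, nothing to do
        }
        Files.createDirectories(tableDir);
        Files.write(info, descriptorBytes);
        return true;
    }
}
```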
[jira] [Issue Comment Edited] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054690#comment-13054690 ] Ted Yu edited comment on HBASE-451 at 6/24/11 9:36 PM: --- Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce and TestFSTableDescriptors both pass. was (Author: yuzhih...@gmail.com): Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce passes. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. 
Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-451: - Attachment: 451-addendum.txt Here is my addendum. There could be a cleaner way of detecting that table descriptor doesn't exist on HDFS. For the moment, I rely on TableExistsException. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054690#comment-13054690 ] Ted Yu edited comment on HBASE-451 at 6/24/11 9:46 PM: --- Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce and TestFSTableDescriptors both pass. TestDistributedLogSplitting and TestSplitTransactionOnCluster pass on my laptop as well. was (Author: yuzhih...@gmail.com): Looks like TestTableMapReduce doesn't create the table. From MultiRegionTable which TestTableMapReduce inherits: {code} HRegion region = createNewHRegion(desc, startKey, endKey); {code} I added test of whether table descriptor exists on HDFS in HRegion.createHRegion(). If it doesn't exist, I call FSUtils.createTableDescriptor(). Now TestTableMapReduce and TestFSTableDescriptors both pass. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). 
That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054712#comment-13054712 ] Ted Yu commented on HBASE-451: -- TestDistributedLogSplitting hung on Linux. The second time I ran it on my laptop, I got: {code} Failed tests: testThreeRSAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) {code} The first exception in the output file was: {code} 2011-06-24 15:01:37,115 WARN [PostOpenDeployTasks:1028785192] handler.OpenRegionHandler$PostOpenDeployTasksThread(221): Exception running postOpenDeployTasks; region=1028785192 java.io.IOException: No server for -ROOT- at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaLocation(MetaEditor.java:149) at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1405) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:218) {code} No such exception appears in the output on Linux, though. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. 
Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054714#comment-13054714 ] Gary Helmling commented on HBASE-451: - @Ted, good digging. I'm not very familiar with these changes, but it looks to me like the changes so far have tried to pull HTableDescriptor handling out of HRegion. So adding it back into HRegion.createHRegion() may be a step back. I think I'd opt for trying to fix the tests to call FSUtils.createTableDescriptor() instead. Either in TestTableMapReduce.init or MultiRegionTable.preHBaseClusterSetup(), prior to creating the table regions. I think either of those would work. I wonder how many other HBaseTestCase subclasses may have problems as well. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4020: -- Attachment: 4020-ted.txt I would apply this patch. testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: 4020-ted.txt, HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing an HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054722#comment-13054722 ] Ted Yu commented on HBASE-451: -- Thanks Gary for the reminder. I will upload addendum version 2 which creates table descriptor in MultiRegionTable.preHBaseClusterSetup(). TestTableMapReduce passes: {code} Running org.apache.hadoop.hbase.mapred.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 51.72 sec Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 83.983 sec {code} Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-451: - Attachment: (was: 451-addendum.txt) Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-451: - Attachment: 451-addendum-v2.txt Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054729#comment-13054729 ] Gary Helmling commented on HBASE-451: - @Ted, +1 from me on addendum v2, since the tests now pass. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054737#comment-13054737 ] Jason Rutherglen commented on HBASE-4027: - {quote}One big question is whether you're going to make copies out of the direct byte buffers on each read of that block, or if you're going to change KeyValue to use the ByteBuffer interface (or some other) instead of the byte[] directly{quote} Right, the {{HFile.Scanner.getKeyValue()}} method is calling {{block.array()}}. We'd need to track down all {{byte[]}} references, and convert them to {{ByteBuffer}}. That's more of a separate Jira. I think converting a direct ByteBuffer to byte[] will generate a fair amount of garbage, though of a different (smaller and more numerous) kind than the blocks. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
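The copy-out cost Jason mentions can be sketched with plain java.nio. A direct buffer has no backing array (so a block.array() call would fail), which forces a fresh on-heap byte[] per read. The helper below is an illustration of that pattern, not HBase code:

```java
import java.nio.ByteBuffer;

public class DirectBufferCopy {
    // Copy a slice of a direct (off-heap) buffer into a fresh on-heap
    // byte[]. Each call allocates a new array, which is the "smaller and
    // more numerous" garbage referred to in the comment; calling array()
    // on a direct buffer is not possible (hasArray() is false).
    static byte[] copyOut(ByteBuffer direct, int offset, int length) {
        byte[] onHeap = new byte[length];    // new allocation per read
        ByteBuffer dup = direct.duplicate(); // independent position/limit
        dup.position(offset);
        dup.get(onHeap, 0, length);
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(16);
        for (int i = 0; i < 16; i++) block.put((byte) i);
        System.out.println(block.hasArray()); // false: no backing byte[]
        byte[] copy = copyOut(block, 4, 4);
        System.out.println(copy[0] + "," + copy[3]); // 4,7
    }
}
```

Using duplicate() keeps the shared cache buffer's position untouched, so concurrent readers don't interfere; the price is one short-lived array per block access.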
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054740#comment-13054740 ] Gary Helmling commented on HBASE-4025: -- Since we already have rules for valid user table names, why not just apply those in the directory listing? User tables are not allowed to start with '.' or '-', so ignore directory entries beginning with those. Special case '.META.' and '-ROOT-', since that's what we do most places for those 2 tables anyway. We already generally are following a convention of system directories starting with '.', so this seems sufficient to me. No need to move anything around. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4025) Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descri
[ https://issues.apache.org/jira/browse/HBASE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054742#comment-13054742 ] Gary Helmling commented on HBASE-4025: -- (repeating above but with formatting fixed)... Since we already have rules for valid user table names, why not just apply those in the directory listing? User tables are not allowed to start with '.' or '\-', so ignore directory entries beginning with those. Special case '.META.' and '\-ROOT\-', since that's what we do most places for those 2 tables anyway. We already generally are following a convention of system directories starting with '.', so this seems sufficient to me. No need to move anything around. Server startup fails during startup due to failure in loading all table descriptors. We should ignore .logs,.oldlogs,.corrupt,.META.,-ROOT- folders while reading descriptors -- Key: HBASE-4025 URL: https://issues.apache.org/jira/browse/HBASE-4025 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Subbu M Iyer Assignee: Subbu M Iyer Attachments: HBASE-4025_-_Server_startup_fails_while_reading_table_descriptor_from__corrupt_folder_1.patch Original Estimate: 2h Remaining Estimate: 2h 2011-06-23 21:39:52,524 WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status org.apache.hadoop.hbase.monitoring.MonitoredTaskImpl@2f56f920 appears to have been leaked 2011-06-23 21:40:06,465 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) 2011-06-23 21:40:26,790 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors java.io.FileNotFoundException: No status for hdfs://ciq.com:9000/hbase/.corrupt at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1442) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1138) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
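Gary's filtering rule is simple enough to state as a predicate. The helper below is an illustrative sketch of that rule (the method name is made up, not the HBase API): skip any directory whose name starts with '.' or '-', which covers .logs, .oldlogs, .corrupt, .META. and -ROOT- without moving anything around.

```java
public class UserTableFilter {
    // User table names may not begin with '.' or '-', so any directory
    // entry starting with those characters is a system directory
    // (.logs, .oldlogs, .corrupt) or a catalog table (.META., -ROOT-)
    // and should be skipped when loading table descriptors.
    static boolean isUserTableDir(String name) {
        if (name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        return first != '.' && first != '-';
    }

    public static void main(String[] args) {
        String[] entries = {".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-", "usertable"};
        for (String e : entries) {
            System.out.println(e + " -> " + isUserTableDir(e));
        }
    }
}
```

In practice this predicate would back a Hadoop PathFilter passed to the directory listing, so getAll() never even stats the non-table paths that caused the FileNotFoundException above.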
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054747#comment-13054747 ] Jason Rutherglen commented on HBASE-4027: - {quote}This would be really useful. I think even making copies out of the direct byte buffers would confer a substantial performance advantage over the FS cache.{quote} The filesystem cache doesn't help because HBase needs quick access to uncompressed blocks for scanning. For what duration does an uncompressed block need to be cached? In either case, accessing compressed blocks from the FS cache will be faster than hitting the disk or network. I am guessing one can maintain a small'ish block cache, ensure HDFS blocks are local, provide extra space for the FS cache, and gain in read throughput. Snappy should decompress fast enough for this to be more viable than maintaining a large-ish block cache. The problem [today] with a small'ish block cache is the GC is driven mad. Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Priority: Minor Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an undocumented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054756#comment-13054756 ] Ted Yu commented on HBASE-451: -- On Linux, TestFSTableDescriptors fails at the following assertion: {code} assertEquals(count * 2, htds.cachehits); {code} The error was: {code} java.lang.AssertionError: expected:<20> but was:<30> {code} Stack should know how to fix the above. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4020) testWritesWhileGetting unit test needs to be fixed.
[ https://issues.apache.org/jira/browse/HBASE-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054759#comment-13054759 ] Ted Yu commented on HBASE-4020: --- Integrated to TRUNK. Thanks for the patch Vandana. testWritesWhileGetting unit test needs to be fixed. -- Key: HBASE-4020 URL: https://issues.apache.org/jira/browse/HBASE-4020 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.3 Environment: OS: RHEL 5.4 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.92.0 Attachments: 4020-ted.txt, HBASE-4020.txt The unit test testWritesWhileGetting in the org.apache.hadoop.hbase.regionserver.TestHRegion test needs to be corrected. It is currently using the table name and method name for initializing an HRegion as testWritesWhileScanning. It should be testWritesWhileGetting. Due to this, the test fails as the initHRegion method fails in creating a new HRegion for the test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-3810) Registering a Coprocessor at HTableDescriptor should be less strict
[ https://issues.apache.org/jira/browse/HBASE-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingjie Lai reassigned HBASE-3810: -- Assignee: Mingjie Lai Registering a Coprocessor at HTableDescriptor should be less strict --- Key: HBASE-3810 URL: https://issues.apache.org/jira/browse/HBASE-3810 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: Joerg Schad Assignee: Mingjie Lai Priority: Minor Original Estimate: 2h Remaining Estimate: 2h Registering a Coprocessor in the following way will fail as the Coprocessor$1 keyword is case sensitive (instead COPROCESSOR$1 works fine). Removing this restriction would improve usability. HTableDescriptor desc = new HTableDescriptor(tName); desc.setValue("Coprocessor$1", path.toString() + ":" + full_class_name + ":" + Coprocessor.Priority.USER); -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
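The requested relaxation amounts to matching the attribute key case-insensitively. The sketch below illustrates one way to do that with a regex; it is an assumption about how the check could look, not the actual HBase matching code:

```java
import java.util.regex.Pattern;

public class CoprocessorKeyMatch {
    // Today the table attribute key must be exactly "COPROCESSOR$1";
    // an embedded (?i) flag makes the match case-insensitive, so
    // "Coprocessor$1" would be accepted as well. Illustrative sketch only.
    static final Pattern CP_KEY = Pattern.compile("(?i)coprocessor\\$\\d+");

    static boolean isCoprocessorKey(String key) {
        return CP_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCoprocessorKey("Coprocessor$1")); // true
        System.out.println(isCoprocessorKey("COPROCESSOR$2")); // true
        System.out.println(isCoprocessorKey("MAX_FILESIZE"));  // false
    }
}
```

Any code that later parses the ordinal out of the key would need the same case-insensitive treatment, so the pattern is best kept in one shared constant.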
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054772#comment-13054772 ] Ted Yu commented on HBASE-451: -- I committed addendum v2 into TRUNK. Remove HTableDescriptor from HRegionInfo Key: HBASE-451 URL: https://issues.apache.org/jira/browse/HBASE-451 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.2.0 Reporter: Jim Kellerman Assignee: Subbu M Iyer Priority: Critical Fix For: 0.92.0 Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table. Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region. If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4031) An imbalance result calculated by LoadBalancer
An imbalance result calculated by LoadBalancer -- Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 I found the problem while the cluster couldn't balance (around 2011-05-24 11:28). One node's region count is double that of the other nodes, and it didn't move regions anymore: Address Start Code Load 158-1-101-202:20030 1306205409671 requests=0, regions=2593, usedHeap=114, maxHeap=8165 158-1-101-222:20030 1306205940117 requests=0, regions=5841, usedHeap=80, maxHeap=8165 158-1-101-52:20030 1306205417261 requests=0, regions=2622, usedHeap=76, maxHeap=8165 158-1-101-82:20030 1306205415714 requests=0, regions=2633, usedHeap=69, maxHeap=8165 Total: servers: 4 requests=0, regions=13689 HBASE-3985 (Same Region could be picked out twice in LoadBalancer) was found during my analysis of this problem, but I'm afraid it's not the main cause. There's one active master, one standby master, and four regionservers in our cluster. At 10:57:41, the standby HMaster 222 became the active one. 2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover The 4 regionservers were registered with 222 one by one; only one regionserver registered somewhat late. 2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388, userLoad=true 2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453, userLoad=true 2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411, userLoad=true 2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0, userLoad=false 13134 regions needed to move after rebuildUserRegions (13689 regions in the cluster at the time). 
2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 13134 regions in transition All 13134 regions were opened; regions opened per server: 158-1-101-222,20020,1306205940117 Count: 834 158-1-101-82,20020,1306205415714 Count: 4093 158-1-101-202,20020,1306205409671 Count: 4118 158-1-101-52,20020,1306205417261 Count: 4089 The most recent balancer calculation results: 2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers 5012 is an impossible number here, for it is larger than the average of 3424.5 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
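The reporter's sanity check can be made concrete with a back-of-the-envelope helper using only the region totals from the report. This is an illustrative calculation, not the LoadBalancer's actual code:

```java
public class BalanceCheck {
    // Each server should end up near total/servers regions, so the number
    // of regions a single underloaded server can legitimately receive is
    // bounded by that average minus its current load. 5012 moves onto one
    // server exceeds any such bound, which is the reported anomaly.
    static int maxRegionsToReceive(int totalRegions, int numServers, int currentLoad) {
        int average = (int) Math.ceil((double) totalRegions / numServers);
        return Math.max(0, average - currentLoad);
    }

    public static void main(String[] args) {
        // 13689 regions over 4 region servers, receiver assumed empty:
        System.out.println(BalanceCheck.maxRegionsToReceive(13689, 4, 0));
        // An already-overloaded server should receive nothing:
        System.out.println(BalanceCheck.maxRegionsToReceive(13689, 4, 5841));
    }
}
```

Any move count the balancer emits above this ceiling (here roughly 3400 regions) signals double-counting of regions, consistent with the HBASE-3985 suspicion.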
[jira] [Commented] (HBASE-3516) Coprocessors: add test cases for loading coprocessor jars from hdfs and local fs.
[ https://issues.apache.org/jira/browse/HBASE-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054795#comment-13054795 ] jirapos...@reviews.apache.org commented on HBASE-3516: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/963/ --- Review request for hbase. Summary --- There is no test case for testing coprocessor class loading from HDFS or the local file system. Add test cases for cp class loading. It does: - compile a cp implementation on the fly by the Java compiler API - build a jar file from the compiled classes - copy the jar to the local file system or HDFS so it can be loaded for a region This addresses bug HBase-3516. https://issues.apache.org/jira/browse/HBase-3516 Diffs - src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 0a1fb2a src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java PRE-CREATION Diff: https://reviews.apache.org/r/963/diff Testing --- Thanks, Mingjie Coprocessors: add test cases for loading coprocessor jars from hdfs and local fs. -- Key: HBASE-3516 URL: https://issues.apache.org/jira/browse/HBASE-3516 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.90.0 Reporter: Mingjie Lai Assignee: Mingjie Lai Loading coprocessor classes from jar files (on HDFS or the local fs) is supported by the CP framework right now. We used to have a test case to cover this scenario which used a base-64 encoded string in the test case to represent a compiled jar file. This hardcoded way was not acceptable as a valid test case, so we removed it eventually. We need a better way to redo this case. Option 1) modify the maven file in order to compile a test cp class into a jar, put it on hdfs and local fs, and run the cp class loading test; option 2) use the Java 6.0 Compiler API to compile the test case at runtime and create the jar file? Need more time to investigate which one is better. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4031) An imbalance result calculated by LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4031: Attachment: HMaster222.rar

An imbalance result calculated by LoadBalancer
--
Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HMaster222.rar

I found the problem when the cluster could not balance (around 2011-05-24 11:28). One node carried roughly double the region count of the other nodes, and no more regions were moved:

Address Start Code Load
158-1-101-202:20030 1306205409671 requests=0, regions=2593, usedHeap=114, maxHeap=8165
158-1-101-222:20030 1306205940117 requests=0, regions=5841, usedHeap=80, maxHeap=8165
158-1-101-52:20030 1306205417261 requests=0, regions=2622, usedHeap=76, maxHeap=8165
158-1-101-82:20030 1306205415714 requests=0, regions=2633, usedHeap=69, maxHeap=8165
Total: servers: 4 requests=0, regions=13689

HBASE-3985 (Same region could be picked out twice in LoadBalancer) was found during my analysis of this problem, but I'm afraid it is not the main cause. There is one active master, one standby master, and four regionservers in our cluster. At 10:57:41 the standby HMaster 222 became the active one:

2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover

The 4 regionservers were registered on 222 one by one; only one regionserver registered noticeably late:

2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388, userLoad=true
2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453, userLoad=true
2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411, userLoad=true
2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0, userLoad=false

13134 regions needed to move after rebuildUserRegions (13689 regions in the cluster at the time):

2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 13134 regions in transition

All 13134 regions were opened; regions opened per server:
158-1-101-222,20020,1306205940117 Count: 834
158-1-101-82,20020,1306205415714 Count: 4093
158-1-101-202,20020,1306205409671 Count: 4118
158-1-101-52,20020,1306205417261 Count: 4089

The most recent balancer calculation:

2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers

5012 is an impossible number here, for it is larger than the average number 3424.5.
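The mismatch can be checked with simple arithmetic. Assuming a mean-based scheme like the 0.90 balancer's (each overloaded server sheds only its excess above the ceiling of the mean load), the opened-region counts logged above bound the legitimate move count. This is a minimal sketch of that bound, not the actual LoadBalancer code:

```java
public class BalanceCheck {
    // Upper bound on regions a mean-based balancer should move:
    // each overloaded server sheds only its excess above ceil(mean).
    static int expectedMoves(int[] loads) {
        int total = 0;
        for (int l : loads) {
            total += l;
        }
        int ceil = (int) Math.ceil((double) total / loads.length);
        int moves = 0;
        for (int l : loads) {
            if (l > ceil) {
                moves += l - ceil;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Opened-region counts per server from the failover log:
        // 222 -> 834, 82 -> 4093, 202 -> 4118, 52 -> 4089
        int moves = expectedMoves(new int[]{834, 4093, 4118, 4089});
        System.out.println(moves); // 2448, far below the 5012 the balancer reported
    }
}
```

The reported 5012 is close to double this bound, which is at least consistent with regions being picked twice (HBASE-3985), though as the reporter notes that may not be the only cause.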
[jira] [Updated] (HBASE-4031) An imbalance result calculated by LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4031: Attachment: HRegionServer222.rar

An imbalance result calculated by LoadBalancer
--
Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HMaster222.rar, HRegionServer222.rar
[jira] [Commented] (HBASE-4031) An imbalance result calculated by LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054797#comment-13054797 ] Jieshan Bean commented on HBASE-4031:

Since the original log files are too big, I attached only some fragments of the full logs.

An imbalance result calculated by LoadBalancer
--
Key: HBASE-4031 URL: https://issues.apache.org/jira/browse/HBASE-4031 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HMaster222.rar, HRegionServer222.rar
[jira] [Created] (HBASE-4033) The shutdown RegionServer could be added to AssignmentManager.servers again
The shutdown RegionServer could be added to AssignmentManager.servers again
---
Key: HBASE-4033 URL: https://issues.apache.org/jira/browse/HBASE-4033 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4

The following steps can easily reproduce the problem:
1. Have thousands of regions in the cluster.
2. Stop the cluster.
3. Start the cluster, kill one regionserver while the regions are opening, and restart it after 10 seconds.

The shut-down regionserver will appear in the AssignmentManager.servers list again. For example:

Issue 1:
2011-06-23 14:14:30,775 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: 167-6-1-12,20020,1308803390123=2220, 167-6-1-13,20020,1308803391742=2374, 167-6-1-11,20020,1308803386333=2205, 167-6-1-13,20020,1308803514394=2183
Two regionservers (one of which had aborted) have the same hostname but different startcodes:
167-6-1-13,20020,1308803391742=2374
167-6-1-13,20020,1308803514394=2183

Issue 2:
(1) The RS 167-6-1-11,20020,1308105402003 finished shutdown at 10:46:37,774:
10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
(2) Overwriting happened; it seems the RS still existed in the set AssignmentManager#regions:
10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
(3) A region was assigned to this dead RS again at 10:50:20,671:
10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,0805861380030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
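Issue 1 follows from the startcode being part of the server identity: a restarted instance registers under a new startcode, so a map keyed on the full (host, port, startcode) tuple keeps the dead entry alongside the new one. A minimal sketch of the symptom and one possible remedy; the Server record and registration logic are hypothetical stand-ins, not HBase's actual ServerName/AssignmentManager code:

```java
import java.util.HashMap;
import java.util.Map;

public class ServerListSketch {
    // Hypothetical stand-in for a (host, port, startcode) server identity.
    record Server(String host, int port, long startcode) {}

    public static void main(String[] args) {
        Map<Server, Integer> servers = new HashMap<>();
        servers.put(new Server("167-6-1-13", 20020, 1308803391742L), 2374);

        Server restarted = new Server("167-6-1-13", 20020, 1308803514394L);
        // Naive registration: the new startcode makes a distinct key,
        // so the dead instance is NOT replaced -- both entries coexist,
        // exactly as in the balancer's server-information dump.
        servers.put(restarted, 2183);
        System.out.println(servers.size()); // 2: stale entry survives

        // One possible remedy: evict entries with the same host:port and
        // an older startcode when the new instance registers.
        servers.keySet().removeIf(s -> s.host().equals(restarted.host())
                && s.port() == restarted.port()
                && s.startcode() < restarted.startcode());
        System.out.println(servers.size()); // 1: only the live instance remains
    }
}
```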
[jira] [Updated] (HBASE-4033) The shutdown RegionServer could be added to AssignmentManager.servers again
[ https://issues.apache.org/jira/browse/HBASE-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4033: Attachment: A_hbase-root-master-167-6-1-11.rar

The attached logs relate to issue 2 in the description.

The shutdown RegionServer could be added to AssignmentManager.servers again
---
Key: HBASE-4033 URL: https://issues.apache.org/jira/browse/HBASE-4033 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: A_hbase-root-master-167-6-1-11.rar
[jira] [Commented] (HBASE-4032) HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
[ https://issues.apache.org/jira/browse/HBASE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054802#comment-13054802 ] stack commented on HBASE-4032:

Let me fix this. Thanks for fingering it Andy.

HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
---
Key: HBASE-4032 URL: https://issues.apache.org/jira/browse/HBASE-4032 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: stack Priority: Blocker Fix For: 0.92.0

After HBASE-451, HRegionInfo#getTableDesc has been modified to always return {{null}}. One immediate effect is broken unit tests. That aside, it is not in the spirit of deprecation to actually break the method before the deprecation cycle has completed; it's a bug.
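The deprecation-cycle point can be illustrated with a toy accessor (class and field names here are made up, not HBase's actual HRegionInfo code): a @Deprecated method should keep delegating to working state until it is actually removed, rather than being gutted to return null:

```java
// Hypothetical sketch of the deprecation pattern the issue asks for:
// the old accessor stays functional for the whole deprecation cycle.
public class RegionInfoSketch {
    static class TableDesc {
        final String name;
        TableDesc(String name) { this.name = name; }
    }

    private final TableDesc desc = new TableDesc("t1");

    /**
     * @deprecated use a replacement accessor; kept working until removal.
     */
    @Deprecated
    TableDesc getTableDesc() {
        return desc; // still delegate to live state -- do not return null
    }

    public static void main(String[] args) {
        // Existing callers keep working (with a compile-time warning)
        // instead of hitting NullPointerExceptions.
        System.out.println(new RegionInfoSketch().getTableDesc().name);
    }
}
```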
[jira] [Assigned] (HBASE-4032) HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
[ https://issues.apache.org/jira/browse/HBASE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reassigned HBASE-4032: Assignee: stack

HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
---
Key: HBASE-4032 URL: https://issues.apache.org/jira/browse/HBASE-4032 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: stack Priority: Blocker Fix For: 0.92.0