[jira] [Created] (HBASE-5668) HRegionServer.checkFileSystem() should only abort() after fs is down for some time
HRegionServer.checkFileSystem() should only abort() after fs is down for some time

Key: HBASE-5668
URL: https://issues.apache.org/jira/browse/HBASE-5668
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

When checkFileSystem() fails, the region server should wait for some time before aborting. By default, the timeout can be the same as the ZooKeeper session timeout.

When, say, a rack switch reboots or fails for a few minutes and all traffic to the region server dies, we don't want the region servers to kill themselves unnecessarily because ongoing compactions or flushes fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
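The grace period proposed in HBASE-5668 could be tracked roughly as follows. This is a minimal sketch, not HBase code: the class FsFailureGracePeriod, its recordCheck() method and the injected clock are all hypothetical names, and the grace period would presumably be set from the ZooKeeper session timeout as the issue suggests.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of HBase): reports that an abort is due only
 * after filesystem checks have been failing continuously for a grace period.
 */
final class FsFailureGracePeriod {
    private final long graceMillis;
    private final LongSupplier clock;   // injected so the logic can be tested
    private long firstFailureAt = -1L;  // -1 means the most recent check succeeded

    FsFailureGracePeriod(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
    }

    /**
     * Records the outcome of one filesystem check.
     * Returns true only if checks have failed for at least graceMillis.
     */
    synchronized boolean recordCheck(boolean fsAvailable) {
        if (fsAvailable) {
            firstFailureAt = -1L;       // any success resets the window
            return false;
        }
        long now = clock.getAsLong();
        if (firstFailureAt < 0) {
            firstFailureAt = now;       // first failure of a new outage
        }
        return now - firstFailureAt >= graceMillis;
    }
}
```

A caller would invoke recordCheck() from wherever the filesystem check runs and abort only when it returns true, so a short outage that ends before the grace period expires never triggers an abort.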
[jira] [Created] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits
SplitLogManager - prevent unnecessary attempts to resubmits

Key: HBASE-5618
URL: https://issues.apache.org/jira/browse/HBASE-5618
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

Currently, once a watch fires indicating that the task node has been updated (heartbeated) by the worker, the SplitLogManager still takes quite some time before it updates the task's last-heard-from time. This is because the manager currently schedules another getDataSetWatch(), and only after that finishes will it update the task's last-heard-from time. This leads to a large number of zk BadVersion warnings when resubmission is continuously attempted and fails.

Two changes should be made:
(1) On a resubmission failure because of BadVersion, the task's lastUpdate time should be bumped.
(2) The task's lastUpdate time should be bumped as soon as the nodeDataChanged() watch fires, without waiting for getDataSetWatch() to complete.
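The two proposed changes can be illustrated with a small bookkeeping sketch. It is an assumption-laden model, not the SplitLogManager implementation: the class TaskHeartbeats and its methods are hypothetical, and timestamps are passed in explicitly instead of coming from ZooKeeper callbacks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical model of per-task "last heard from" bookkeeping. */
final class TaskHeartbeats {
    private final ConcurrentMap<String, Long> lastUpdate = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    TaskHeartbeats(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Starts tracking a task node. */
    void track(String path, long now) {
        lastUpdate.put(path, now);
    }

    /** Change (2): bump the time as soon as the data-changed watch fires. */
    void onNodeDataChanged(String path, long now) {
        lastUpdate.merge(path, now, Long::max);
    }

    /** Change (1): a BadVersion failure means the worker updated the node, so bump too. */
    void onResubmitBadVersion(String path, long now) {
        lastUpdate.merge(path, now, Long::max);
    }

    /** A task is a resubmission candidate only if it has been silent longer than the timeout. */
    boolean shouldResubmit(String path, long now) {
        Long last = lastUpdate.get(path);
        return last != null && now - last > timeoutMillis;
    }
}
```

Using merge with Long::max keeps the recorded time from moving backwards if callbacks arrive out of order, which is a design choice of this sketch and not something the issue specifies.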
[jira] [Created] (HBASE-5519) Incorrect warning in splitlogmanager
Incorrect warning in splitlogmanager

Key: HBASE-5519
URL: https://issues.apache.org/jira/browse/HBASE-5519
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

Because of recently added behavior, where the SplitLogManager timeout thread gets data from the zk node just to check that the node is there, we might have multiple watches firing without the task znode expiring. Remove the poor warning message. (Internally, there was an assert that failed in Mikhail's tests.)
[jira] [Created] (HBASE-5518) Incorrect warning in splitlogmanager
Incorrect warning in splitlogmanager

Key: HBASE-5518
URL: https://issues.apache.org/jira/browse/HBASE-5518
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

Because of recently added behavior, where the SplitLogManager timeout thread gets data from the zk node just to check that the node is there, we might have multiple watches firing without the task znode expiring. Remove the poor warning message. (Internally, there was an assert that failed in Mikhail's tests.)
[jira] [Created] (HBASE-5347) GC free memory management in Level-1 Block Cache
GC free memory management in Level-1 Block Cache

Key: HBASE-5347
URL: https://issues.apache.org/jira/browse/HBASE-5347
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani

On eviction of a block from the block cache, instead of waiting for the garbage collector to reuse its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place, we can do our own simple blocks-out-of-slab allocation for the block cache. This will help us with
* reducing GC pressure, especially in the old generation
* making it possible to have non-Java-heap memory backing the HFile blocks
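To make the reference-counting idea concrete, here is a minimal sketch of a slab that hands out fixed-size buffers and takes them back as soon as the last reference is released. BlockSlab and RefCountedBlock are hypothetical names and this is not the design that was eventually implemented; the buffers are heap-allocated here, and ByteBuffer.allocateDirect() would be the assumed route to non-Java-heap memory.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fixed-size slab: buffers are reused instead of left to the GC. */
final class BlockSlab {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    BlockSlab(int blocks, int blockSize) {
        for (int i = 0; i < blocks; i++) {
            free.push(ByteBuffer.allocate(blockSize)); // allocateDirect() for off-heap
        }
    }

    /** Returns a block holding one reference, or null if the slab is exhausted. */
    synchronized RefCountedBlock allocate() {
        ByteBuffer buf = free.poll();
        return buf == null ? null : new RefCountedBlock(this, buf);
    }

    synchronized void giveBack(ByteBuffer buf) {
        buf.clear();
        free.push(buf);
    }

    synchronized int freeCount() {
        return free.size();
    }
}

/** Hypothetical block wrapper: the buffer returns to the slab when the count hits zero. */
final class RefCountedBlock {
    private final BlockSlab owner;
    private final ByteBuffer buf;
    private final AtomicInteger refs = new AtomicInteger(1); // the allocator's reference

    RefCountedBlock(BlockSlab owner, ByteBuffer buf) {
        this.owner = owner;
        this.buf = buf;
    }

    ByteBuffer buffer() {
        return buf;
    }

    /** Takes an extra reference, e.g. for a reader still using the block. */
    void retain() {
        for (;;) {
            int n = refs.get();
            if (n <= 0) throw new IllegalStateException("block already recycled");
            if (refs.compareAndSet(n, n + 1)) return;
        }
    }

    /** Drops one reference; the last release recycles the buffer immediately. */
    void release() {
        int n = refs.decrementAndGet();
        if (n == 0) owner.giveBack(buf);
        else if (n < 0) throw new IllegalStateException("released too many times");
    }
}
```

In this sketch an evicted block whose readers have all called release() becomes reusable at once, while a block that is still being read stays out of the free list until its last reader finishes.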
[jira] [Created] (HBASE-5326) splitlogmanager zk async handlers after shutdown
splitlogmanager zk async handlers after shutdown

Key: HBASE-5326
URL: https://issues.apache.org/jira/browse/HBASE-5326
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

The zk async handlers in SplitLogManager should ignore all callbacks after SplitLogManager has shut down. This will make the test logs less noisy.
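One simple way to get this behavior is a shared stopped flag that every callback checks first. The sketch below is illustrative only: ShutdownAwareCallback and its processResult(int, String) method are hypothetical stand-ins, not the ZooKeeper callback interfaces or the SplitLogManager handlers.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical async handler that drops callbacks arriving after shutdown. */
final class ShutdownAwareCallback {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final AtomicInteger handled = new AtomicInteger();

    /** Called once by the owner when it shuts down. */
    void stop() {
        stopped.set(true);
    }

    /** Stand-in for an async result callback; rc and path are not interpreted here. */
    void processResult(int rc, String path) {
        if (stopped.get()) {
            return; // owner is gone: no logging, no retries, no state changes
        }
        handled.incrementAndGet();
    }

    int handledCount() {
        return handled.get();
    }
}
```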
[jira] [Created] (HBASE-5308) Retry of distributed log splitting will fail on ./logs/rs-splitting directories
Retry of distributed log splitting will fail on ./logs/rs-splitting directories

Key: HBASE-5308
URL: https://issues.apache.org/jira/browse/HBASE-5308
Project: HBase
Issue Type: Bug
Environment: Only exists in 89 branch. Master.splitLog() doesn't handle the case where the rs log file has been renamed.
Reporter: Prakash Khemani
[jira] [Created] (HBASE-5296) confusing code in HFileBlockIndex.seekToBlockIndex()
confusing code in HFileBlockIndex.seekToBlockIndex()

Key: HBASE-5296
URL: https://issues.apache.org/jira/browse/HBASE-5296
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Mikhail Bautin

{code}
public HFileBlock seekToDataBlock(final byte[] key, int keyOffset,
    int keyLength, HFileBlock currentBlock, boolean cacheBlocks,
    boolean pread, boolean isCompaction)
    throws IOException {
  int rootLevelIndex = rootBlockContainingKey(key, keyOffset, keyLength);
  if (rootLevelIndex < 0 || rootLevelIndex >= blockOffsets.length) {
    return null;
  }
{code}

In the above code, rootLevelIndex is never greater than or equal to blockOffsets.length. (This can confuse a reader who follows the code from StoreFileScanner.seek(kv).)
[jira] [Created] (HBASE-5287) fsync can go into an infinite loop
fsync can go into an infinite loop

Key: HBASE-5287
URL: https://issues.apache.org/jira/browse/HBASE-5287
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani

HBaseFsckRepair.prompt() should check for a -1 return value from System.in.read(). Only affects the 0.89 release.
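The issue does not show the loop in prompt(), so the following is only a guess at the shape of the fix: a read loop that treats -1 (end of stream) as a terminating answer instead of spinning. PromptSketch and promptYesNo() are hypothetical names, and the stream is passed in so that the EOF case can be exercised without a console.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical yes/no prompt reader that cannot loop forever on a closed stdin. */
final class PromptSketch {
    /**
     * Reads bytes until it sees 'y' or 'n' (either case).
     * Returns false on end of stream, because InputStream.read() keeps
     * returning -1 once the stream is exhausted and would otherwise spin.
     */
    static boolean promptYesNo(InputStream in) {
        try {
            while (true) {
                int c = in.read();
                if (c == -1) {
                    return false;            // EOF: treat as "no" and stop
                }
                if (c == 'y' || c == 'Y') {
                    return true;
                }
                if (c == 'n' || c == 'N') {
                    return false;
                }
                // any other byte (newline, stray key) is ignored
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real tool the call would be promptYesNo(System.in); when stdin is redirected from an empty file or a closed pipe, the method returns instead of hanging.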
[jira] [Created] (HBASE-5013) NPE in HBaseClient$Connection.receiveResponse
NPE in HBaseClient$Connection.receiveResponse

Key: HBASE-5013
URL: https://issues.apache.org/jira/browse/HBASE-5013
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

We have the following NPE:

java.io.IOException: Call to hbasedev003.snc3.facebook.com/10.26.1.198:60020 failed on local exception: java.io.IOException: Unexpected exception receiving call responses
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:916)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:885)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:149)
    at $Proxy6.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:182)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:295)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:272)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:228)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1197)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1154)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1141)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:872)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:768)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:742)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:978)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:772)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:736)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:207)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:177)
    at com.facebook.BulkImporter.VerifyAssocs.<init>(VerifyAssocs.java:248)
    at com.facebook.BulkImporter.VerifyAssocs$AssocVerifierMapper.setup(VerifyAssocs.java:138)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:624)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.IOException: Unexpected exception receiving call responses
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:494)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:490)

===

Just by looking at the code, the NPE shouldn't have happened. HBaseClient$Connection.setUpIOstreams() sets up in and out, then starts the Connection thread. The Connection, in its run() method, calls receiveResponse(). In receiveResponse(), the NPE happens at

    int id = in.readInt();

As per the java.util.concurrent docs, the initialization of in should have been visible in the Connection thread's run() method, so I don't know how in ended up being null.

===

While looking into this issue I noticed a small problem in the closeConnection() method. I will upload a diff soon.
[jira] [Created] (HBASE-4987) wrong use of incarnation var in SplitLogManager
wrong use of incarnation var in SplitLogManager

Key: HBASE-4987
URL: https://issues.apache.org/jira/browse/HBASE-4987
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

@Ramakrishna found and analyzed an issue in SplitLogManager, but I don't think that the fix is correct. I will upload a patch shortly.
[jira] [Created] (HBASE-4967) connected client thrift sockets should have a server side read timeout
connected client thrift sockets should have a server side read timeout

Key: HBASE-4967
URL: https://issues.apache.org/jira/browse/HBASE-4967
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

If there is no socket read timeout and the Thrift server is a ThreadPoolServer, then server-side threads will be used up waiting for dead or unresponsive clients.
[jira] [Created] (HBASE-4969) tautology in HRegionInfo.readFields
tautology in HRegionInfo.readFields

Key: HBASE-4969
URL: https://issues.apache.org/jira/browse/HBASE-4969
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

In HRegionInfo.readFields() the following looks wrong to me:

    } else if (getVersion() == VERSION) {

It is always true. Should it have been the following?

    } else if (getVersion() == version) {

Here version is a local variable in which the deserialized version is stored.

(I am struggling with another issue where, after applying some patches, including HBASE-4388 "Second start after migration from 90 to trunk crashes", my version of hbase-92 HRegionInfo.readFields() tries to find HTD in HRegionInfo and fails.)
[jira] [Created] (HBASE-4932) Block cache can be mistakenly instantiated by tools
Block cache can be mistakenly instantiated by tools

Key: HBASE-4932
URL: https://issues.apache.org/jira/browse/HBASE-4932
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani

MapReduce tasks that create a writer to write HFiles inadvertently end up creating a block cache.
[jira] [Created] (HBASE-4831) LRU stats thread should be a daemon thread
LRU stats thread should be a daemon thread

Key: HBASE-4831
URL: https://issues.apache.org/jira/browse/HBASE-4831
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

I have seen hung processes where the following was the only non-daemon thread:

LRU Statistics #0 prio=10 tid=0x2ab0bc04f800 nid=0x11ac waiting on condition [0x42f57000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x2aaab9a1c000 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:662)
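The thread dump shows a worker of a scheduled executor, so one way to make it a daemon is to give the executor a ThreadFactory that calls setDaemon(true). This is a sketch of that technique using only java.util.concurrent; the class DaemonThreads and its method names are hypothetical, and it is not necessarily how the fix was committed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

/** Hypothetical helper that builds executors whose threads never block JVM exit. */
final class DaemonThreads {
    /** Returns a factory producing daemon threads with the given name. */
    static ThreadFactory named(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true); // a daemon thread does not keep the JVM alive
            return t;
        };
    }

    /** Single-threaded scheduler for periodic statistics logging. */
    static ScheduledExecutorService statsExecutor() {
        return Executors.newScheduledThreadPool(1, named("LRU Statistics #0"));
    }
}
```

With this factory the process can exit once all non-daemon threads finish, even if the statistics task is still scheduled.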
[jira] [Created] (HBASE-4721) Configurable TTL for Delete Markers
Configurable TTL for Delete Markers

Key: HBASE-4721
URL: https://issues.apache.org/jira/browse/HBASE-4721
Project: HBase
Issue Type: New Feature
Reporter: Prakash Khemani
Assignee: Prakash Khemani

There is a need to provide long TTLs for delete markers. This is useful when replicating HBase logs from one cluster to another. The receiving cluster shouldn't compact away the delete markers, because the affected key-values might still be on the way.
[jira] [Created] (HBASE-4696) HRegionThriftServer
HRegionThriftServer

Key: HBASE-4696
URL: https://issues.apache.org/jira/browse/HBASE-4696
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani
[jira] [Created] (HBASE-4674) splitLog silently fails
splitLog silently fails

Key: HBASE-4674
URL: https://issues.apache.org/jira/browse/HBASE-4674
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Environment: splitLog() can fail silently and a region can open without its edits getting replayed.
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Blocker