[jira] [Created] (HBASE-5668) HRegionServer.checkFileSystem() should only abort() after fs is down for some time

2012-03-28 Thread Prakash Khemani (Created) (JIRA)
HRegionServer.checkFileSystem() should only abort() after fs is down for some 
time
--

 Key: HBASE-5668
 URL: https://issues.apache.org/jira/browse/HBASE-5668
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani


When checkFileSystem() fails, the region server should wait for some time 
before aborting. By default, the timeout can be the same as the ZooKeeper 
session timeout.

When, say, a rack switch reboots or fails for a few minutes and all traffic 
to the region server dies, we don't want the region servers to kill 
themselves unnecessarily just because ongoing compactions or flushes fail.
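
A minimal sketch of the proposed behavior, not the actual HRegionServer code. 
The class name FsFailureGracePeriod, its method names and the injectable clock 
are all assumptions made for illustration. The grace period would default to 
the ZooKeeper session timeout, as suggested above.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: decides when a file system outage has lasted long enough to abort. */
final class FsFailureGracePeriod {
    private final long graceMillis;       // e.g. the ZooKeeper session timeout
    private final LongSupplier clock;     // injectable clock, in milliseconds
    private long firstFailureMillis = -1; // -1 means "file system currently healthy"

    FsFailureGracePeriod(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
    }

    /** Records one check result; returns true only when the server should abort. */
    synchronized boolean shouldAbort(boolean fsAvailable) {
        if (fsAvailable) {
            firstFailureMillis = -1; // recovery resets the timer
            return false;
        }
        long now = clock.getAsLong();
        if (firstFailureMillis < 0) {
            firstFailureMillis = now; // first failure of this outage
        }
        return now - firstFailureMillis >= graceMillis;
    }
}
```

With this shape, a failed compaction or flush during a short switch outage 
only starts the timer. The abort happens if the file system is still 
unavailable once the grace period has elapsed.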

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-03-22 Thread Prakash Khemani (Created) (JIRA)
SplitLogManager - prevent unnecessary attempts to resubmits
---

 Key: HBASE-5618
 URL: https://issues.apache.org/jira/browse/HBASE-5618
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani


Currently, once a watch fires to signal that the task node has been updated 
(heartbeated) by the worker, the SplitLogManager still takes quite some time 
before it updates the task's last-heard-from time. This is because the manager 
schedules another getDataSetWatch() and updates the last-heard-from time only 
after that call finishes.

This leads to a large number of zk-BadVersion warnings when resubmission is 
attempted repeatedly and keeps failing.


Two changes should be made:
(1) On a resubmission failure caused by BadVersion, the task's lastUpdate time 
should be advanced.
(2) The task's lastUpdate time should be advanced as soon as the 
nodeDataChanged() watch fires, without waiting for getDataSetWatch() to 
complete.
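
The two changes can be sketched roughly as follows. TaskHeartbeat and its 
method names are hypothetical stand-ins for the manager's per-task bookkeeping, 
not the real SplitLogManager fields. The sketch assumes a BadVersion on 
resubmit most likely means the worker has written to the node in the meantime.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the manager's per-task "last heard from" bookkeeping. */
final class TaskHeartbeat {
    private final AtomicLong lastUpdateMillis;

    TaskHeartbeat(long nowMillis) {
        this.lastUpdateMillis = new AtomicLong(nowMillis);
    }

    /** Change (2): called directly from the nodeDataChanged() watch, before any async read. */
    void onNodeDataChanged(long nowMillis) {
        lastUpdateMillis.set(nowMillis);
    }

    /** Change (1): a BadVersion on resubmit implies the node changed, so refresh the time. */
    void onResubmitBadVersion(long nowMillis) {
        lastUpdateMillis.set(nowMillis);
    }

    /** The timeout thread resubmits only when the worker has been silent for too long. */
    boolean shouldResubmit(long nowMillis, long timeoutMillis) {
        return nowMillis - lastUpdateMillis.get() > timeoutMillis;
    }
}
```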






[jira] [Created] (HBASE-5519) Incorrect warning in splitlogmanager

2012-03-04 Thread Prakash Khemani (Created) (JIRA)
Incorrect warning in splitlogmanager


 Key: HBASE-5519
 URL: https://issues.apache.org/jira/browse/HBASE-5519
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani


Because of recently added behavior, where the SplitLogManager timeout thread 
gets data from the zk node just to check that the node is there, we might 
have multiple watches firing without the task znode expiring.

Remove the poor warning message. (Internally, an assert failed in Mikhail's 
tests.)





[jira] [Created] (HBASE-5518) Incorrect warning in splitlogmanager

2012-03-04 Thread Prakash Khemani (Created) (JIRA)
Incorrect warning in splitlogmanager


 Key: HBASE-5518
 URL: https://issues.apache.org/jira/browse/HBASE-5518
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani


Because of recently added behavior, where the SplitLogManager timeout thread 
gets data from the zk node just to check that the node is there, we might 
have multiple watches firing without the task znode expiring.

Remove the poor warning message. (Internally, an assert failed in Mikhail's 
tests.)





[jira] [Created] (HBASE-5347) GC free memory management in Level-1 Block Cache

2012-02-07 Thread Prakash Khemani (Created) (JIRA)
GC free memory management in Level-1 Block Cache


 Key: HBASE-5347
 URL: https://issues.apache.org/jira/browse/HBASE-5347
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani


On eviction of a block from the block cache, instead of waiting for the garbage 
collector to reclaim its memory, reuse the block right away.

This will require us to keep reference counts on the HFile blocks. Once we have 
the reference counts in place we can do our own simple blocks-out-of-slab 
allocation for the block-cache.

This will help us with
* reducing gc pressure, especially in the old generation
* making it possible to have non-java-heap memory backing the HFile blocks
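
A rough sketch of reference-counted blocks-out-of-slab allocation, under 
stated assumptions. BlockSlab and RefCountedBlock are hypothetical names, 
blocks are fixed-size, and the small wrapper objects still live on the Java 
heap. Only the block memory itself is recycled without the garbage collector.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical pool of fixed-size blocks carved out of one large slab. */
final class BlockSlab {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    BlockSlab(int blockSize, int blockCount, boolean offHeap) {
        ByteBuffer slab = offHeap
            ? ByteBuffer.allocateDirect(blockSize * blockCount) // non-java-heap memory
            : ByteBuffer.allocate(blockSize * blockCount);
        for (int i = 0; i < blockCount; i++) {
            slab.limit((i + 1) * blockSize);
            slab.position(i * blockSize);
            free.push(slab.slice()); // each slice is a view of one block of the slab
        }
    }

    /** Returns a block with a reference count of 1, or null when the slab is exhausted. */
    synchronized RefCountedBlock allocate() {
        ByteBuffer buf = free.poll();
        return buf == null ? null : new RefCountedBlock(this, buf);
    }

    synchronized void recycle(ByteBuffer buf) {
        buf.clear();
        free.push(buf);
    }

    synchronized int freeBlocks() {
        return free.size();
    }
}

final class RefCountedBlock {
    private final BlockSlab owner;
    private final ByteBuffer data;
    private final AtomicInteger refs = new AtomicInteger(1); // the cache holds the first reference

    RefCountedBlock(BlockSlab owner, ByteBuffer data) {
        this.owner = owner;
        this.data = data;
    }

    ByteBuffer data() {
        return data;
    }

    /** A reader takes a reference before using the block. */
    void retain() {
        refs.incrementAndGet();
    }

    /** Readers and the cache (on eviction) release; the last release recycles the memory. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            owner.recycle(data);
        }
    }
}
```

The sketch does not guard against retain() being called after the count has 
already dropped to zero. A real implementation would need to close that race.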





[jira] [Created] (HBASE-5326) splitlogmanager zk async handlers after shutdown

2012-02-02 Thread Prakash Khemani (Created) (JIRA)
splitlogmanager zk async handlers after shutdown


 Key: HBASE-5326
 URL: https://issues.apache.org/jira/browse/HBASE-5326
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani


The zk async handlers in SplitLogManager should ignore all callbacks after 
SplitLogManager has shut down. This will make the test logs less noisy.
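
One way to picture this is a volatile flag checked at the top of every 
callback. The class below is a hypothetical stand-in, not the real handler 
classes, and processResult(int, String) only mimics the general shape of an 
async ZooKeeper callback.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical async handler that drops callbacks once its owner has stopped. */
final class ShutdownAwareCallback {
    private volatile boolean stopped = false;
    private final AtomicInteger handled = new AtomicInteger();

    /** Called once by the owner when it shuts down. */
    void stop() {
        stopped = true;
    }

    /** Stand-in for an async ZooKeeper callback. */
    void processResult(int rc, String path) {
        if (stopped) {
            return; // manager has shut down: ignore silently, no retry, no logging
        }
        handled.incrementAndGet();
        // normal handling (retry, log, update task state) would go here
    }

    int handledCount() {
        return handled.get();
    }
}
```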





[jira] [Created] (HBASE-5308) Retry of distributed log splitting will fail on ./logs/rs-splitting directories

2012-01-31 Thread Prakash Khemani (Created) (JIRA)
Retry of distributed log splitting will fail on ./logs/rs-splitting directories
---

 Key: HBASE-5308
 URL: https://issues.apache.org/jira/browse/HBASE-5308
 Project: HBase
  Issue Type: Bug
 Environment: Only exists in 89 branch

Master.splitLog() doesn't handle the case where the rs log file has been renamed
Reporter: Prakash Khemani








[jira] [Created] (HBASE-5296) confusing code in HFileBlockIndex.seekToBlockIndex()

2012-01-27 Thread Prakash Khemani (Created) (JIRA)
confusing code in HFileBlockIndex.seekToBlockIndex()


 Key: HBASE-5296
 URL: https://issues.apache.org/jira/browse/HBASE-5296
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Mikhail Bautin


{code}
public HFileBlock seekToDataBlock(final byte[] key, int keyOffset,
    int keyLength, HFileBlock currentBlock, boolean cacheBlocks,
    boolean pread, boolean isCompaction)
    throws IOException {
  int rootLevelIndex = rootBlockContainingKey(key, keyOffset, keyLength);
  if (rootLevelIndex < 0 || rootLevelIndex >= blockOffsets.length) {
    return null;
  }
{code}
In the above code rootLevelIndex is never greater-than-or-equal-to 
blockOffsets.length.

(It can confuse reading of the code if you follow it from 
StoreFileScanner.seek(kv))







[jira] [Created] (HBASE-5287) fsync can go into an infinite loop

2012-01-26 Thread Prakash Khemani (Created) (JIRA)
fsync can go into an infinite loop
--

 Key: HBASE-5287
 URL: https://issues.apache.org/jira/browse/HBASE-5287
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


HBaseFsckRepair.prompt() should check for -1 return value from System.in.read()

Only affects 0.89 release.
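
A minimal sketch of a prompt loop that handles end of stream. PromptSketch and 
promptYesNo are hypothetical names, and this is not the actual 
HBaseFsckRepair.prompt() code. Treating end of stream as "no" is an assumption 
about the desired default.

```java
import java.io.IOException;
import java.io.InputStream;

final class PromptSketch {
    /** Reads until 'y' or 'n'; treats end of stream (-1) as "no" instead of looping forever. */
    static boolean promptYesNo(InputStream in) throws IOException {
        while (true) {
            int c = in.read();
            if (c == -1) {
                return false; // EOF: input closed or redirected from an empty source
            }
            if (c == 'y' || c == 'Y') {
                return true;
            }
            if (c == 'n' || c == 'N') {
                return false;
            }
            // any other character, including newlines, is skipped
        }
    }
}
```

Without the -1 check, a loop that waits for 'y' or 'n' never terminates once 
the input is exhausted, because read() keeps returning -1.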





[jira] [Created] (HBASE-5013) NPE in HBaseClient$Connection.receiveResponse

2011-12-12 Thread Prakash Khemani (Created) (JIRA)
NPE in HBaseClient$Connection.receiveResponse
-

 Key: HBASE-5013
 URL: https://issues.apache.org/jira/browse/HBASE-5013
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


We have the following NPE

java.io.IOException: Call to hbasedev003.snc3.facebook.com/10.26.1.198:60020 
failed on local exception: java.io.IOException: Unexpected exception receiving 
call responses
at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:916)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:885)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:149)
at $Proxy6.getProtocolVersion(Unknown Source)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:182)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:295)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:272)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:324)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:228)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1197)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1154)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1141)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:872)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:768)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:742)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:978)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:772)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:736)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:207)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:177)
at com.facebook.BulkImporter.VerifyAssocs.<init>(VerifyAssocs.java:248)
at 
com.facebook.BulkImporter.VerifyAssocs$AssocVerifierMapper.setup(VerifyAssocs.java:138)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:624)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.IOException: Unexpected exception receiving call responses
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:494)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:490)


===


Judging from the code alone, the NPE shouldn't have happened:

HBaseClient$Connection.setUpIOstreams() sets up in and out.
Then it starts the Connection thread.
The Connection in its run method calls receiveResponse()
In receiveResponse() NPE happens in 
int id = in.readInt();

As per the java.util.concurrent docs, the initialization of in should have been 
visible in the Connection thread's run() method, so I don't know how in ended 
up being null.
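
The visibility argument can be illustrated with a small sketch. 
StartVisibilitySketch is a hypothetical class, not HBaseClient code. It relies 
on the Java memory model rule that a call to Thread.start() happens-before 
every action in the started thread.

```java
final class StartVisibilitySketch {
    private Object in; // deliberately a plain (non-volatile) field

    /** Assigns the field, then starts a thread that reads it; returns what the thread saw. */
    Object assignThenStart() throws InterruptedException {
        final Object[] seen = new Object[1];
        in = new Object();                              // write before start()
        Thread reader = new Thread(() -> seen[0] = in); // read inside the new thread
        reader.start();                                 // start() happens-before the thread's actions
        reader.join();                                  // join() makes the write to seen[0] visible here
        return seen[0];
    }
}
```

The rule only covers writes made before start(). It says nothing about a 
later write to the same field from another thread, which would need its own 
synchronization.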

===

While looking into this issue I noticed a small problem in the 
closeConnection() method. I will soon upload a diff.







[jira] [Created] (HBASE-4987) wrong use of incarnation var in SplitLogManager

2011-12-08 Thread Prakash Khemani (Created) (JIRA)
wrong use of incarnation var in SplitLogManager
---

 Key: HBASE-4987
 URL: https://issues.apache.org/jira/browse/HBASE-4987
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


@Ramakrishna found and analyzed an issue in SplitLogManager. But I don't think 
that the fix is correct. Will upload a patch shortly.





[jira] [Created] (HBASE-4967) connected client thrift sockets should have a server side read timeout

2011-12-06 Thread Prakash Khemani (Created) (JIRA)
connected client thrift sockets should have a server side read timeout
--

 Key: HBASE-4967
 URL: https://issues.apache.org/jira/browse/HBASE-4967
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani


If there is no socket read timeout and the Thrift server is a 
ThreadPoolServer, server-side threads will be used up waiting for dead or 
unresponsive clients.
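
The effect of a server-side read timeout can be shown with plain java.net 
sockets. This sketch deliberately does not use the Thrift API. 
ReadTimeoutSketch and silentClientTimesOut are hypothetical names, and the 
loopback connection stands in for a client that connects and then goes silent.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutSketch {
    /**
     * Accepts one loopback connection and waits at most timeoutMillis for its first byte.
     * Returns true if the read timed out, i.e. the client stayed silent.
     */
    static boolean silentClientTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis); // bounds every blocking read on this socket
            try {
                accepted.getInputStream().read();
                return false; // data or EOF arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;  // the worker thread is released instead of blocking forever
            }
        }
    }
}
```

Without setSoTimeout(), the read() call would block indefinitely and the 
pool thread serving that connection would never be returned.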





[jira] [Created] (HBASE-4969) tautology in HRegionInfo.readFields

2011-12-06 Thread Prakash Khemani (Created) (JIRA)
tautology in HRegionInfo.readFields
---

 Key: HBASE-4969
 URL: https://issues.apache.org/jira/browse/HBASE-4969
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


In HRegionInfo.readFields() the following looks wrong to me

} else if (getVersion() == VERSION) {

it is always true.

Should it have been

} else if (getVersion() == version) {

version is a local variable where the deserialized-version is stored.
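
A stripped-down illustration of the intended check, not the real HRegionInfo 
code. VersionedRecord, the version numbers and the "legacy" branch for version 
0 are assumptions made for this sketch.

```java
import java.io.DataInput;
import java.io.IOException;

final class VersionedRecord {
    static final byte VERSION = 1; // version written by the current code

    byte getVersion() {
        return VERSION; // always the current version, so comparing it to VERSION is a tautology
    }

    /** Returns a label for the branch taken, so the dispatch is easy to observe. */
    String readFields(DataInput in) throws IOException {
        byte version = in.readByte();         // version found in the serialized bytes
        if (version == 0) {
            return "legacy";
        } else if (getVersion() == version) { // compares against the value just read
            return "current";
        } else {
            throw new IOException("unsupported version " + version);
        }
    }
}
```

With getVersion() == VERSION in the second branch, the final else could never 
be reached and an unknown version would be parsed as if it were current.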

(I am struggling with another issue: after applying some patches, including 
HBASE-4388 "Second start after migration from 90 to trunk crashes", to my 
version of hbase-92, HRegionInfo.readFields() tries to find the HTD in 
HRegionInfo and fails.)








[jira] [Created] (HBASE-4932) Block cache can be mistakenly instantiated by tools

2011-12-01 Thread Prakash Khemani (Created) (JIRA)
Block cache can be mistakenly instantiated by tools
---

 Key: HBASE-4932
 URL: https://issues.apache.org/jira/browse/HBASE-4932
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


MapReduce tasks that create a writer to write HFiles inadvertently end up 
creating a block cache.





[jira] [Created] (HBASE-4831) LRU stats thread should be a daemon thread

2011-11-19 Thread Prakash Khemani (Created) (JIRA)
LRU stats thread should be a daemon thread
--

 Key: HBASE-4831
 URL: https://issues.apache.org/jira/browse/HBASE-4831
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


I have seen hung processes where the following was the only non-daemon 
thread:


LRU Statistics #0 prio=10 tid=0x2ab0bc04f800 nid=0x11ac waiting on 
condition [0x42f57000]
   java.lang.Thread.State: TIMED_WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x2aaab9a1c000> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
  at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
  at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:662)
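
A sketch of the kind of fix this implies: give the scheduled executor a 
ThreadFactory that marks its threads as daemons. DaemonThreadFactory and 
newStatsExecutor are hypothetical names. The actual patch may be structured 
differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory that names its threads and marks them as daemons. */
final class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + " #" + counter.getAndIncrement());
        t.setDaemon(true); // a daemon thread does not keep the JVM alive
        return t;
    }

    /** Single-threaded scheduler for periodic statistics logging. */
    static ScheduledExecutorService newStatsExecutor() {
        return Executors.newScheduledThreadPool(1, new DaemonThreadFactory("LRU Statistics"));
    }
}
```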





[jira] [Created] (HBASE-4721) Configurable TTL for Delete Markers

2011-11-01 Thread Prakash Khemani (Created) (JIRA)
Configurable TTL for Delete Markers
---

 Key: HBASE-4721
 URL: https://issues.apache.org/jira/browse/HBASE-4721
 Project: HBase
  Issue Type: New Feature
Reporter: Prakash Khemani
Assignee: Prakash Khemani


There is a need to provide long TTLs for delete markers. This is useful when 
replicating hbase logs from one cluster to another. The receiving cluster 
shouldn't compact away the delete markers because the affected key-values might 
still be on the way.
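
The retention rule could look roughly like this during compaction. 
DeleteMarkerRetention and its parameter names are hypothetical. No such class 
or configuration key is implied to exist in HBase.

```java
/** Hypothetical compaction-time check: keep delete markers until their own TTL expires. */
final class DeleteMarkerRetention {
    private final long deleteMarkerTtlMillis; // assumed setting, separate from the cell TTL

    DeleteMarkerRetention(long deleteMarkerTtlMillis) {
        this.deleteMarkerTtlMillis = deleteMarkerTtlMillis;
    }

    /** Returns true while the marker is younger than the TTL and must survive compaction. */
    boolean keepDuringCompaction(long markerTimestampMillis, long nowMillis) {
        return nowMillis - markerTimestampMillis < deleteMarkerTtlMillis;
    }
}
```

A long TTL keeps the marker in place so that a key-value arriving late through 
replication is still masked by it.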





[jira] [Created] (HBASE-4696) HRegionThriftServer

2011-10-28 Thread Prakash Khemani (Created) (JIRA)
HRegionThriftServer
---

 Key: HBASE-4696
 URL: https://issues.apache.org/jira/browse/HBASE-4696
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani








[jira] [Created] (HBASE-4674) splitLog silently fails

2011-10-25 Thread Prakash Khemani (Created) (JIRA)
splitLog silently fails
---

 Key: HBASE-4674
 URL: https://issues.apache.org/jira/browse/HBASE-4674
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
 Environment: splitLog() can fail silently and region can open w/o its 
edits getting replayed.
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Blocker



