You've read the requirements section in our docs and upped the ulimits, nprocs, etc.? http://hbase.apache.org/book/os.html

If you know the row, can you deduce the regionserver it's talking to? (Below is the client failure -- we need to figure out what's up on the server side.) Once you've done that, can you check its logs? See if you can figure out anything about why it hangs.
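On the ulimits/nprocs point, what that page asks for boils down to raising the open-file and process limits for whichever user runs the Hadoop/HBase daemons. A minimal sketch of the /etc/security/limits.conf entries, assuming the daemons run as a user named "hadoop" (the user name and the numbers are only the usual starting points -- substitute your own):

  # /etc/security/limits.conf -- sketch only; adjust the user and the values
  hadoop  -  nofile  32768
  hadoop  -  nproc   32000

You can verify what a login shell for that user actually gets with "ulimit -n" and "ulimit -u".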
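On deducing the regionserver for the row, here is a minimal sketch against the 0.90 client that just asks the client where a given row lives. The table name and row key are copied from the traces further down purely as placeholders, so point it at whatever your test actually reads.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.client.HTable;

  public class LocateRow {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // Table and row key taken from the exception text below -- substitute your own.
          HTable table = new HTable(conf, "employeedata");
          HRegionLocation loc =
              table.getRegionLocation("d51b74eb05e07f96cee0ec556f5d8d161e3281f3");
          // This is the regionserver whose logs you want to read.
          System.out.println("region: " + loc.getRegionInfo().getRegionNameAsString());
          System.out.println("server: " + loc.getServerAddress().getHostname()
                  + ":" + loc.getServerAddress().getPort());
          table.close();
      }
  }

The table's page in the master web UI shows the same region-to-server assignments if you would rather not write code. Once you have the server, grep its regionserver log around the timestamps in your application log.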
Thanks,
St.Ack

On Sat, Jul 9, 2011 at 6:14 AM, Srikanth P. Shreenivas
<[email protected]> wrote:
> Hi St.Ack,
>
> We upgraded to CDH3 (hadoop-0.20-0.20.2+923.21-1.noarch.rpm,
> hadoop-hbase-0.90.1+15.18-1.noarch.rpm,
> hadoop-zookeeper-3.3.3+12.1-1.noarch.rpm).
>
> I ran the same test that I was running against the app on CDH2. The test
> posts a request to the web app every 100 ms, and the web app reads an
> HBase record, performs some logic, and saves an audit trail by writing
> another HBase record.
>
> When our app was running on CDH2, I observed the issue below once every
> 10 to 15 requests. With CDH3, that issue is not happening at all, so the
> situation has improved a lot and our app seems much more stable.
>
> However, I am still seeing one issue. Some requests (around 1%) are not
> able to read the record from HBase; the get call hangs for almost 10
> minutes. This is what I see in the application log:
>
> 2011-07-09 18:27:25,537 [gridgain-#6%authGrid%] ERROR [my.app.HBaseHandler] - Exception occurred in searchData:
> java.io.IOException: Giving up trying to get region server: thread is interrupted.
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
>
> <...app specific trace removed...>
>
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
>
> I am running the test on the same record, so all my "get" calls are for
> the same row id.
>
> It would be of immense help if you could provide some input on whether we
> are missing some configuration settings, or whether there is a way to get
> around this.
>
> Thanks,
> Srikanth
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Wednesday, June 29, 2011 7:48 PM
> To: [email protected]
> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>
> Go to CDH3 if you can. CDH2 is also old.
> St.Ack
>
> On Wed, Jun 29, 2011 at 7:15 AM, Srikanth P. Shreenivas
> <[email protected]> wrote:
>> Thanks, St.Ack, for the inputs.
>>
>> Will upgrading to CDH3 help, or is there a version within CDH2 that you
>> recommend we upgrade to?
>>
>> Regards,
>> Srikanth
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>> Sent: Wednesday, June 29, 2011 11:16 AM
>> To: [email protected]
>> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>>
>> Can you upgrade? That release is more than 18 months old. A bunch has
>> happened in the meantime.
>>
>> For retries exhausted, check what is going on on the remote regionserver
>> that you are trying to write to. It is probably struggling, and that is
>> why requests are not going through -- or the client missed the fact that
>> a region moved (all stuff that should work better in the latest HBase).
>>
>> St.Ack
>>
>> On Tue, Jun 28, 2011 at 9:51 PM, Srikanth P. Shreenivas
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> We are using an HBase 0.20.3 (hbase-0.20-0.20.3-1.cloudera.noarch.rpm)
>>> cluster in distributed mode with Hadoop 0.20.2
>>> (hadoop-0.20-0.20.2+320-1.noarch). We are using pretty much the default
>>> configuration; the only thing we have customized is that we have
>>> allocated 4 GB of RAM in /etc/hbase-0.20/conf/hbase-env.sh.
>>>
>>> In our setup, we have a web application that reads a record from HBase
>>> and writes a record as part of each web request. The application is
>>> hosted in Apache Tomcat 7 and is a stateless web application providing
>>> a REST-like web service API.
>>>
>>> We are observing that our reads and writes time out once in a while.
>>> This happens more for writes. We see the exceptions below in our
>>> application logs:
>>>
>>> Exception Type 1 - During Get:
>>> ---------------------------------------
>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>>> region server 10.1.68.36:60020 for region
>>> employeedata,be8784ac8b57c45625a03d52be981b88097c2fdc,1308657957879, row
>>> 'd51b74eb05e07f96cee0ec556f5d8d161e3281f3', but failed after 10 attempts.
>>> Exceptions:
>>> java.io.IOException: Call to /10.1.68.36:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>>
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:417)
>>> <snip>
>>>
>>> Exception Type 2 - During Put:
>>> ---------------------------------------------
>>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying
>>> to contact region server 10.1.68.34:60020 for region
>>> audittable,,1309183872019, row '2a012017120f80a801b28f5f66a83dc2a8882d1b',
>>> but failed after 10 attempts.
>>> Exceptions:
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>>
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1239)
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1161)
>>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)
>>> <snip>
>>>
>>> Any inputs on why this is happening, or how to rectify it, will be of
>>> immense help.
>>>
>>> Thanks,
>>> Srikanth
