Hi St.Ack,
We upgraded to CDH 3 (hadoop-0.20-0.20.2+923.21-1.noarch.rpm,
hadoop-hbase-0.90.1+15.18-1.noarch.rpm,
hadoop-zookeeper-3.3.3+12.1-1.noarch.rpm).
I ran the same test that I was running against the app when it was on CDH2.
The test app posts a request to the web app every 100ms, and for each request
the web app reads an HBase record, performs some logic, and saves an audit
trail by writing another HBase record.
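For reference, the per-request flow looks roughly like the following. This is
only a simplified sketch against the 0.90 client API, not our actual handler
code; the class name, column family, and qualifier are made up, and error
handling / table reuse are omitted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RequestFlowSketch {
        private final Configuration conf = HBaseConfiguration.create();

        public void handleRequest(String rowId) throws Exception {
            // Read the record for this request.
            HTable dataTable = new HTable(conf, "employeedata");
            Result record = dataTable.get(new Get(Bytes.toBytes(rowId)));

            // ... application logic on 'record' ...

            // Write an audit-trail record for this request.
            HTable auditTable = new HTable(conf, "audittable");
            Put audit = new Put(Bytes.toBytes(rowId));
            audit.add(Bytes.toBytes("audit"), Bytes.toBytes("event"),
                      Bytes.toBytes("request processed"));
            auditTable.put(audit);

            dataTable.close();
            auditTable.close();
        }
    }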
When our app was running on CDH2, I observed the issue below once in every 10
to 15 requests.
With CDH3, this issue is not happening at all, so the situation seems to have
improved a lot and our app is much more stable.
However, I am still seeing one issue. A number of requests (around 1%) are not
able to read the record from HBase, and the get call hangs for almost 10
minutes. This is what I see in the application log:
2011-07-09 18:27:25,537 [gridgain-#6%authGrid%] ERROR [my.app.HBaseHandler] -
Exception occurred in searchData:
java.io.IOException: Giving up trying to get region server: thread is
interrupted.
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
<...app specific trace removed...>
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
I am running the test on the same record, so all my "get" calls are for the
same row id.
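One more question in case it is relevant: since HTable itself is not
thread-safe, is HTablePool the recommended way to share table handles across
the GridGain worker threads in a setup like ours? I am thinking of something
along these lines (just a sketch, I may be misusing the API):

    // Created once and shared by the web app; HTable itself is not thread-safe.
    Configuration conf = HBaseConfiguration.create();
    HTablePool pool = new HTablePool(conf, 20);  // at most 20 cached handles per table

    // Per request / per worker thread:
    HTableInterface table = pool.getTable("employeedata");
    try {
        Result record = table.get(new Get(Bytes.toBytes(rowId)));
        // ... application logic ...
    } finally {
        pool.putTable(table);  // return the handle to the pool
    }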
It would be of immense help if you could provide some inputs on whether we are
missing some configuration settings, or whether there is a way to get around
this.
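In particular, would tuning the client-side retry/timeout settings at least
make such a get fail fast instead of hanging for ~10 minutes? Based on my
(possibly wrong) understanding of the 0.90 defaults (10 retries, 1 second
pause), I was thinking of something like:

    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.retries.number", 3);  // default is 10
    conf.setLong("hbase.client.pause", 500);        // ms between retries, default 1000
    conf.setLong("hbase.rpc.timeout", 10000);       // ms per RPC, if this version supports it
    HTable table = new HTable(conf, "employeedata");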
Thanks,
Srikanth
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Wednesday, June 29, 2011 7:48 PM
To: [email protected]
Subject: Re: HBase Read and Write Issues in Multithreaded Environments
Go to CDH3 if you can. CDH2 is also old.
St.Ack
On Wed, Jun 29, 2011 at 7:15 AM, Srikanth P. Shreenivas
<[email protected]> wrote:
> Thanks St. Ack for the inputs.
>
> Will upgrading to CDH3 help or is there a version within CDH2 that you
> recommend we should upgrade to?
>
> Regards,
> Srikanth
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Wednesday, June 29, 2011 11:16 AM
> To: [email protected]
> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>
> Can you upgrade? That release is > 18 months old. A bunch has
> happened in the meantime.
>
> For the retries-exhausted errors, check what's going on on the remote
> regionserver that you are trying to write to. It's probably struggling, and
> that's why requests are not going through -- or the client missed the fact
> that the region moved (all stuff that should be working better in the latest
> HBase).
>
> St.Ack
>
> On Tue, Jun 28, 2011 at 9:51 PM, Srikanth P. Shreenivas
> <[email protected]> wrote:
>> Hi,
>>
>> We are using HBase 0.20.3 (hbase-0.20-0.20.3-1.cloudera.noarch.rpm) cluster
>> in distributed mode with Hadoop 0.20.2 (hadoop-0.20-0.20.2+320-1.noarch).
>> We are using pretty much the default configuration; the only thing we have
>> customized is that we have allocated 4GB of RAM in
>> /etc/hbase-0.20/conf/hbase-env.sh.
>>
>> In our setup, we have a web application that reads a record from HBase and
>> writes a record as part of each web request. The application is hosted in
>> Apache Tomcat 7 and is a stateless web application providing a REST-like web
>> service API.
>>
>> We are observing that our reads and writes time out once in a while; this
>> happens more often for writes.
>> We see the exceptions below in our application logs:
>>
>>
>> Exception Type 1 - During Get:
>> ---------------------------------------
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>> region server 10.1.68.36:60020 for region
>> employeedata,be8784ac8b57c45625a03d52be981b88097c2fdc,1308657957879, row
>> 'd51b74eb05e07f96cee0ec556f5d8d161e3281f3', but failed after 10 attempts.
>> Exceptions:
>> java.io.IOException: Call to /10.1.68.36:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>> java.nio.channels.ClosedByInterruptException
>>
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:417)
>> <snip>
>>
>> Exception Type 2 - During Put:
>> ---------------------------------------------
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying
>> to contact region server 10.1.68.34:60020 for region
>> audittable,,1309183872019, row '2a012017120f80a801b28f5f66a83dc2a8882d1b',
>> but failed after 10 attempts.
>> Exceptions:
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception:
>> java.nio.channels.ClosedByInterruptException
>>
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1239)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1161)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)
>> <snip>
>>
>> Any inputs on why this is happening, or how to rectify it, will be of
>> immense help.
>>
>> Thanks,
>> Srikanth
>>
>