Yes, DC1AuthDFSC1D3 hosts the root region. It is also region server 3.
DC1AuthDFSC1D1, DC1AuthDFSC1D2, DC1AuthDFSC1D3 and DC1AuthDFSC1D4 are the four
region servers in our cluster.
******************************************
I checked with the Data Centre team, and they confirmed that there is no
firewall in the network where the HBase servers and client applications are
running.
******************************************
Regarding the client and server running different versions: they are running
the same version. If there were a version mismatch, I would expect every read
to fail. Here only a few reads fail this way, roughly one in 10-15. We use the
same hbase, zookeeper and hadoop jars as found in the HBase distribution.
Strangely enough, I saw the warnings below for the first time today, and they
have occurred only once so far. 10.3.48.61 is the IP address where our client
app is running.
2011-08-22 11:46:55,905 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7625 got version 6 expected version 3
2011-08-22 11:46:57,542 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7626 got version 6 expected version 3
2011-08-22 11:46:58,483 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7627 got version 6 expected version 3
2011-08-22 11:46:59,335 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7628 got version 6 expected version 3
2011-08-22 11:47:00,164 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7629 got version 6 expected version 3
2011-08-22 11:47:00,972 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7630 got version 6 expected version 3
2011-08-22 11:47:01,768 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7631 got version 6 expected version 3
2011-08-22 11:47:02,648 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect
header or version mismatch from 10.3.48.61:7632 got version 6 expected version 3
******************************************
I enabled debug-level logging for all classes today. Here is the exception
associated with the "null" messages.
*** Do you think that some thread in the client is calling interrupt(),
resulting in the "java.nio.channels.ClosedByInterruptException" below? ***
2011-08-22 11:51:29,663 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hbase.client.HConnectionManager$HConnectionImplementation] -
locateRegionInMeta parentTable=-ROOT-, metaLocation=address:
DC1AuthDFSC1D3.cidr.gov.in:6020, regioninfo: -ROOT-,,0.70236052, attempt=0 of
10 failed; retrying after sleep of 1000 because: null
2011-08-22 11:51:29,663 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hbase.client.HConnectionManager$HConnectionImplementation] - Lookedup root
region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@211c7f8d;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
2011-08-22 11:51:30,665 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hbase.client.HConnectionManager$HConnectionImplementation] - Lookedup root
region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@211c7f8d;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
2011-08-22 11:51:30,665 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hadoop.ipc.HBaseClient] - Connecting to
DC1AuthDFSC1D3.cidr.gov.in/10.3.48.69:6020
2011-08-22 11:51:30,665 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hadoop.ipc.HBaseClient] - closing ipc connection to
DC1AuthDFSC1D3.cidr.gov.in/10.3.48.69:6020: null
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:511)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy41.getClosestRowBefore(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:719)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:589)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:558)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:687)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:593)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:564)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:415)
    at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1002)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
    at in.gov.uidai.platform.impl.persistence.handler.HBaseHandler.findEntities(HBaseHandler.java:271)
    at in.gov.uidai.platform.impl.persistence.handler.HBaseHandler.findObject(HBaseHandler.java:156)
    at in.gov.uidai.platform.impl.persistence.provider.AbstractPersistenceProvider.findObject(AbstractPersistenceProvider.java:116)
    at in.gov.uidai.platform.impl.persistence.PersistenceManagerProvider.findObject(PersistenceManagerProvider.java:270)
    at in.gov.uidai.authcommon.dao.impl.hbase.ResidentDetailsDAOImpl.findResidentDetailEntity(ResidentDetailsDAOImpl.java:69)
    at in.gov.uidai.authcommon.dao.impl.hbase.ResidentDetailsDAOImpl.findResidentDetails(ResidentDetailsDAOImpl.java:48)
    at in.gov.uidai.authcommon.core.impl.steps.ResidentDetailsReader.findResident(ResidentDetailsReader.java:176)
    at in.gov.uidai.authcommon.core.impl.steps.ResidentDetailsReader.doPerform(ResidentDetailsReader.java:63)
    at in.gov.uidai.authcommon.core.ProcessingStep.perform(ProcessingStep.java:36)
    at in.gov.uidai.authcommon.core.impl.Authenticator.performAndReturnContext(Authenticator.java:40)
    at in.gov.uidai.authserver.grid.AuthenticationGridJob.execute(AuthenticationGridJob.java:27)
    at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:406)
    at org.gridgain.grid.util.runnable.GridRunnable$1.run(GridRunnable.java:142)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2011-08-22 11:51:30,666 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hadoop.ipc.HBaseClient] - IPC Client (47) connection to
DC1AuthDFSC1D3.cidr.gov.in/10.3.48.69:6020 from an unknown user: closed
2011-08-22 11:51:30,666 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hbase.client.HConnectionManager$HConnectionImplementation] -
locateRegionInMeta parentTable=-ROOT-, metaLocation=address:
DC1AuthDFSC1D3.cidr.gov.in:6020, regioninfo: -ROOT-,,0.70236052, attempt=1 of
10 failed; retrying after sleep of 1000 because: null
2011-08-22 11:51:30,666 [gridgain-#6%authGrid%:grid-job-worker] DEBUG
[hbase.client.HConnectionManager$HConnectionImplementation] - Lookedup root
region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@211c7f8d;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
...
...
...
The above pattern keeps repeating.
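
On the interrupt() question above: to check whether a pending interrupt alone
could produce this exception, I put together the small standalone test below.
It is only a sketch and does not touch HBase. The class and method names
(InterruptedConnectDemo, connectWithInterruptPending) are made up for this
test, and the local ServerSocket merely stands in for the region server. It
sets the interrupt flag on the calling thread and then attempts a blocking
SocketChannel.connect(), which is the call at the top of the stack trace.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.SocketChannel;

// Hypothetical standalone test; not part of HBase or of our application.
class InterruptedConnectDemo {

    /**
     * Attempts a blocking connect while the calling thread's interrupt
     * status is already set. Returns the exception raised by the attempt,
     * or null if the connect went through.
     */
    static Exception connectWithInterruptPending() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Local listener on an ephemeral port, so no real cluster is needed.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             SocketChannel channel = SocketChannel.open()) {
            // Simulate some other code having interrupted this worker thread.
            Thread.currentThread().interrupt();
            channel.connect(new InetSocketAddress(loopback, listener.getLocalPort()));
            return null;
        } catch (IOException e) {
            return e;
        } finally {
            // Clear the interrupt status so later code on this thread is unaffected.
            Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        Exception e = connectWithInterruptPending();
        if (e instanceof ClosedByInterruptException) {
            System.out.println("connect failed with ClosedByInterruptException");
        } else {
            System.out.println("connect result: " + e);
        }
    }
}
```

The java.nio.channels.InterruptibleChannel documentation says that a blocking
I/O call made while the thread's interrupt status is already set closes the
channel and throws ClosedByInterruptException immediately. That would be
consistent with our log, where "Connecting to" and "closing ipc connection"
carry the same millisecond timestamp. It does not tell us who is interrupting
the gridgain worker thread; that part is still a guess.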
******************************************
Regards,
Srikanth
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel
Cryans
Sent: Monday, August 22, 2011 2:32 AM
To: [email protected]
Subject: Re: Query regarding HTable.get and timeouts
Yeah that null message isn't really helpful :)
So one thing that might be helpful would be to know who DC1AuthDFSC1D3
is, since you identified the logs as "Region server n".
Then look at the master's web UI and see where -ROOT- is assigned. Is
it also DC1AuthDFSC1D3?
If so, then I would proceed by checking if there's a firewall in
between the client and the cluster, also I would make sure that the
client is running the same version as the server.
J-D
On Sat, Aug 20, 2011 at 5:56 AM, Srikanth P. Shreenivas
<[email protected]> wrote:
> Further in this investigation, we enabled the debug logs on client side.
>
> We are observing that the client is trying to locate the root region, and is continuously
> failing to do so. The logs are filled with entries like this:
>
> 2011-08-20 17:20:09,092 [gridgain-#6%authGrid%] DEBUG
> [hbase.client.HConnectionManager$HConnectionImplementation] - Lookedup root
> region location,
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2cc25ae3;
> hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
> 2011-08-20 17:20:09,092 [gridgain-#6%authGrid%] DEBUG
> [hbase.client.HConnectionManager$HConnectionImplementation] -
> locateRegionInMeta parentTable=-ROOT-, metaLocation=address:
> DC1AuthDFSC1D3.cidr.gov.in:6020, regioninfo: -ROOT-,,0.70236052, attempt=0 of
> 10 failed; retrying after sleep of 1000
> because: null
>
> Client keeps retrying and retries get exhausted.
>
>
> Complete logs are available here: https://gist.github.com/1159064 including
> logs of master, zookeeper and region servers.
>
>
> If you can please look at the logs and provide some inputs on this issue,
> then it will be really helpful.
> We are really not sure why client is failing to get root regions from the
> server. Any guidance will be greatly appreciated.
>
>
> Thanks a lot,
> Srikanth