[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-05-31 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307699#comment-15307699
 ] 

Yu Li commented on HBASE-13960:
---

This issue is resolved by HBASE-15856

> HConnection stuck with UnknownHostException 
> 
>
> Key: HBASE-13960
> URL: https://issues.apache.org/jira/browse/HBASE-13960
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 0.98.8
>Reporter: Kurt Young
>Assignee: Yu Li
> Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-update.patch, 
> HBASE-13960-update.v2.patch, HBASE-13960-v2.patch
>
>
> When putting to or getting from HBase, if a temporary DNS failure prevents 
> resolving the RS's host, the error never recovers: put/get keeps failing 
> with UnknownHostException forever. 
> I checked the code, and the reason may be:
> 1. When RegionServerCallable or MultiServerCallable runs prepare(), it gets 
> a ClientService.BlockingInterface stub from HConnection.
> 2. HConnectionImplementation::getClient caches the stub together with a 
> BlockingRpcChannelImplementation.
> 3. The BlockingRpcChannelImplementation constructor runs 
>  this.isa = new InetSocketAddress(sn.getHostname(), sn.getPort()); if we 
> hit a temporary DNS failure at that point, the "address" in isa will be null.
> 4. When we then launch the real RPC call, we get the following stack:
> Caused by: java.net.UnknownHostException: unknown host: xxx.host2
>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
>   at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
>   at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1523)
>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
> Besides, I noticed there is a guard in RpcClient:
> if (remoteId.getAddress().isUnresolved()) {
>   throw new UnknownHostException("unknown host: " + 
>       remoteId.getAddress().getHostName());
> }
> Shouldn't we do something when this situation occurs? 
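
The analysis in the description hinges on how java.net.InetSocketAddress behaves: an instance built while the lookup fails stays unresolved for its whole lifetime, so a cached channel holding it keeps failing even after DNS recovers. Below is a minimal sketch of that behaviour. It is not taken from any attached patch; the class name UnresolvedAddressDemo and the helper refreshIfUnresolved are hypothetical, and InetSocketAddress.createUnresolved stands in for a failed DNS lookup so the example does not depend on a real resolver.

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressDemo {

    /**
     * Hypothetical helper: returns the cached address if it is resolved,
     * otherwise builds a new instance, which triggers a fresh lookup.
     */
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress cached) {
        if (!cached.isUnresolved()) {
            return cached;
        }
        // An existing InetSocketAddress never re-resolves itself;
        // only constructing a new one attempts the lookup again.
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    public static void main(String[] args) {
        // Simulate an address created while the lookup failed.
        // A literal IP is used so the later "retry" needs no DNS server.
        InetSocketAddress stale =
            InetSocketAddress.createUnresolved("127.0.0.1", 16020);
        System.out.println(stale.isUnresolved()); // true
        System.out.println(stale.getAddress());   // null

        InetSocketAddress fresh = refreshIfUnresolved(stale);
        System.out.println(fresh.isUnresolved()); // false
    }
}
```

The point of the sketch is only that recovery requires building a new address object; where that should happen in the HBase client is what the issue discusses.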



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-02-01 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127772#comment-15127772
 ] 

Yu Li commented on HBASE-13960:
---

ping...



[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-25 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116898#comment-15116898
 ] 

Yu Li commented on HBASE-13960:
---

ping [~apurtell], waiting for your comments sir, thanks.



[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112235#comment-15112235
 ] 

Ted Yu commented on HBASE-13960:


I am fine with the latest patch. 

[~apurtell]:
What do you think?




[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112021#comment-15112021
 ] 

Yu Li commented on HBASE-13960:
---

ping [~tedyu], are we good to go, or do you have any comments, sir? Thanks.



[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-20 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110015#comment-15110015
 ] 

Yu Li commented on HBASE-13960:
---

The last HadoopQA run looks good. The findbugs issue is not introduced by the 
patch here; I have logged HBASE-15148 to resolve it.

> HConnection stuck with UnknownHostException 
> 
>
> Key: HBASE-13960
> URL: https://issues.apache.org/jira/browse/HBASE-13960
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 0.98.8
>Reporter: Kurt Young
> Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-update.patch, 
> HBASE-13960-update.v2.patch, HBASE-13960-v2.patch


[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108229#comment-15108229
 ] 

Hadoop QA commented on HBASE-13960:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 2m 55s | master passed |
| +1 | compile | 1m 9s | master passed with JDK v1.8.0 |
| +1 | compile | 0m 55s | master passed with JDK v1.7.0_79 |
| +1 | checkstyle | 1m 34s | master passed |
| +1 | mvneclipse | 0m 26s | master passed |
| -1 | findbugs | 1m 56s | hbase-server in master has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 57s | master passed with JDK v1.8.0 |
| +1 | javadoc | 0m 51s | master passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 1m 6s | the patch passed with JDK v1.8.0 |
| +1 | javac | 1m 6s | the patch passed |
| +1 | compile | 0m 53s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 53s | the patch passed |
| +1 | checkstyle | 1m 33s | the patch passed |
| +1 | mvneclipse | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 22m 5s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 3m 14s | the patch passed |
| +1 | javadoc | 0m 51s | the patch passed with JDK v1.8.0 |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 0m 54s | hbase-client in the patch passed with JDK v1.8.0. |
| +1 | unit | 106m 49s | hbase-server in the patch passed with JDK v1.8.0. |
| +1 | unit | 0m 58s | hbase-client in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 107m 30s | hbase-server in the patch passed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 24s | Patch does not generate ASF License warnings. |
| | | 260m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12783277/HBASE-13960-update.v2.patch |
| JIRA Issue | HBASE-13960 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux asf909.gq1.ygridcore.net 

[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107823#comment-15107823
 ] 

Hadoop QA commented on HBASE-13960:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 4m 45s | master passed |
| +1 | compile | 0m 55s | master passed with JDK v1.8.0 |
| +1 | compile | 0m 50s | master passed with JDK v1.7.0_79 |
| +1 | checkstyle | 6m 37s | master passed |
| +1 | mvneclipse | 0m 33s | master passed |
| -1 | findbugs | 1m 52s | hbase-server in master has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 53s | master passed with JDK v1.8.0 |
| +1 | javadoc | 0m 50s | master passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 0m 48s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 48s | the patch passed |
| +1 | checkstyle | 6m 22s | the patch passed |
| +1 | mvneclipse | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 20m 23s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 3m 9s | the patch passed |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.8.0 |
| +1 | javadoc | 0m 51s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 1m 3s | hbase-client in the patch passed with JDK v1.8.0. |
| -1 | unit | 99m 53s | hbase-server in the patch failed with JDK v1.8.0. |
| +1 | unit | 0m 56s | hbase-client in the patch passed with JDK v1.7.0_79. |
| -1 | unit | 95m 37s | hbase-server in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 31s | Patch does not generate ASF License warnings. |
| | | 251m 28s | |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0 Failed junit tests | hadoop.hbase.master.TestMasterNoCluster |
| JDK v1.7.0_79 Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
|   | hadoop.hbase.master.TestMasterNoCluster |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 

[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-19 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107970#comment-15107970
 ] 

Yu Li commented on HBASE-13960:
---

The findbugs issue is not introduced by this JIRA, but the UT failure is caused 
by the changes here. I will update the patch to fix it.

> HConnection stuck with UnknownHostException 
> 
>
> Key: HBASE-13960
> URL: https://issues.apache.org/jira/browse/HBASE-13960
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 0.98.8
>Reporter: Kurt Young
> Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-update.patch, 
> HBASE-13960-v2.patch


[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2016-01-19 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106845#comment-15106845
 ] 

Yu Li commented on HBASE-13960:
---

We encountered the same issue months ago in our 0.98.12 cluster and fixed it 
with a similar solution. While trying to contribute our work back as part of 
our upgrade to 1.x, I found this issue by searching.

The patch here is a little stale and no longer compiles against the latest 
code base. After some minor changes it works, and the UT design is good enough 
to reproduce the issue. The UT passes with the patch (main changes are in 
AbstractRpcClient) and fails with the exceptions below without it.

If UnknownHostException is thrown while locating the region in meta:
{noformat}
2016-01-19 22:07:14,712 DEBUG [Time-limited test] 
client.ConnectionImplementation(944): locateRegionInMeta 
parentTable=hbase:meta, metaLocation=, attempt=0 of 36 failed; retrying after 
sleep of 100 because: Failed after attempts=36, exceptions:
Tue Jan 19 22:07:14 CST 2016, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=68243: row 
'testRegionServerRandomRpcFail,row,99' on table 'hbase:meta' at 
region=hbase:meta,,1.1588230740, hostname=localhost,55113,1453212381028, 
seqNum=0
{noformat}

If thrown during puts:
{noformat}
2016-01-19 22:04:25,685 INFO  [hconnection-0x3eeadfbb-shared-pool14-t1] 
client.AsyncProcess$AsyncRequestFutureImpl(1222): #12, 
table=testRegionServerRandomRpcFail, attempt=24/36 failed=1ops, last exception: 
java.net.UnknownHostException: unknown host: random_invalid_host on 
localhost,54943,1453211970421, tracking started null, retrying after=20005ms, 
replay=1ops
2016-01-19 22:04:25,695 INFO  [main] client.AsyncProcess(1711): #12, waiting 
for some tasks to finish. Expected max=0, tasksInProgress=24
2016-01-19 22:04:45,698 INFO  [hconnection-0x3eeadfbb-shared-pool14-t1] 
client.AsyncProcess$AsyncRequestFutureImpl(1222): #12, 
table=testRegionServerRandomRpcFail, attempt=25/36 failed=1ops, last exception: 
java.net.UnknownHostException: unknown host: random_invalid_host on 
localhost,54943,1453211970421, tracking started null, retrying after=20176ms, 
replay=1ops
2016-01-19 22:04:45,698 INFO  [main] client.AsyncProcess(1711): #12, waiting 
for some tasks to finish. Expected max=0, tasksInProgress=25
{noformat}

Will upload the updated patch soon.
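
For readers following the thread, here is one way a client-side cache could avoid pinning a failed lookup. This is only an illustrative sketch under stated assumptions, not the content of the attached patches or of AbstractRpcClient: the class ResolvedOnlyAddressCache and its injected resolver function are hypothetical. The idea is to store an address only once it has resolved, so the next call after a transient DNS failure tries again.

```java
import java.net.InetSocketAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiFunction;

/** Hypothetical cache that never stores an unresolved address. */
public class ResolvedOnlyAddressCache {
    private final ConcurrentMap<String, InetSocketAddress> cache =
        new ConcurrentHashMap<>();
    // Injected so a test can simulate a flaky DNS lookup.
    private final BiFunction<String, Integer, InetSocketAddress> resolver;

    public ResolvedOnlyAddressCache(
            BiFunction<String, Integer, InetSocketAddress> resolver) {
        this.resolver = resolver;
    }

    public InetSocketAddress get(String host, int port) {
        String key = host + ":" + port;
        InetSocketAddress cached = cache.get(key);
        if (cached != null) {
            return cached; // only resolved addresses are ever stored
        }
        InetSocketAddress fresh = resolver.apply(host, port);
        if (!fresh.isUnresolved()) {
            cache.putIfAbsent(key, fresh);
        }
        // An unresolved result is returned but not cached, so the caller
        // fails this attempt and the next call performs a new lookup.
        return fresh;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // First lookup "fails", later lookups succeed (literal IP, no DNS).
        ResolvedOnlyAddressCache c = new ResolvedOnlyAddressCache((h, p) ->
            calls[0]++ == 0
                ? InetSocketAddress.createUnresolved(h, p)
                : new InetSocketAddress(h, p));
        System.out.println(c.get("127.0.0.1", 16020).isUnresolved()); // true
        System.out.println(c.get("127.0.0.1", 16020).isUnresolved()); // false
    }
}
```

The trade-off is one extra lookup per call while DNS is down, in exchange for automatic recovery once it comes back.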

> HConnection stuck with UnknownHostException 
> 
>
> Key: HBASE-13960
> URL: https://issues.apache.org/jira/browse/HBASE-13960
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 0.98.8
>Reporter: Kurt Young
> Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-v2.patch


[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2015-06-30 Thread Kurt Young (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609468#comment-14609468
 ] 

Kurt Young commented on HBASE-13960:


Sorry for the typo: path -> patch

 HConnection stuck with UnknownHostException 
 

 Key: HBASE-13960
 URL: https://issues.apache.org/jira/browse/HBASE-13960
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 0.98.8
Reporter: Kurt Young
 Attachments: 1.patch, HBASE-13960-v1.patch, HBASE-13960-v1.patch-0.98




[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2015-06-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609484#comment-14609484
 ] 

Ted Yu commented on HBASE-13960:


Please attach a patch for the master branch.

The patch for 0.98 should be named HBASE-13960-0.98-v1.patch; the QA bot doesn't 
recognize the patch-0.98 extension.

 HConnection stuck with UnknownHostException 
 

 Key: HBASE-13960
 URL: https://issues.apache.org/jira/browse/HBASE-13960
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 0.98.8
Reporter: Kurt Young
 Attachments: 1.patch, HBASE-13960-v1.patch, HBASE-13960-v1.patch-0.98




[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2015-06-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609154#comment-14609154
 ] 

Ted Yu commented on HBASE-13960:


The patch doesn't apply on the master branch.

For 0.98, the patch filename should contain '-0.98'.

FYI

 HConnection stuck with UnknownHostException 
 

 Key: HBASE-13960
 URL: https://issues.apache.org/jira/browse/HBASE-13960
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 0.98.8
Reporter: Kurt Young
 Attachments: 1.patch, HBASE-13960-v1.patch




[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2015-06-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603618#comment-14603618
 ] 

Andrew Purtell commented on HBASE-13960:


Would you be interested in trying your hand at a patch, [~ykt836]? No pressure 
if you can't. 

 HConnection stuck with UnknownHostException 
 

 Key: HBASE-13960
 URL: https://issues.apache.org/jira/browse/HBASE-13960
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 0.98.8
Reporter: Kurt Young



[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2015-06-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599445#comment-14599445
 ] 

stack commented on HBASE-13960:
---

Yes. What would you suggest, [~ykt836]? Re-getting the stub is a bit tough. 
Should we probe to make sure the ISA is resolved before we finish the stub 
setup? Thanks.

 HConnection stuck with UnknownHostException 
 

 Key: HBASE-13960
 URL: https://issues.apache.org/jira/browse/HBASE-13960
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 0.98.8
Reporter: Kurt Young



[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

2015-06-24 Thread Kurt Young (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600490#comment-14600490
 ] 

Kurt Young commented on HBASE-13960:


In RpcClient::createBlockingRpcChannel, when constructing the new 
BlockingRpcChannelImplementation, check the isa and throw an IOException if it 
is unresolved. The exception propagates through 
HConnectionImplementation::getClient and makes RegionServerCallable::prepare 
fail, so the client will try again later.
I think this may be enough, but I haven't checked all the callers of 
RpcClient::createBlockingRpcChannel.
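
The proposal above can be sketched roughly as follows. This is a sketch under stated assumptions, not the attached patch: ChannelCache, Channel and getChannel are hypothetical stand-ins for the stub cache in HConnectionImplementation and for BlockingRpcChannelImplementation, and the caller supplies the InetSocketAddress. The point is only that the constructor fails before anything is cached, so a later call can resolve the host again.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the stub cache kept by the connection. */
final class ChannelCache {

    /** Hypothetical stand-in for BlockingRpcChannelImplementation. */
    static final class Channel {
        final InetSocketAddress isa;

        Channel(InetSocketAddress isa) throws IOException {
            // Fail fast: an unresolved address must never end up in a cached channel.
            if (isa.isUnresolved()) {
                throw new UnknownHostException("unknown host: " + isa.getHostString());
            }
            this.isa = isa;
        }
    }

    private final Map<String, Channel> channels = new ConcurrentHashMap<>();

    /** Returns the cached channel for key, creating and caching it only if isa is resolved. */
    Channel getChannel(String key, InetSocketAddress isa) throws IOException {
        Channel cached = channels.get(key);
        if (cached != null) {
            return cached;
        }
        Channel created = new Channel(isa); // throws before anything is cached
        Channel raced = channels.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }

    int size() {
        return channels.size();
    }

    public static void main(String[] args) throws IOException {
        ChannelCache cache = new ChannelCache();
        try {
            // Stand-in for a lookup that failed at channel creation time.
            cache.getChannel("rs1", InetSocketAddress.createUnresolved("127.0.0.1", 60020));
        } catch (UnknownHostException e) {
            System.out.println("not cached: " + e.getMessage());
        }
        // A later attempt with a resolved address succeeds and is cached.
        cache.getChannel("rs1", new InetSocketAddress("127.0.0.1", 60020));
        System.out.println("cached channels: " + cache.size());
    }
}
```

With this shape the failure surfaces to the caller of getChannel, which matches the idea of letting prepare() fail so that the normal client retry gets a fresh chance at resolution.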

 HConnection stuck with UnknownHostException 
 

 Key: HBASE-13960
 URL: https://issues.apache.org/jira/browse/HBASE-13960
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 0.98.8
Reporter: Kurt Young
