[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307699#comment-15307699 ]

Yu Li commented on HBASE-13960:
-------------------------------

This issue is resolved by HBASE-15856.

> HConnection stuck with UnknownHostException
> -------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
>            Assignee: Yu Li
>         Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-update.patch,
> HBASE-13960-update.v2.patch, HBASE-13960-v2.patch
>
>
> When doing a put/get against HBase, if a temporary DNS failure occurs while
> resolving the RS's host, the error never recovers: put/get keeps failing with
> UnknownHostException forever.
> I checked the code, and the reason may be:
> 1. When RegionServerCallable or MultiServerCallable runs prepare(), it gets a
> ClientService.BlockingInterface stub from HConnection.
> 2. HConnectionImplementation::getClient caches the stub together with a
> BlockingRpcChannelImplementation.
> 3. The BlockingRpcChannelImplementation constructor runs
>    this.isa = new InetSocketAddress(sn.getHostname(), sn.getPort());
> If we hit a temporary DNS failure here, the "address" in isa will be null.
> 4. When we then launch the real RPC call, the stack is:
> Caused by: java.net.UnknownHostException: unknown host: xxx.host2
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
>     at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
>     at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1523)
>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
> Besides, I noticed there is a protection in RpcClient:
>     if (remoteId.getAddress().isUnresolved()) {
>       throw new UnknownHostException("unknown host: " +
>           remoteId.getAddress().getHostName());
>     }
> Shouldn't we do something when this situation occurs?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
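Step 3 of the description can be reproduced with the JDK alone. The sketch below is not HBase code; the class name `UnresolvedAddressDemo`, the helper `refreshIfUnresolved`, and the port 16020 are assumptions made for illustration. It uses `InetSocketAddress.createUnresolved` to stand in for a lookup that failed at construction time, and an IP literal so that it runs without any DNS access. It shows that an unresolved `InetSocketAddress` reports `isUnresolved()` and a null `getAddress()`, that the instance itself never changes, and that only constructing a new instance attempts resolution again.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of HBase.
final class UnresolvedAddressDemo {

    /**
     * Returns the given address if it is already resolved; otherwise builds a
     * new InetSocketAddress from the same host string and port, which makes
     * the JDK attempt the lookup again.
     */
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress isa) {
        if (!isa.isUnresolved()) {
            return isa;
        }
        return new InetSocketAddress(isa.getHostString(), isa.getPort());
    }

    public static void main(String[] args) {
        // createUnresolved performs no lookup; this simulates an address whose
        // lookup failed when the object was created.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 16020);
        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("stale address:    " + stale.getAddress());

        // The stale object is immutable, so a cached copy stays unresolved
        // until something replaces it with a freshly constructed instance.
        InetSocketAddress fresh = refreshIfUnresolved(stale);
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

Whether a retry succeeds in practice also depends on the JVM's own name-lookup caching, which this sketch does not cover.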
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127772#comment-15127772 ]

Yu Li commented on HBASE-13960:
-------------------------------

ping...
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116898#comment-15116898 ]

Yu Li commented on HBASE-13960:
-------------------------------

ping [~apurtell], waiting for your comments sir, thanks.
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112235#comment-15112235 ]

Ted Yu commented on HBASE-13960:
--------------------------------

I am fine with the latest patch.
[~apurtell]: What do you think?
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112021#comment-15112021 ]

Yu Li commented on HBASE-13960:
-------------------------------

ping [~tedyu], are we good to go or any comments sir? Thanks.
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110015#comment-15110015 ]

Yu Li commented on HBASE-13960:
-------------------------------

The last HadoopQA run looks good. The findbugs issue is not introduced by the patch here, and I have logged HBASE-15148 to resolve it.
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108229#comment-15108229 ]

Hadoop QA commented on HBASE-13960:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 2m 55s | master passed |
| +1 | compile | 1m 9s | master passed with JDK v1.8.0 |
| +1 | compile | 0m 55s | master passed with JDK v1.7.0_79 |
| +1 | checkstyle | 1m 34s | master passed |
| +1 | mvneclipse | 0m 26s | master passed |
| -1 | findbugs | 1m 56s | hbase-server in master has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 57s | master passed with JDK v1.8.0 |
| +1 | javadoc | 0m 51s | master passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 1m 6s | the patch passed with JDK v1.8.0 |
| +1 | javac | 1m 6s | the patch passed |
| +1 | compile | 0m 53s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 53s | the patch passed |
| +1 | checkstyle | 1m 33s | the patch passed |
| +1 | mvneclipse | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 22m 5s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 3m 14s | the patch passed |
| +1 | javadoc | 0m 51s | the patch passed with JDK v1.8.0 |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 0m 54s | hbase-client in the patch passed with JDK v1.8.0. |
| +1 | unit | 106m 49s | hbase-server in the patch passed with JDK v1.8.0. |
| +1 | unit | 0m 58s | hbase-client in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 107m 30s | hbase-server in the patch passed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 24s | Patch does not generate ASF License warnings. |
| | | 260m 51s | |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12783277/HBASE-13960-update.v2.patch |
| JIRA Issue | HBASE-13960 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux asf909.gq1.ygridcore.net
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107823#comment-15107823 ]

Hadoop QA commented on HBASE-13960:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 4m 45s | master passed |
| +1 | compile | 0m 55s | master passed with JDK v1.8.0 |
| +1 | compile | 0m 50s | master passed with JDK v1.7.0_79 |
| +1 | checkstyle | 6m 37s | master passed |
| +1 | mvneclipse | 0m 33s | master passed |
| -1 | findbugs | 1m 52s | hbase-server in master has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 53s | master passed with JDK v1.8.0 |
| +1 | javadoc | 0m 50s | master passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.8.0 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | compile | 0m 48s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 48s | the patch passed |
| +1 | checkstyle | 6m 22s | the patch passed |
| +1 | mvneclipse | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 20m 23s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 3m 9s | the patch passed |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.8.0 |
| +1 | javadoc | 0m 51s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 1m 3s | hbase-client in the patch passed with JDK v1.8.0. |
| -1 | unit | 99m 53s | hbase-server in the patch failed with JDK v1.8.0. |
| +1 | unit | 0m 56s | hbase-client in the patch passed with JDK v1.7.0_79. |
| -1 | unit | 95m 37s | hbase-server in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 31s | Patch does not generate ASF License warnings. |
| | | 251m 28s | |

|| Reason || Tests ||
| JDK v1.8.0 Failed junit tests | hadoop.hbase.master.TestMasterNoCluster |
| JDK v1.7.0_79 Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
| | hadoop.hbase.master.TestMasterNoCluster |

|| Subsystem || Report/Notes ||
| JIRA Patch URL |
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107970#comment-15107970 ]

Yu Li commented on HBASE-13960:
-------------------------------

The findbugs issue is not introduced by this JIRA, but the UT failure is caused by the changes here. Will update the patch to fix it.

>         Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-update.patch,
> HBASE-13960-v2.patch
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106845#comment-15106845 ]

Yu Li commented on HBASE-13960:
-------------------------------

We encountered the same issue months ago in our 0.98.12 cluster and fixed it with a similar solution. While preparing to contribute our work back as part of our upgrade to 1.x, I found this issue by searching.

The patch here is a little stale and no longer compiles against the latest code base. After some minor changes it works, and the UT design is good enough to reproduce the issue. The UT passes with the patch (the main changes are in AbstractRpcClient) and fails with the exceptions below without the patch.

If the UnknownHostException is thrown while locating a region in meta:
{noformat}
2016-01-19 22:07:14,712 DEBUG [Time-limited test] client.ConnectionImplementation(944): locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=0 of 36 failed; retrying after sleep of 100 because: Failed after attempts=36, exceptions:
Tue Jan 19 22:07:14 CST 2016, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=68243: row 'testRegionServerRandomRpcFail,row,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,55113,1453212381028, seqNum=0
{noformat}

If it is thrown during puts:
{noformat}
2016-01-19 22:04:25,685 INFO [hconnection-0x3eeadfbb-shared-pool14-t1] client.AsyncProcess$AsyncRequestFutureImpl(1222): #12, table=testRegionServerRandomRpcFail, attempt=24/36 failed=1ops, last exception: java.net.UnknownHostException: unknown host: random_invalid_host on localhost,54943,1453211970421, tracking started null, retrying after=20005ms, replay=1ops
2016-01-19 22:04:25,695 INFO [main] client.AsyncProcess(1711): #12, waiting for some tasks to finish. Expected max=0, tasksInProgress=24
2016-01-19 22:04:45,698 INFO [hconnection-0x3eeadfbb-shared-pool14-t1] client.AsyncProcess$AsyncRequestFutureImpl(1222): #12, table=testRegionServerRandomRpcFail, attempt=25/36 failed=1ops, last exception: java.net.UnknownHostException: unknown host: random_invalid_host on localhost,54943,1453211970421, tracking started null, retrying after=20176ms, replay=1ops
2016-01-19 22:04:45,698 INFO [main] client.AsyncProcess(1711): #12, waiting for some tasks to finish. Expected max=0, tasksInProgress=25
{noformat}

Will upload the updated patch soon.

>         Attachments: HBASE-13960-0.98-v1.patch, HBASE-13960-v2.patch
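The comment above says only that the main changes are in AbstractRpcClient and does not show them. As a rough illustration of the general idea of not caching a failed lookup, the sketch below resolves the target address on every call instead of once in the constructor. It is not the attached patch and not HBase code: `LazyResolvingChannel`, `Resolver`, `FlakyResolver`, the host name, and the port are all hypothetical, and the pluggable resolver exists only so that a transient DNS failure can be simulated without a network.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: keeps host name and port, never a resolved address.
final class LazyResolvingChannel {

    /** Pluggable lookup so a transient DNS failure can be simulated. */
    interface Resolver {
        InetAddress resolve(String hostname) throws UnknownHostException;
    }

    /** Test resolver that fails a fixed number of times, then succeeds. */
    static final class FlakyResolver implements Resolver {
        private int failuresLeft;

        FlakyResolver(int failures) {
            this.failuresLeft = failures;
        }

        @Override
        public InetAddress resolve(String hostname) throws UnknownHostException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new UnknownHostException("unknown host: " + hostname);
            }
            // getByAddress performs no lookup; it only wraps the given bytes.
            return InetAddress.getByAddress(hostname, new byte[] {127, 0, 0, 1});
        }
    }

    private final String hostname;
    private final int port;
    private final Resolver resolver;

    LazyResolvingChannel(String hostname, int port, Resolver resolver) {
        this.hostname = hostname;
        this.port = port;
        this.resolver = resolver;
    }

    /**
     * Resolves the target at call time. A failed lookup surfaces as an
     * exception for this call only and is not remembered for the next one.
     */
    InetSocketAddress addressForCall() throws UnknownHostException {
        InetAddress ip = resolver.resolve(hostname);
        return new InetSocketAddress(ip, port);
    }

    public static void main(String[] args) throws Exception {
        LazyResolvingChannel channel =
            new LazyResolvingChannel("rs1.example.invalid", 16020, new FlakyResolver(1));
        try {
            channel.addressForCall();
        } catch (UnknownHostException e) {
            System.out.println("first call failed: " + e.getMessage());
        }
        System.out.println("second call resolved: " + !channel.addressForCall().isUnresolved());
    }
}
```

The trade-off in this design is one lookup per call; a real client would likely rely on the JVM or OS resolver cache, or re-resolve only after a failure.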
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609468#comment-14609468 ]

Kurt Young commented on HBASE-13960:
------------------------------------

sorry for the typo, path -> patch

>         Attachments: 1.patch, HBASE-13960-v1.patch, HBASE-13960-v1.patch-0.98
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609484#comment-14609484 ]

Ted Yu commented on HBASE-13960:
--------------------------------

Please attach a patch for the master branch.
The patch for 0.98 should be named HBASE-13960-0.98-v1.patch; the QA bot doesn't recognize the extension "patch-0.98".

> HConnection stuck with UnknownHostException
> -------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
>         Attachments: 1.patch, HBASE-13960-v1.patch, HBASE-13960-v1.patch-0.98
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609154#comment-14609154 ]

Ted Yu commented on HBASE-13960:
--------------------------------

The patch doesn't apply on the master branch.
For 0.98, the patch filename should contain '-0.98'.
FYI

> HConnection stuck with UnknownHostException
> -------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
>         Attachments: 1.patch, HBASE-13960-v1.patch
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603618#comment-14603618 ]

Andrew Purtell commented on HBASE-13960:
----------------------------------------

Would you be interested in trying your hand at a patch, [~ykt836]? This isn't meant as pressure if you can't.

> HConnection stuck with UnknownHostException
> -------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599445#comment-14599445 ]

stack commented on HBASE-13960:
-------------------------------

Yes. What would you suggest, [~ykt836]? Re-getting the stub is a bit tough. Should we probe to make sure the ISA is resolved before we finish the stub setup? Thanks.

> HConnection stuck with UnknownHostException
> -------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
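The idea of probing the ISA before the stub setup finishes could look roughly like the sketch below. It is an illustration only and does not reflect the actual HConnectionImplementation code: `ProbingStubCache`, its `resolver` function, and the use of a plain `InetSocketAddress` as a stand-in for the stub are all assumptions. The point it shows is that an unresolved address is rejected before it reaches the cache, so the next call performs a fresh lookup.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache; the InetSocketAddress stands in for the cached stub/channel.
final class ProbingStubCache {
    private final Map<String, InetSocketAddress> stubs = new ConcurrentHashMap<>();
    // Assumed lookup hook: maps a host name to a (possibly unresolved) socket address.
    private final Function<String, InetSocketAddress> resolver;

    ProbingStubCache(Function<String, InetSocketAddress> resolver) {
        this.resolver = resolver;
    }

    InetSocketAddress getStub(String host) throws UnknownHostException {
        InetSocketAddress cached = stubs.get(host);
        if (cached != null) {
            return cached;
        }
        InetSocketAddress isa = resolver.apply(host);
        // Probe before caching: a failed lookup must not be remembered.
        if (isa.isUnresolved()) {
            throw new UnknownHostException("unknown host: " + isa.getHostName());
        }
        InetSocketAddress previous = stubs.putIfAbsent(host, isa);
        return previous != null ? previous : isa;
    }

    int size() {
        return stubs.size();
    }
}
```

With this shape, a temporary DNS failure costs one failed call instead of poisoning the cache entry for the lifetime of the connection.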
[jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600490#comment-14600490 ]

Kurt Young commented on HBASE-13960:
------------------------------------

In RpcClient::createBlockingRpcChannel, when constructing the new BlockingRpcChannelImplementation, check the isa and throw an IOException if the address could not be resolved. The exception propagates through HConnectionImplementation::getClient and makes RegionServerCallable::prepare fail, so the client will try again later.
I think this may be enough, but I haven't checked all the callers of RpcClient::createBlockingRpcChannel.

> HConnection stuck with UnknownHostException
> -------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
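The proposal above (fail at channel creation, let the caller's retry create a fresh channel) can be sketched as follows. This is not the patch attached to the issue: `FailFastChannelDemo`, `Channel`, `createChannel`, and `prepareWithRetries` are hypothetical names, and the `lookup` supplier is an assumed stand-in for constructing the `InetSocketAddress` from the server name. A real client would also back off between attempts; that is left out to keep the sketch short.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

// Hypothetical sketch; names do not correspond to real HBase classes.
final class FailFastChannelDemo {

    static final class Channel {
        final InetSocketAddress isa;

        Channel(InetSocketAddress isa) {
            this.isa = isa;
        }
    }

    // Plays the role of channel creation: validate the address up front
    // instead of deferring the failure to the first RPC call.
    static Channel createChannel(InetSocketAddress isa) throws IOException {
        if (isa.isUnresolved()) {
            // UnknownHostException is a subclass of IOException.
            throw new UnknownHostException("unknown host: " + isa.getHostName());
        }
        return new Channel(isa);
    }

    // Plays the role of the caller's retry: each attempt performs a new lookup,
    // so nothing stale survives a failed attempt. No backoff is done here.
    static Channel prepareWithRetries(Supplier<InetSocketAddress> lookup, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return createChannel(lookup.get());
            } catch (IOException e) {
                last = e;
            }
        }
        throw last != null ? last : new IOException("no attempts made");
    }
}
```

The design choice is that the exception surfaces before anything is cached, which is what allows a later attempt to succeed once DNS recovers.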