[jira] [Created] (HBASE-14958) regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2, but now=192.168.3.114
Yong Zheng created HBASE-14958:
----------------------------------

             Summary: regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2, but now=192.168.3.114
                 Key: HBASE-14958
                 URL: https://issues.apache.org/jira/browse/HBASE-14958
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.1.2
         Environment: physical machines: redhat7.1
                      docker version: 1.9.1
            Reporter: Yong Zheng


I have two physical machines: c3m3n03docker and c3m3n04docker. I started two docker instances per physical node. The topology is:

n03docker1(172.17.1.2) -\
                         | br0(172.17.1.1) + c3m3n03
n03docker2(172.17.1.3) -/

n04docker1(172.17.2.2) -\
                         | br0(172.17.2.1) + c3m3n04
n04docker2(172.17.2.3) -/

On the physical machines, c3m3n03 uses physical adapter enp11s0f0 with IP 192.168.3.113/16, and c3m3n04 uses physical adapter enp11s0f0 with IP 192.168.3.114/16. The two physical adapters are connected to the same switch.

Note: br0 is not attached to the physical adapter enp11s0f0 on either node, so all requests from 172.17.2.x are source-NATed to 192.168.3.114 (c3m3n04) and forwarded to c3m3n03.

n03docker1: hbase(1.1.2) master
n03docker2: region server
n04docker1: region server
n04docker2: region server

I first started n03docker1 and n03docker2, and that works. After that, I started n04docker1 and it reported:

2015-12-09 08:01:58,259 ERROR [regionserver/n04docker2.gpfs.net/172.17.2.3:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2.gpfs.net, but now=192.168.3.114

In the master logs:

2015-12-09 08:11:12,234 INFO [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager: Registering server=192.168.3.114,16020,144970721

So, when the HBase master receives requests from n04docker1, all of them have been source-NATed to 192.168.3.114 (not 172.17.2.2), and the master passes 192.168.3.114 back to 172.17.2.2 (n04docker1). As a result, n04docker1 (172.17.2.2) reported exceptions in its logs.

Does HBase not support running in a virtualized cluster? SNAT is widely used in virtualization. If the HBase master takes the remote hostname/IP from the connection (and so gets 192.168.3.114) and passes it back to the region server, it will hit this issue.

HBASE-8667 does not fix this issue: that fix has been in HBase since 0.98, and I am running HBase 1.1.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
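To make the reported behaviour concrete, here is a minimal sketch in C of what source NAT does to the address the master observes. It is an illustration only, not HBase code: the names `ipv4`, `snat_rule` and `observed_source` are hypothetical, and the /24 prefix for the 172.17.2.x bridge network is an assumption (the report only says "172.17.2.x").

```c
#include <assert.h>
#include <stdint.h>

/* Pack a dotted quad a.b.c.d into one 32-bit value (a is the high byte). */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Hypothetical SNAT rule: any source inside subnet/prefix_len is
 * rewritten to public_ip before the packet leaves the physical host. */
struct snat_rule {
    uint32_t subnet;
    unsigned prefix_len;
    uint32_t public_ip;
};

/* Return the source address the remote peer sees for a packet sent
 * from src through a host that applies the given rule. */
uint32_t observed_source(const struct snat_rule *rule, uint32_t src)
{
    uint32_t mask;

    assert(rule != 0 && rule->prefix_len <= 32);
    mask = rule->prefix_len == 0
               ? 0
               : (uint32_t)(UINT32_C(0xFFFFFFFF) << (32 - rule->prefix_len));

    /* Inside the NATed subnet: the original address is lost. */
    if ((src & mask) == (rule->subnet & mask))
        return rule->public_ip;

    /* Outside it: the packet keeps its own source address. */
    return src;
}
```

With a rule of 172.17.2.0/24 rewritten to 192.168.3.114, both 172.17.2.2 (n04docker1) and 172.17.2.3 (n04docker2) come out as 192.168.3.114, while 172.17.1.3 (n03docker2, on the master's own bridge) is unchanged. That matches the "Registering server=192.168.3.114" line in the master log.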
[jira] [Updated] (HBASE-14958) regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2, but now=192.168.3.114
[ https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yong Zheng updated HBASE-14958:
----------------------------------
    Description: 
I have two physical machines: c3m3n03docker and c3m3n04docker. I started two docker instances per physical node. The topology is:

n03docker1(172.17.1.2) -\
                         | br0(172.17.1.1) + c3m3n03
n03docker2(172.17.1.3) -/

n04docker1(172.17.2.2) -\
                         | br0(172.17.2.1) + c3m3n04
n04docker2(172.17.2.3) -/

On the physical machines, c3m3n03 uses physical adapter enp11s0f0 with IP 192.168.3.113/16, and c3m3n04 uses physical adapter enp11s0f0 with IP 192.168.3.114/16. The two physical adapters are connected to the same switch.

Note: br0 is not attached to the physical adapter enp11s0f0 on either node, so all requests from 172.17.2.x are source-NATed to 192.168.3.114 (c3m3n04) and forwarded to c3m3n03.

n03docker1: hbase(1.1.2) master
n03docker2: region server
n04docker1: region server
n04docker2: region server

I first started n03docker1 and n03docker2, and that works. After that, I started n04docker2 and it reported:

2015-12-09 08:01:58,259 ERROR [regionserver/n04docker2.gpfs.net/172.17.2.3:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2.gpfs.net, but now=192.168.3.114

In the master logs:

2015-12-09 08:11:12,234 INFO [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager: Registering server=192.168.3.114,16020,144970721

So, when the HBase master receives requests from n04docker2, all of them have been source-NATed to 192.168.3.114 (not 172.17.2.3), and the master passes 192.168.3.114 back to 172.17.2.3 (n04docker2). As a result, n04docker2 (172.17.2.3) reported exceptions in its logs.

Does HBase not support running in a virtualized cluster? SNAT is widely used in virtualization. If the HBase master takes the remote hostname/IP from the connection (and so gets 192.168.3.114) and passes it back to the region server, it will hit this issue.

HBASE-8667 does not fix this issue: that fix has been in HBase since 0.98, and I am running HBase 1.1.2.

  was:
The description as originally filed, identical except that it referred to n04docker1 (172.17.2.2) where the text above refers to n04docker2 (172.17.2.3).
[jira] [Commented] (HBASE-14958) regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2, but now=192.168.3.114
[ https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049753#comment-15049753 ]

Yong Zheng commented on HBASE-14958:
------------------------------------

Thanks, Nick, for the prompt response.

I checked the prerequisites, and DNS cannot solve this issue. My virtualized HBase cluster has only 4 nodes:

n03docker1(172.17.1.2)
n03docker2(172.17.1.3)
n04docker1(172.17.2.2)
n04docker2(172.17.2.3)

DNS is not configured, but I configured /etc/hosts:

# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.17.1.1  c3m3n03docker.gpfs.net c3m3n03docker   <== the br0 on the physical node c3m3n03
172.17.2.1  c3m3n04docker.gpfs.net c3m3n04docker   <== the br0 on the physical node c3m3n04
172.17.1.2  n03docker1.gpfs.net n03docker1
172.17.1.3  n03docker2.gpfs.net n03docker2
172.17.2.2  n04docker1.gpfs.net n04docker1
172.17.2.3  n04docker2.gpfs.net n04docker2

So name resolution works (I do see the correct names for n03docker1 and n03docker2). However, for any region server located on another physical machine, every network packet it sends is source-NATed to the IP of c3m3n04 (192.168.3.114). That is, the source IP of every packet is rewritten to 192.168.3.114 so that the packet can be delivered to the physical node c3m3n03.

The addresses 192.168.3.113 and 192.168.3.114 are invisible to HBase on the master, so DNS resolution for 192.168.3.114 inside the VM does not help. For example, the hostname for 192.168.3.114 should be c3m3n04, not n04docker1 or n04docker2. If we configure DNS inside the VM to map 192.168.3.114 to n04docker1 or n04docker2, this will mess up the IP-to-hostname mapping inside the VM. Also, if we map 192.168.3.114 to n04docker1, we cannot start a second region server on the same physical node, because both region servers would be identified by the physical node's IP address/hostname.
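The /etc/hosts argument in this comment can be sketched as a small reverse-lookup table in C. This is only an illustration under stated assumptions: the table copies the entries from the /etc/hosts listing in the comment, `reverse_lookup` is a hypothetical helper, and real resolvers (and HBase itself) do considerably more than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct host_entry {
    const char *ip;
    const char *name;
};

/* Entries copied from the /etc/hosts listing in the comment. */
static const struct host_entry HOSTS[] = {
    { "172.17.1.1", "c3m3n03docker.gpfs.net" },
    { "172.17.2.1", "c3m3n04docker.gpfs.net" },
    { "172.17.1.2", "n03docker1.gpfs.net" },
    { "172.17.1.3", "n03docker2.gpfs.net" },
    { "172.17.2.2", "n04docker1.gpfs.net" },
    { "172.17.2.3", "n04docker2.gpfs.net" },
};

/* Return the name mapped to ip, or NULL when the table has no entry. */
const char *reverse_lookup(const char *ip)
{
    size_t i;

    assert(ip != NULL);
    for (i = 0; i < sizeof HOSTS / sizeof HOSTS[0]; i++) {
        if (strcmp(HOSTS[i].ip, ip) == 0)
            return HOSTS[i].name;
    }
    return NULL;
}
```

Looking up 172.17.2.3 returns n04docker2.gpfs.net, but looking up the SNAT address 192.168.3.114 returns NULL. Because a table like this maps one address to one name, adding an entry for 192.168.3.114 could name at most one of the two region servers behind c3m3n04, which is the limitation described in the comment.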
[jira] [Commented] (HBASE-14958) regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2, but now=192.168.3.114
[ https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049930#comment-15049930 ]

Yong Zheng commented on HBASE-14958:
------------------------------------

I ran a simple test with tcp_server on n03docker2 (172.17.1.3) and tcp_client on n04docker2 (172.17.2.3).

In tcp_server:

    ...
    sin_size = sizeof(struct sockaddr_in);
    if ((new_fd = accept(sockfd, (struct sockaddr *)(&client_addr), &sin_size)) == -1) {
        fprintf(stderr, "Accept error:%s\n\a", strerror(errno));
        exit(1);
    }
    fprintf(stderr, "Server get connection from %x\n", client_addr.sin_addr.s_addr);
    ret = getpeername(sockfd, (struct sockaddr *)(_peer_addr), &sin_size);
    ...

tcp_client just connects to the server and sends one message:

bash-4.1# hostname
n04docker2
bash-4.1# ./tcp_client 172.17.1.3 8030

On tcp_server:

bash-4.1# hostname
n03docker2.gpfs.net
bash-4.1# ./tcp_server 8030
will accepting...
Server get connection ...
Server get connection from 7203a8c0   <== this is 192.168.3.114 once converted to a dotted-quad address

So, when source NAT is involved in the virtualization, it looks to me as though the current HBase master/region server mechanism does not work. Maybe the region server and master could exchange the hostname explicitly, rather than depending on the socket API to get the client IP address.
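For readers who want to check the hex value in the test output, here is a short C sketch of the conversion. It assumes the server ran on a little-endian host (for example x86), where printing `s_addr` with `%x` shows the first octet of the address in the least significant byte; the helper names `s_addr_octet` and `s_addr_hex_to_dotted` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Return octet number index (0 = first octet of the dotted quad) from a
 * value printed with "%x" from sin_addr.s_addr on a little-endian host.
 * s_addr is stored in network byte order, so on such a host the first
 * octet ends up in the lowest byte of the integer. */
unsigned s_addr_octet(uint32_t printed, int index)
{
    assert(index >= 0 && index < 4);
    return (unsigned)((printed >> (8 * index)) & 0xffu);
}

/* Format the printed value as a dotted quad into buf.
 * Returns what snprintf returns (the length of the formatted text). */
int s_addr_hex_to_dotted(uint32_t printed, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%u.%u.%u.%u",
                    s_addr_octet(printed, 0), s_addr_octet(printed, 1),
                    s_addr_octet(printed, 2), s_addr_octet(printed, 3));
}
```

Calling `s_addr_hex_to_dotted(0x7203a8c0u, buf, sizeof buf)` with a 16-byte buffer fills it with 192.168.3.114: the bytes c0, a8, 03 and 72 are 192, 168, 3 and 114. In a real program, `inet_ntop` from `<arpa/inet.h>` does this conversion directly on the `struct in_addr` and is independent of host byte order.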