[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-1226:
---
Labels: ipv6  (was: )

> Inconsistent hostname leads to low data locality on IPv6 hosts
> --
>
> Key: YARN-1226
> URL: https://issues.apache.org/jira/browse/YARN-1226
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
> Environment: Linux, IPv6
> Reporter: Kaibo Zhou
> Labels: ipv6
>
> When I run a MapReduce job that uses TableInputFormat to scan an HBase table
> on a YARN cluster with 140+ nodes, I consistently get very low data locality,
> around 0-10%.
> The scheduler is the Capacity Scheduler. HBase and Hadoop are co-located in the
> cluster: NodeManager, DataNode and HRegionServer run on the same node.
>
> The cause of the low data locality is that most machines in the cluster use
> IPv6, while a few use IPv4. NodeManager uses
> "InetAddress.getLocalHost().getHostName()" to get the host name, but the
> result of this call depends on whether IPv4 or IPv6 is in use; see
> ["InetAddress.getLocalHost().getHostName() returns
> FQDN"|http://bugs.sun.com/view_bug.do?bug_id=7166687].
> On machines with IPv4, NodeManager gets the hostName as:
> search042097.sqa.cm4.site.net
> But on machines with IPv6, NodeManager gets the hostName as: search042097.sqa.cm4
> If run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns
> search042097.sqa.cm4.site.net.
>
> For the MapReduce job that scans the HBase table, the InputSplit contains node
> locations as [FQDN|http://en.wikipedia.org/wiki/FQDN]s, e.g.
> search042097.sqa.cm4.site.net. This is because in HBase the RegionServers'
> hostnames are assigned by the HMaster. The HMaster communicates with the
> RegionServers and gets each region server's host name using Java NIO:
> clientChannel.socket().getInetAddress().getHostName().
> Also see the startup log of the region server:
> 13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master
> passed us hostname to use. Was=search042024.sqa.cm4,
> Now=search042024.sqa.cm4.site.net
>
> As you can see, most machines in the YARN cluster with IPv6 get the short
> hostname, but HBase always gets the full hostname, so the hosts cannot be
> matched (see RMContainerAllocator::assignToMap). This can lead to poor locality.
> After I used java.net.preferIPv4Stack to force IPv4 in YARN, I got 70+% data
> locality in the cluster.
> Thanks,
> Kaibo

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
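The mismatch in this report comes down to comparing host name strings verbatim: the InputSplit carries the FQDN (search042097.sqa.cm4.site.net) while an IPv6 NodeManager reports the short name (search042097.sqa.cm4). Below is a minimal, hypothetical Java sketch of that failure and one possible mitigation; it is not the YARN code in RMContainerAllocator::assignToMap, and the class HostMatchSketch and its methods exactMatch, canonicalMatch and dnsCanonical are illustrative names only. It assumes that passing both names through the same resolver, for example InetAddress.getByName(host).getCanonicalHostName(), yields the same FQDN. That is a best-effort lookup whose result depends on the cluster's DNS or /etc/hosts setup. The demo uses a map-based stub resolver so the matching result does not depend on the machine it runs on.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch (not YARN code): exact string comparison of host names
 * fails when one side is a short name and the other an FQDN; canonicalizing
 * both sides through the same resolver first makes them comparable.
 */
class HostMatchSketch {

    /** Verbatim comparison, as a host-keyed lookup effectively does. */
    static boolean exactMatch(String splitHost, String nodeHost) {
        return splitHost.equals(nodeHost);
    }

    /** Compare after mapping both names through the same canonicalizer (case-insensitive). */
    static boolean canonicalMatch(String splitHost, String nodeHost,
                                  UnaryOperator<String> canonicalize) {
        return canonicalize.apply(splitHost)
                .equalsIgnoreCase(canonicalize.apply(nodeHost));
    }

    /** Assumed DNS-backed canonicalizer; returns the input unchanged if resolution fails. */
    static String dnsCanonical(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    public static void main(String[] args) {
        String splitHost = "search042097.sqa.cm4.site.net"; // location in the HBase InputSplit
        String nodeHost = "search042097.sqa.cm4";            // name reported by an IPv6 NodeManager

        // Stub resolver standing in for DNS so this part of the demo is deterministic.
        Map<String, String> fakeDns = new HashMap<>();
        fakeDns.put("search042097.sqa.cm4", "search042097.sqa.cm4.site.net");
        UnaryOperator<String> stub = h -> fakeDns.getOrDefault(h, h);

        System.out.println("exact match:     " + exactMatch(splitHost, nodeHost));
        System.out.println("canonical match: " + canonicalMatch(splitHost, nodeHost, stub));

        // What the JDK reports for this machine depends on local resolver settings.
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("getHostName():          " + local.getHostName());
            System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
            System.out.println("match via DNS:          " + canonicalMatch(
                    local.getHostName(), local.getCanonicalHostName(),
                    HostMatchSketch::dnsCanonical));
        } catch (UnknownHostException e) {
            System.out.println("local host lookup failed: " + e.getMessage());
        }
    }
}
```

Run with `java HostMatchSketch.java` on JDK 11 or later: the exact comparison prints false and the stub-canonicalized comparison prints true. The remaining lines vary per host, which is the same variability the report describes. The design choice is to canonicalize both sides with one resolver rather than trying to strip or append domain suffixes, since a suffix heuristic cannot tell search042097.sqa.cm4 apart from an unrelated host in another domain.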
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated YARN-1226:
-
Environment: Linux, IPv6
Summary: Inconsistent hostname leads to low data locality on IPv6 hosts  (was: Inconsistent hostname leads to low data locality)
[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality
[ https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaibo Zhou updated YARN-1226:
-
Summary: Inconsistent hostname leads to low data locality  (was: Inconsistent hostname leads to poor data locality)