[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts

2015-02-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-1226:
---
Labels: ipv6  (was: )

> Inconsistent hostname leads to low data locality on IPv6 hosts
> --
>
> Key: YARN-1226
> URL: https://issues.apache.org/jira/browse/YARN-1226
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
> Environment: Linux, IPv6
> Reporter: Kaibo Zhou
> Labels: ipv6
>
> When I run a MapReduce job that uses TableInputFormat to scan an HBase table
> on a YARN cluster with 140+ nodes, I consistently get very low data locality,
> around 0~10%.
> The scheduler is the Capacity Scheduler. HBase and Hadoop are co-located in the
> cluster: NodeManager, DataNode and HRegionServer run on the same node.
> The reason for the low data locality is that most machines in the cluster use
> IPv6 and only a few use IPv4. NodeManager uses
> "InetAddress.getLocalHost().getHostName()" to get the host name, but the
> result of this call depends on whether IPv4 or IPv6 is in use; see
> ["InetAddress.getLocalHost().getHostName() returns
> FQDN"|http://bugs.sun.com/view_bug.do?bug_id=7166687].
> On machines with IPv4, NodeManager gets the host name as:
> search042097.sqa.cm4.site.net
> But on machines with IPv6, NodeManager gets the host name as: search042097.sqa.cm4
> If it is run with IPv6 disabled (-Djava.net.preferIPv4Stack=true), it returns
> search042097.sqa.cm4.site.net.
> 
> For a MapReduce job that scans an HBase table, the InputSplit contains node
> locations as [FQDNs|http://en.wikipedia.org/wiki/FQDN], e.g.
> search042097.sqa.cm4.site.net, because in HBase the RegionServers' host names
> are assigned by the HMaster. The HMaster communicates with the RegionServers and
> obtains each region server's host name using Java NIO:
> clientChannel.socket().getInetAddress().getHostName().
> Also see the startup log of a region server:
> 13:06:21,200 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master 
> passed us hostname to use. Was=search042024.sqa.cm4, 
> Now=search042024.sqa.cm4.site.net
> 
> As you can see, most machines in the YARN cluster with IPv6 get the short
> host name, but HBase always gets the full host name, so the hosts cannot be
> matched (see RMContainerAllocator::assignToMap). This leads to poor locality.
> After I used java.net.preferIPv4Stack to force IPv4 in YARN, I got 70+% data
> locality in the cluster.
> Thanks,
> Kaibo
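
The mismatch described above comes down to comparing a partial name (search042097.sqa.cm4) against a fully qualified one (search042097.sqa.cm4.site.net) with plain string equality. The following is a minimal sketch, not code from Hadoop or HBase, of a comparison that tolerates this difference. The class name HostNameMatch and the methods normalize and sameHost are hypothetical, introduced only for illustration. It assumes both inputs are DNS names (not IP literals) and that no two hosts in the cluster share a name up to a label boundary.

```java
import java.util.Locale;

public final class HostNameMatch {
    private HostNameMatch() {}

    // Lower-case the name and drop a trailing root dot so comparisons are uniform.
    static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }

    // True when the names are equal, or when the shorter one is a prefix of the
    // longer one that ends exactly on a label boundary (a '.').
    static boolean sameHost(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        if (x.isEmpty() || y.isEmpty()) {
            return false;
        }
        if (x.equals(y)) {
            return true;
        }
        String shorter = x.length() < y.length() ? x : y;
        String longer = x.length() < y.length() ? y : x;
        return longer.startsWith(shorter + ".");
    }

    public static void main(String[] args) {
        // Name reported on an IPv6 host vs. the name carried in the InputSplit.
        System.out.println(sameHost("search042097.sqa.cm4",
                                    "search042097.sqa.cm4.site.net")); // true
        // Different machines must still be told apart.
        System.out.println(sameHost("search042097.sqa.cm4",
                                    "search042024.sqa.cm4.site.net")); // false
    }
}
```

Prefix matching of this kind is only a workaround for comparison. Making every daemon report the same form of the name, as the reporter did with -Djava.net.preferIPv4Stack=true, avoids the ambiguity at the source.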



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality on IPv6 hosts

2013-09-24 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-1226:
-

Environment: Linux, IPv6
Summary: Inconsistent hostname leads to low data locality on IPv6 hosts 
 (was: Inconsistent hostname leads to low data locality)

> Inconsistent hostname leads to low data locality on IPv6 hosts
> --
>
> Key: YARN-1226
> URL: https://issues.apache.org/jira/browse/YARN-1226
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
> Environment: Linux, IPv6
> Reporter: Kaibo Zhou

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1226) Inconsistent hostname leads to low data locality

2013-09-23 Thread Kaibo Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaibo Zhou updated YARN-1226:
-

Summary: Inconsistent hostname leads to low data locality  (was: 
Inconsistent hostname leads to poor data locality)

> Inconsistent hostname leads to low data locality
> 
>
> Key: YARN-1226
> URL: https://issues.apache.org/jira/browse/YARN-1226
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-beta
> Reporter: Kaibo Zhou
