Hi everyone,

I'm running the DFSIO benchmark on my Hadoop cluster of 8 nodes: one hosts the NameNode/ResourceManager, and the other 7 each host a DataNode/NodeManager. I'm running the write benchmark with 7 files of 128M each, and I have configured the block size to be 128M as well. I'm using version 2.7.1 with some minor additional debug log output. I was expecting each DataNode/NodeManager host to be assigned one file; however, the RMContainerAllocator logs the following:

2016-01-04 14:23:55,802 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on host match <host4>
2016-01-04 14:23:55,803 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on host match <host6>
2016-01-04 14:23:55,803 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on host match <host6>
2016-01-04 14:23:55,803 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on host match <host6>
2016-01-04 14:23:55,803 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on host match <host2>
2016-01-04 14:23:55,806 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on rack match /default-rack
2016-01-04 14:23:55,806 DEBUG [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned based on rack match /default-rack

My hosts are named <host0> through <host7>. When I run the subsequent read benchmark, a different set of hosts is chosen, although I expected exactly the hosts that were chosen for the write benchmark.

Is there anything I'm fundamentally missing here? I'm happy to provide additional information if necessary.

Thanks a lot in advance!
Robert

--
My GPG Key ID: 336E2680
