[jira] [Created] (HDDS-3794) Topology Aware read does not work correctly in XceiverClientGrpc

Stephen O'Donnell (Jira) Fri, 12 Jun 2020 08:36:46 -0700

Stephen O'Donnell created HDDS-3794:
---------------------------------------


             Summary: Topology Aware read does not work correctly in XceiverClientGrpc
                 Key: HDDS-3794
                 URL: https://issues.apache.org/jira/browse/HDDS-3794
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell
             Fix For: 0.6.0


In XceiverClientGrpc.java, the calls to read a block or chunks from a Datanode 
end up in the private method sendCommandWithRetry(). This method decides which 
datanode the request should be sent to. It first checks whether there is a 
cached DN connection for the given block and, if so, uses it. If there is no 
cached connection, it should either take the network topology into account or 
shuffle the nodes:

{code}
    List<DatanodeDetails> datanodeList = null;

    DatanodeBlockID blockID = null;
    if (request.getCmdType() == ContainerProtos.Type.ReadChunk) {
      blockID = request.getReadChunk().getBlockID();
    } else if (request.getCmdType() == ContainerProtos.Type.GetSmallFile) {
      blockID = request.getGetSmallFile().getBlock().getBlockID();
    }

    if (blockID != null) {
      LOG.info("blockid is not null");
      // Check if the DN to which the GetBlock command was sent has been cached.
      DatanodeDetails cachedDN = getBlockDNcache.get(blockID);
      if (cachedDN != null) {
        LOG.info("Cached DN is not null");
        datanodeList = pipeline.getNodes();
        int getBlockDNCacheIndex = datanodeList.indexOf(cachedDN);
        if (getBlockDNCacheIndex > 0) {
          LOG.info("pulling cached dn to top of list");
          // Pull the Cached DN to the top of the DN list
          Collections.swap(datanodeList, 0, getBlockDNCacheIndex);
        }
      } else if (topologyAwareRead) {
        LOG.info("topology aware - order DNs");
        datanodeList = pipeline.getNodesInOrder();
      }
    }
    if (datanodeList == null) {
      LOG.info("List is null - shuffling");
      datanodeList = pipeline.getNodes();
      // Shuffle datanode list so that clients do not read in the same order
      // every time.
      Collections.shuffle(datanodeList);
    }
    // ... the call to the DN follows here ...
{code}

The normal flow for the client is to first make a getBlock() call to the DN and 
then a readChunk() call.

The logic at the top of the block above only sets blockID for ReadChunk and 
GetSmallFile requests, so blockID is always null for the getBlock() call. The 
topologyAwareRead branch is therefore never reached, and the node list is 
shuffled instead.

Then, for readChunk(), it finds the blockID and a cached DN, which is the one 
chosen by the shuffle, and reuses that DN.

As a result, topologyAwareRead does not work as expected.
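
To make the branch selection easier to follow, here is a minimal standalone 
sketch of the decision logic described above. It is not the real 
XceiverClientGrpc code: the class TopologyReadSketch, the CmdType enum, the 
selectBranch() method and the returned branch names are all hypothetical 
stand-ins, and the cache lookup is reduced to a boolean.

```java
// Hypothetical, simplified model of the selection logic in
// sendCommandWithRetry(). None of these names exist in Ozone.
class TopologyReadSketch {

  // Stand-in for the relevant ContainerProtos.Type values.
  enum CmdType { GET_BLOCK, READ_CHUNK, GET_SMALL_FILE }

  // Returns which branch would pick the datanode order:
  // "cached", "topology" or "shuffle".
  static String selectBranch(CmdType type, boolean dnCached,
      boolean topologyAwareRead) {
    // Mirrors the snippet above: a blockID is only extracted for
    // ReadChunk and GetSmallFile, so it stays null for GetBlock.
    boolean hasBlockId =
        type == CmdType.READ_CHUNK || type == CmdType.GET_SMALL_FILE;

    if (hasBlockId) {
      if (dnCached) {
        return "cached";
      } else if (topologyAwareRead) {
        return "topology";
      }
    }
    // Reached whenever the blockID is null, regardless of the
    // topologyAwareRead setting.
    return "shuffle";
  }

  public static void main(String[] args) {
    // Step 1: getBlock() with topology aware read enabled, nothing cached.
    System.out.println(selectBranch(CmdType.GET_BLOCK, false, true));
    // Step 2: readChunk() finds the DN cached by step 1.
    System.out.println(selectBranch(CmdType.READ_CHUNK, true, true));
  }
}
```

Under these assumptions the first call takes the shuffle branch even though 
topologyAwareRead is true, and the second call takes the cached branch, so the 
topology branch is never used in the normal getBlock() then readChunk() flow.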



