Hello Spark users, I have a question while analyzing a sample Spark task. The task performs remote (shuffle) fetches from a few blocks, but the reported remote fetch time does not make sense to me. Can someone please help me interpret it?
The logs come from the Spark REST API. Task ID 33 needs four blocks and has to fetch three of them from remote machines. In its "shuffleReadMetrics" section, however, "fetchWaitTime" is reported as 0 even though the task really fetches about 2.4 GB from remote machines. Task ID 34 below fetches 4 blocks with a total size of around 3 GB, and its fetchWaitTime is about 2.4 seconds; only that one makes sense to me. Is this intended behavior?

"33" : {
  "taskId" : 33,
  ....
  "taskMetrics" : {
    ....
    "shuffleReadMetrics" : {
      *"remoteBlocksFetched" : 3,*
      "localBlocksFetched" : 1,
      *"fetchWaitTime" : 0,*
      *"remoteBytesRead" : 2401539138,*
      "localBytesRead" : 800513041,
      "recordsRead" : 4
    },
  }
},
"34" : {
  "taskId" : 34,
  ....
  "taskMetrics" : {
    ....
    "shuffleReadMetrics" : {
      *"remoteBlocksFetched" : 4,*
      "localBlocksFetched" : 0,
      *"fetchWaitTime" : 2416,*
      *"remoteBytesRead" : 3202052194,*
      "localBytesRead" : 0,
      "recordsRead" : 4
    },
  }
},
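For reference, here is a small Python sketch that tabulates the two tasks' metrics side by side. It assumes fetchWaitTime is reported in milliseconds and, per the Spark monitoring docs, counts only time the task was *blocked* waiting on remote shuffle blocks (fetches that overlap with computation would not show up in it, which could explain a 0 despite 2.4 GB of remote reads). The values are copied from the JSON above; the script itself is just an illustration, not part of any Spark API.

```python
import json

# shuffleReadMetrics for the two tasks, copied from the REST API output above.
metrics = json.loads("""
{
  "33": {"remoteBlocksFetched": 3, "localBlocksFetched": 1,
         "fetchWaitTime": 0, "remoteBytesRead": 2401539138,
         "localBytesRead": 800513041, "recordsRead": 4},
  "34": {"remoteBlocksFetched": 4, "localBlocksFetched": 0,
         "fetchWaitTime": 2416, "remoteBytesRead": 3202052194,
         "localBytesRead": 0, "recordsRead": 4}
}
""")

summary = {}
for task_id, m in metrics.items():
    summary[task_id] = {
        "remote_gb": m["remoteBytesRead"] / 1e9,
        # assumption: fetchWaitTime is in milliseconds
        "wait_s": m["fetchWaitTime"] / 1000.0,
    }

for task_id, s in sorted(summary.items()):
    print(f"task {task_id}: {s['remote_gb']:.2f} GB remote, "
          f"blocked waiting {s['wait_s']:.3f} s")
```

With these numbers, task 33 shows 2.40 GB fetched with zero blocked time, while task 34 shows 3.20 GB with 2.416 s of blocked time.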