Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11517 )

Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with 
many files
......................................................................


Patch Set 3:

> > This can't be tested on hdfs since there are no "remote" blocks in
 > > the minicluster. So all the scan ranges of a file are added to the
 > > appropriate local disk queue once the header is processed.
 >
 > This came up in a conversation between me and Joe today as well.
 > Replication in HDFS is per file, so we should be able to "hdfs put"
 > with appropriate options to induce a remote block, even in the
 > minicluster. Unfortunately, it doesn't seem to work with the
 > following sequence:
 >
 > $ impala-shell.sh -q 'create table t (x string)'
 > $ yes | head > /tmp/f
 > $ hadoop fs -D dfs.replication=1 -put /tmp/f /test-warehouse/t
 > $ impala-shell.sh -i localhost:21002 -q 'set num_nodes=1;
 > invalidate metadata t; select * from t limit 2; profile' | grep -i
 > BytesReadShortCircuit
 >
 > Impala seems to be doing short-circuit-read on all the impalad's
 > (presumably because the datanode somewhat reasonably decides things
 > are indeed local).
 >
 > Anyway--this surprised me so I figured I'd mention it.

The scheduler treats all backend *hosts* the same. In particular, all reads on 
the minicluster will be assigned to "localhost". Then we have some special 
handling for backend hosts that have multiple impalads running: we assign scan 
ranges round-robin without considering the actual size of each scan range 
(scheduler.cc:890). This is only used during testing and is not supported in 
production deployments, so we don't try to be very sophisticated.
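
To illustrate what size-agnostic round-robin assignment means, here is a 
minimal Python sketch. It is not the code at scheduler.cc:890; the function 
name, the (file_name, length) tuples, the file "big" and the use of ports as 
backend identifiers are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


def assign_round_robin(scan_ranges, backends):
    """Hand out scan ranges to backends in turn, ignoring range length."""
    assignment = defaultdict(list)
    next_backend = cycle(backends)
    for scan_range in scan_ranges:
        # The length of the range is never consulted here.
        assignment[next(next_backend)].append(scan_range)
    return dict(assignment)


# Hypothetical input: (file_name, length_in_bytes) pairs and impalad ports.
ranges = [("f", 20), ("f2", 20), ("f3", 20), ("big", 4096)]
ports = [22000, 22001, 22002]
result = assign_round_robin(ranges, ports)

# Bytes per backend, to show that the distribution can be uneven.
bytes_per_backend = {
    port: sum(length for _, length in assigned)
    for port, assigned in result.items()
}
```

With this input the first backend ends up with both "f" and "big", so it reads 
far more bytes than the other two. That is the kind of imbalance a 
size-agnostic scheme accepts in exchange for simplicity.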

On second thought, I hoped we might be able to provoke a remote read if we add 
multiple files that only reside on the first datanode and then scan all of 
them. I tried this and it didn't work. The HDFS file browser shows "localhost" 
as the location of each file, which makes me think that it does not 
distinguish between the datanodes and instead figures out how to perform a 
short-circuit read directly.

The scheduler itself makes the right assignments:

I1029 21:19:09.685812 24983 scheduler.cc:995] ScanRangeAssignment: 
server=TNetworkAddress {
  01: hostname (string) = "lv-desktop",
  02: port (i32) = 22000,
}
I1029 21:19:09.685825 24983 scheduler.cc:1001] node_id=0 
ranges=TScanRangeParams {
  01: scan_range (struct) = TScanRange {
    01: hdfs_file_split (struct) = THdfsFileSplit {
      01: file_name (string) = "f3",
      02: offset (i64) = 0,
      03: length (i64) = 20,
      04: partition_id (i64) = 0,
      05: file_length (i64) = 20,
      06: file_compression (i32) = 0,
      07: mtime (i64) = 1540872321692,
    },
  },
  02: volume_id (i32) = 2,
  03: is_cached (bool) = false,
  04: is_remote (bool) = false,
}
I1029 21:19:09.685854 24983 scheduler.cc:995] ScanRangeAssignment: 
server=TNetworkAddress {
  01: hostname (string) = "lv-desktop",
  02: port (i32) = 22002,
}
I1029 21:19:09.685863 24983 scheduler.cc:1001] node_id=0 
ranges=TScanRangeParams {
  01: scan_range (struct) = TScanRange {
    01: hdfs_file_split (struct) = THdfsFileSplit {
      01: file_name (string) = "f2",
      02: offset (i64) = 0,
      03: length (i64) = 20,
      04: partition_id (i64) = 0,
      05: file_length (i64) = 20,
      06: file_compression (i32) = 0,
      07: mtime (i64) = 1540872318716,
    },
  },
  02: volume_id (i32) = 1,
  03: is_cached (bool) = false,
  04: is_remote (bool) = false,
}
I1029 21:19:09.685868 24983 scheduler.cc:995] ScanRangeAssignment: 
server=TNetworkAddress {
  01: hostname (string) = "lv-desktop",
  02: port (i32) = 22001,
}
I1029 21:19:09.685875 24983 scheduler.cc:1001] node_id=0 
ranges=TScanRangeParams {
  01: scan_range (struct) = TScanRange {
    01: hdfs_file_split (struct) = THdfsFileSplit {
      01: file_name (string) = "f",
      02: offset (i64) = 0,
      03: length (i64) = 20,
      04: partition_id (i64) = 0,
      05: file_length (i64) = 20,
      06: file_compression (i32) = 0,
      07: mtime (i64) = 1540872285223,
    },
  },
  02: volume_id (i32) = 0,
  03: is_cached (bool) = false,
  04: is_remote (bool) = false,
}
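
For a quick check of a dump like the one above, the following Python sketch 
pulls out the port, file name and is_remote flag of each assignment. The 
function name and the abbreviated sample are made up for illustration, and it 
assumes exactly one scan range per server entry, as in the log above.

```python
import re


def summarize_assignments(log_text):
    """Return (port, file_name, is_remote) tuples in log order.

    Assumes one scan range per ScanRangeAssignment entry.
    """
    ports = re.findall(r'port \(i32\) = (\d+)', log_text)
    files = re.findall(r'file_name \(string\) = "([^"]*)"', log_text)
    remote = re.findall(r'is_remote \(bool\) = (true|false)', log_text)
    return [(int(p), f, r == "true") for p, f, r in zip(ports, files, remote)]


# Abbreviated sample in the same format as the scheduler log.
sample = '''
server=TNetworkAddress {
  01: hostname (string) = "lv-desktop",
  02: port (i32) = 22000,
}
ranges=TScanRangeParams {
      01: file_name (string) = "f3",
  04: is_remote (bool) = false,
}
'''
summary = summarize_assignments(sample)
any_remote = any(is_remote for _, _, is_remote in summary)
```

Run against the full log, any_remote would show at a glance whether the 
scheduler marked any of the ranges as remote.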


--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 30 Oct 2018 04:22:08 +0000
Gerrit-HasComments: No
