Pooja Nilangekar has uploaded a new patch set (#3). (
http://gerrit.cloudera.org:8080/11517 )
Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with
many files
......................................................................
[WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
This change speeds up slow scans of sequence datasets with many
files by enqueueing their scan ranges at the head of the disk
IO queue instead of the tail. Data ranges therefore get priority
over the headers of other files, so results are produced earlier
for limit queries [and dynamic filters?].
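The effect of head versus tail enqueueing on read order can be
illustrated with a minimal sketch. It does not use Impala's actual
DiskIoMgr or RequestContext code: Range, RangeQueue and ReadOrder are
hypothetical names, and the sketch assumes one header range and one
data range per file, with the data range issued only after the header
has been read.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a scan range; not Impala's io::ScanRange.
struct Range {
  std::string file;
  bool is_header;
};

// Hypothetical queue of ranges waiting to be read from one disk.
class RangeQueue {
 public:
  // at_head == true places the range in front of everything already queued.
  void Enqueue(const Range& r, bool at_head) {
    if (at_head) {
      queue_.push_front(r);
    } else {
      queue_.push_back(r);
    }
  }

  // Removes and returns the range at the front of the queue.
  Range Dequeue() {
    assert(!queue_.empty());
    Range r = queue_.front();
    queue_.pop_front();
    return r;
  }

  bool empty() const { return queue_.empty(); }

 private:
  std::deque<Range> queue_;
};

// Simulates a scan of three files and returns the order in which the
// ranges are read. Header ranges are always queued at the tail; the data
// range of a file is queued once its header has been read, at the head
// if data_at_head is true and at the tail otherwise.
std::vector<std::string> ReadOrder(bool data_at_head) {
  const std::vector<std::string> files = {"a", "b", "c"};
  RangeQueue q;
  for (const std::string& f : files) q.Enqueue(Range{f, true}, false);

  std::vector<std::string> order;
  while (!q.empty()) {
    Range r = q.Dequeue();
    order.push_back(r.file + (r.is_header ? ":header" : ":data"));
    if (r.is_header) q.Enqueue(Range{r.file, false}, data_at_head);
  }
  return order;
}
```

In this sketch, ReadOrder(false) reads all three headers before any
data range, while ReadOrder(true) reads each file's data range
immediately after its header, which is the ordering that lets a limit
query finish after the first few files.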
Testing:
Added logs to verify that the scan ranges for sequence files are
added to the head.
TODO: Verify that this patch solves the issue. [This can't be
tested on the minicluster]
Tested the patch with backend and end-to-end tests.
Single node performance test results:
+----------+--------------------+---------+------------+------------+----------------+
| Workload | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+--------------------+---------+------------+------------+----------------+
| TPCH(50) | avro / none / none | 65.62   | -0.38%     | 43.51      | -0.79%         |
+----------+--------------------+---------+------------+------------+----------------+
Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/util/internal-queue-test.cc
M be/src/util/internal-queue.h
13 files changed, 159 insertions(+), 115 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/11517/3
--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>