Yida Wu has uploaded this change for review.
http://gerrit.cloudera.org:8080/18219


Change subject: WIP IMPALA-11064 Optimizing Temporary File Structure for Batch 
Reading
......................................................................

WIP IMPALA-11064 Optimizing Temporary File Structure for Batch Reading

This patch optimizes the structure of temporary files to improve
batch reading performance. It is a follow-up to IMPALA-10791.

There will be two types of structures. The first is the original one:
space for a new page is allocated from the most recently allocated
file, and when that file is full, a new file is created and space is
allocated from it.

The second is the new structure, in which a file contains multiple
blocks and all data in a block belongs to pages with the same spill
id. Once a block is full, we first try to allocate a new block in the
same file; if the file is full, we allocate the block from a new
file. Gathering data with the same spill id in the same block
benefits batch reading when doing sequential reads on that spill id.
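
The block allocation described above can be sketched roughly as
follows. This is a minimal illustration, not the code in this patch:
BlockedTmpFileAllocator, PageLocation and all member names are
hypothetical, and the sketch assumes fixed-size blocks and that a page
which does not fit in the open block of its spill id starts a new
block.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical result type: where a page was placed.
struct PageLocation {
  int file_index;
  int64_t offset;
};

// Hypothetical allocator: each spill id owns one open block at a time.
class BlockedTmpFileAllocator {
 public:
  BlockedTmpFileAllocator(int64_t file_size, int64_t block_size)
      : file_size_(file_size), block_size_(block_size) {
    assert(block_size > 0 && file_size >= block_size);
  }

  PageLocation AllocatePage(int spill_id, int64_t page_len) {
    assert(page_len > 0 && page_len <= block_size_);
    auto it = open_blocks_.find(spill_id);
    if (it == open_blocks_.end() ||
        it->second.used + page_len > block_size_) {
      // The spill id has no open block, or the block is full.
      // Try the current file first; move to a new file if it is full.
      if (next_block_offset_ + block_size_ > file_size_) {
        ++cur_file_;
        next_block_offset_ = 0;
      }
      Block fresh = {cur_file_, next_block_offset_, 0};
      next_block_offset_ += block_size_;
      open_blocks_[spill_id] = fresh;
      it = open_blocks_.find(spill_id);
    }
    Block& block = it->second;
    PageLocation loc = {block.file_index, block.start + block.used};
    block.used += page_len;
    return loc;
  }

 private:
  struct Block {
    int file_index;
    int64_t start;
    int64_t used;
  };

  int64_t file_size_;
  int64_t block_size_;
  int cur_file_ = 0;
  int64_t next_block_offset_ = 0;
  std::map<int, Block> open_blocks_;
};
```

With a file size of 4 and a block size of 2, pages of length 1 for
spill ids 0, 1, 0 land at offsets 0, 2, 1 of file 0, so the two pages
of spill id 0 are adjacent. The next page for spill id 0 goes to a
new file while the block of spill id 1 stays half-filled.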

To use the new structure more efficiently, this patch also adds two
optimizations.

1. Batch reading is enabled only for the partitioned hash join node.
The partitioned hash join node pins data back into memory
sequentially, so it benefits from batch reading. Restricting batch
reading to this node saves memory when the query also spills heavily
on grouping aggregation nodes, whose reads of the spilled data are
quite random.

2. Prefetch the block to be read.
When pinning a page from a file using batch reading, we try to
prefetch a block several steps ahead (the step number is
configurable). Since batch reading is limited to sequential reads,
prefetching the block can speed up reading.

New startup option:
'remote_batch_read_max_block_size_level'
The default value is 3, which means a maximum block size of
2^3 = 8MB.
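
As a small illustration of the level-to-size mapping, assuming the
level is a power-of-two exponent in units of MB (the helper name
MaxBlockSizeBytes is hypothetical and not part of this patch):

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kMegabyte = 1024 * 1024;

// Hypothetical helper: maximum block size in bytes for a given level,
// assuming size = 2^level MB.
int64_t MaxBlockSizeBytes(int level) {
  assert(level >= 0 && level < 40);  // keep the shift within int64_t
  return (int64_t{1} << level) * kMegabyte;
}
```

Under that assumption, MaxBlockSizeBytes(3) is 8 * 1024 * 1024 bytes.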

New query options:
'remote_batch_read_prefetch_step'
The option specifies the step number for prefetching. For example,
if the step number is 1 and we are reading data from the block with
spill id 0 and sequence number 0, we will try to prefetch the block
with spill id 0 and sequence number 1.
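
The example above amounts to the following sketch. BlockKey and
PrefetchTarget are hypothetical names used only for illustration, and
the sketch assumes the prefetch target keeps the spill id and advances
the sequence number by the step:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical identifier of a block: spill id plus sequence number.
struct BlockKey {
  int64_t spill_id;
  int64_t seq_no;
};

// Returns the block to prefetch while 'current' is being read.
BlockKey PrefetchTarget(const BlockKey& current, int prefetch_step) {
  assert(prefetch_step >= 0);
  return BlockKey{current.spill_id, current.seq_no + prefetch_step};
}
```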

'spill_hash_partition_level'
The option decides how the spill id is formed. If the value is 1,
the spill id is (partition id >> 1), so partition 0 and partition 1
share the same spill id. In practice, a high value requires more
memory for batch reading, while a small value produces more spill
ids and more half-filled blocks, which may leave more files not yet
uploaded. That can slow down writing and cause upload timeouts by
using up the local disk buffer space.
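
A sketch of the mapping, generalizing the (partition id >> 1) example
to an arbitrary level (the function name SpillId is hypothetical, and
the generalization to other values is an assumption):

```cpp
#include <cassert>

// Hypothetical helper: partitions whose ids differ only in the low
// 'level' bits share one spill id.
int SpillId(int partition_id, int level) {
  assert(partition_id >= 0 && level >= 0 && level < 31);
  return partition_id >> level;
}
```

With level 1, partitions 0 and 1 map to spill id 0 and partitions 2
and 3 map to spill id 1. With level 0, every partition has its own
spill id.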

Testing:
Ran core tests.

Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/io/disk-file.cc
M be/src/runtime/io/disk-file.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/file-writer.h
M be/src/runtime/io/local-file-writer.cc
M be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
24 files changed, 879 insertions(+), 314 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/18219/1
--
To view, visit http://gerrit.cloudera.org:8080/18219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
Gerrit-Change-Number: 18219
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <[email protected]>
