Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18219 )
Change subject: WIP IMPALA-11064 Optimizing Temporary File Structure for Batch Reading
......................................................................

WIP IMPALA-11064 Optimizing Temporary File Structure for Batch Reading

This patch optimizes the structure of temporary files to improve batch
reading performance. It is a follow-up to IMPALA-10791.

There are two types of file structure. The first is the original one:
space for a new page is allocated from the most recently allocated file,
and when that file is full, a new file is created and space is allocated
from it. The second is the new structure, in which a file contains
multiple blocks and all the data in a block belongs to pages with the
same spill id. Once a block is full, we first try to allocate a new block
in the same file; if the file is full, we allocate the block from a new
file. Because it groups data with the same spill id into the same block,
the new structure benefits batch reading, particularly sequential reads
on a single spill id.

To use the new structure more efficiently, this patch also includes two
optimizations:
1. Batch reading is used only for the partitioned hash join node. The
   partitioned hash join node pins its spilled data back into memory
   sequentially, whereas a query may also spill heavily from grouping
   aggregation nodes, whose reads of that data are quite random.
   Restricting batch reading to the partitioned hash join node therefore
   saves memory.
2. The block to be read is prefetched. When pinning a page from a file
   using batch reading, we try to prefetch a block several steps ahead
   (the step number is configurable). Since batch reading is limited to
   sequential reads, prefetching the block can speed up reading.

New startup option:
'remote_batch_read_max_block_size_level'
The default value is 3, which means the maximum block size is
2^3 = 8MB.

New query options:
'remote_batch_read_prefetch_step'
The option specifies the step number for prefetching.
For example, if the step number is 1 and we are reading data from the
block with spill id 0 and sequence number 0, we will try to prefetch the
block with spill id 0 and sequence number 1.
'spill_hash_partition_level'
The option decides how the spill id is formed. If the value is 1, the
spill id is (partition id >> 1), so partitions 0 and 1 share the same
spill id. In practice, a high value of this option requires more memory
during batch reading, while a low value may lead to more files that have
not yet been uploaded, because there are more spill ids and more
half-filled blocks. That can slow down writing and cause upload timeouts
by using up the local disk buffer space.

Testing:
Ran core tests.

Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/io/disk-file.cc
M be/src/runtime/io/disk-file.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/file-writer.h
M be/src/runtime/io/local-file-writer.cc
M be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/query-state.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
24 files changed, 879 insertions(+), 314 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/18219/1
--
To view, visit http://gerrit.cloudera.org:8080/18219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
Gerrit-Change-Number: 18219
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <[email protected]>
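The spill id, block size, prefetch and block allocation rules described in the commit message above can be sketched in C++ roughly as follows. This is a minimal, hypothetical model and not code from the patch: the names SpillId, MaxBlockSizeBytes, BlockKey, PrefetchTarget and BlockAllocator are invented for illustration, the 1 MB base unit for the block size level is inferred only from the "2^3 = 8MB" example, and the allocator assumes fixed-size blocks, fixed-size files, and pages that never exceed one block.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch of the ideas in IMPALA-11064; these names are NOT
// Impala's real TmpFileMgr / BufferPool APIs.

// spill_hash_partition_level: partitions whose ids agree after the shift
// share one spill id, e.g. level 1 maps partitions 0 and 1 to spill id 0.
int SpillId(int partition_id, int spill_hash_partition_level) {
  return partition_id >> spill_hash_partition_level;
}

// remote_batch_read_max_block_size_level: max block size is 2^level MB
// (assumed 1 MB base unit, so the default level 3 gives 8 MB).
int64_t MaxBlockSizeBytes(int max_block_size_level) {
  return (int64_t{1} << max_block_size_level) * 1024 * 1024;
}

// A block is identified by its spill id and its sequence number in that spill id.
struct BlockKey {
  int spill_id;
  int64_t seq_num;
};

// remote_batch_read_prefetch_step: while reading block (s, n), prefetch (s, n + step).
BlockKey PrefetchTarget(const BlockKey& current, int prefetch_step) {
  return BlockKey{current.spill_id, current.seq_num + prefetch_step};
}

// Toy model of the block-based layout: each spill id appends pages to its own
// open block; a full block is replaced by a new block in the current file, and
// a new file is started when the current file has no room for another block.
class BlockAllocator {
 public:
  struct Location {
    int file_idx;     // which temporary file
    int64_t offset;   // byte offset of the page within that file
    int64_t seq_num;  // sequence number of the block within its spill id
  };

  BlockAllocator(int64_t block_size, int64_t file_size)
      : block_size_(block_size), file_size_(file_size) {}

  Location AllocatePage(int spill_id, int64_t page_len) {
    assert(page_len <= block_size_);  // assumption: a page never spans blocks
    auto [it, inserted] = open_blocks_.try_emplace(spill_id);
    OpenBlock& blk = it->second;
    if (inserted || blk.used + page_len > block_size_) {
      // This spill id needs a new block.
      blk.seq_num = inserted ? 0 : blk.seq_num + 1;
      if (file_used_ + block_size_ > file_size_) {  // current file is full
        ++cur_file_;
        file_used_ = 0;
      }
      blk.file_idx = cur_file_;
      blk.block_start = file_used_;
      blk.used = 0;
      file_used_ += block_size_;
    }
    Location loc{blk.file_idx, blk.block_start + blk.used, blk.seq_num};
    blk.used += page_len;
    return loc;
  }

 private:
  struct OpenBlock {
    int file_idx = 0;
    int64_t block_start = 0;
    int64_t used = 0;
    int64_t seq_num = 0;
  };
  int64_t block_size_;
  int64_t file_size_;
  int cur_file_ = 0;
  int64_t file_used_ = 0;
  std::map<int, OpenBlock> open_blocks_;  // currently open block per spill id
};
```

With toy sizes of 8-byte blocks and 16-byte files, if spill ids 0 and 1 each open a block in file 0, the third 4-byte page for spill id 0 overflows its block; file 0 has no room for a third block, so block sequence number 1 of spill id 0 starts at offset 0 of file 1. Keeping each spill id's pages contiguous inside a block is what lets a sequential reader fetch and prefetch whole blocks.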
