Amogh Margoor has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17592 )
Change subject: WIP: Reducing HashTable size by packing its buckets
efficiently.
......................................................................
WIP: Reducing HashTable size by packing its buckets efficiently.
The HashTable implementation consists of a contiguous
array of Buckets. Each Bucket either holds its
data directly or points to a linked list of
duplicate entries, whose nodes are of type
DuplicateNode. These are the structures of Bucket
and DuplicateNode:
  struct DuplicateNode {
    bool matched;
    DuplicateNode* next;
    HtData htdata;
  };
  struct Bucket {
    bool filled;
    bool matched;
    bool hasDuplicates;
    uint32_t hash;
    union {
      HtData htdata;
      DuplicateNode* duplicates;
    } bucketData;
  };
Bucket is currently 16 bytes and DuplicateNode is 24 bytes. If we
remove the booleans from both structs, Bucket shrinks to 12 bytes
and DuplicateNode to 16 bytes. One way to remove the booleans is to
fold them into the pointers that are already part of the structs.
Pointers store addresses, and on architectures like x86-64 and ARM64
the linear address is only 48 bits long. With 5-level paging, Intel
is planning to expand it to 57 bits, which still leaves the 7 most
significant bits (bits 58 to 64) free to store these booleans. This
patch reduces the size of Bucket and DuplicateNode by implementing
this folding.
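The size arithmetic and the folding described above can be sketched
as follows. This is only an illustration, not code from the patch:
HtData is assumed to be an 8-byte union of pointers, and
PackedBucket, PackedDuplicateNode, Pack, UnpackPtr and UnpackTag are
hypothetical names. Note that the 12-byte figure relies on a packed
layout; with natural alignment the compiler would pad a 4-byte hash
plus an 8-byte word back to 16 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: stand-in for Impala's HtData, taken here to be 8 bytes.
union HtData {
  void* tuple;
  void* flat_row;
};

// Current layouts, as listed in the commit message.
struct DuplicateNode {
  bool matched;
  DuplicateNode* next;
  HtData htdata;
};

struct Bucket {
  bool filled;
  bool matched;
  bool hasDuplicates;
  uint32_t hash;
  union {
    HtData htdata;
    DuplicateNode* duplicates;
  } bucketData;
};

// Hypothetical layouts with the booleans folded into the pointer word.
struct PackedDuplicateNode {
  uint64_t next_and_flags;  // low 57 bits: next pointer, high 7 bits: flags
  HtData htdata;
};

#pragma pack(push, 1)  // needed to avoid padding back up to 16 bytes
struct PackedBucket {
  uint32_t hash;
  uint64_t data_and_flags;  // low 57 bits: pointer, high 7 bits: flags
};
#pragma pack(pop)

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");
static_assert(sizeof(Bucket) == 16, "3 bools + pad + hash + 8-byte union");
static_assert(sizeof(DuplicateNode) == 24, "bool + pad + two 8-byte fields");
static_assert(sizeof(PackedBucket) == 12, "hash + tagged word, packed");
static_assert(sizeof(PackedDuplicateNode) == 16, "tagged word + HtData");

constexpr int kTagShift = 57;  // address bits occupy positions 0..56
constexpr uint64_t kAddrMask = (uint64_t{1} << kTagShift) - 1;

// Stores a 7-bit tag in the top bits of a pointer-sized word.
inline uint64_t Pack(const void* ptr, uint8_t tag) {
  uint64_t addr = reinterpret_cast<uintptr_t>(ptr);
  assert((addr & ~kAddrMask) == 0);  // assumes the upper 7 bits are unused
  assert(tag < 128);                 // only 7 tag bits are available
  return (static_cast<uint64_t>(tag) << kTagShift) | addr;
}

// Recovers the original pointer by masking off the tag bits.
inline void* UnpackPtr(uint64_t word) {
  return reinterpret_cast<void*>(word & kAddrMask);
}

// Recovers the 7-bit tag.
inline uint8_t UnpackTag(uint64_t word) {
  return static_cast<uint8_t>(word >> kTagShift);
}
```

For example, Pack(&node, 0x1) could record "matched" in the lowest tag
bit, and UnpackPtr must be applied before the word is dereferenced.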
New Classes:
------------
As part of this patch, TaggedPointer is introduced: a template
class that stores a pointer and a tag together in a 64-bit integer.
This class owns the pointer and takes care of allocating and
deallocating the object being pointed to. However, derived classes
can opt out of owning the object and let the client manage it. Its
derived classes for Bucket and DuplicateNode, TaggedBucketData and
TaggedDuplicateNode, do exactly that.
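A minimal sketch of how such a class could look is given below. It is
not the interface from be/src/util/tagged-ptr.h: the class name
TaggedPointerSketch, the kOwns template parameter, the method names
and the Node type are all assumptions made for illustration. It also
only covers deallocation, not allocation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged pointer: 57 address bits plus 7 tag bits in one word.
// kOwns controls whether the destructor deletes the pointed-to object.
template <class T, bool kOwns = true>
class TaggedPointerSketch {
 public:
  TaggedPointerSketch() = default;
  explicit TaggedPointerSketch(T* ptr) { SetPtr(ptr); }
  ~TaggedPointerSketch() {
    if (kOwns) delete GetPtr();
  }
  // Non-copyable so an owned object is never deleted twice.
  TaggedPointerSketch(const TaggedPointerSketch&) = delete;
  TaggedPointerSketch& operator=(const TaggedPointerSketch&) = delete;

  // Returns the pointer with the tag bits masked off.
  T* GetPtr() const { return reinterpret_cast<T*>(data_ & kPtrMask); }

  // Replaces the pointer and keeps the tag bits unchanged.
  void SetPtr(T* ptr) {
    uint64_t addr = reinterpret_cast<uintptr_t>(ptr);
    assert((addr & ~kPtrMask) == 0);  // assumes the upper 7 bits are unused
    if (kOwns && GetPtr() != ptr) delete GetPtr();
    data_ = (data_ & ~kPtrMask) | addr;
  }

  // Reads tag bit i, where i is in [0, 7).
  bool GetTagBit(int i) const {
    assert(i >= 0 && i < kTagBits);
    return ((data_ >> (kTagShift + i)) & 1) != 0;
  }

  // Writes tag bit i and leaves the pointer and other tag bits unchanged.
  void SetTagBit(int i, bool value) {
    assert(i >= 0 && i < kTagBits);
    uint64_t bit = uint64_t{1} << (kTagShift + i);
    data_ = value ? (data_ | bit) : (data_ & ~bit);
  }

 private:
  static constexpr int kTagShift = 57;
  static constexpr int kTagBits = 7;
  static constexpr uint64_t kPtrMask = (uint64_t{1} << kTagShift) - 1;
  uint64_t data_ = 0;
};

// Hypothetical payload type and a derived class that opts out of ownership,
// so the client manages the lifetime of the pointed-to Node.
struct Node {
  int value;
};

class NonOwningNodePtr : public TaggedPointerSketch<Node, false> {
 public:
  bool IsMatched() const { return GetTagBit(0); }
  void SetMatched(bool matched) { SetTagBit(0, matched); }
};
```

Expressing the opt-out as a compile-time parameter keeps the object at
8 bytes, since no runtime ownership flag has to be stored.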
Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
---
M be/src/exec/grouping-aggregator.h
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/runtime/buffered-tuple-stream.h
M be/src/util/CMakeLists.txt
A be/src/util/tagged-ptr-test.cc
A be/src/util/tagged-ptr.h
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
9 files changed, 452 insertions(+), 123 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/17592/1
--
To view, visit http://gerrit.cloudera.org:8080/17592
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Gerrit-Change-Number: 17592
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor <[email protected]>