Adar Dembo has posted comments on this change.
Change subject: KUDU-1363: Add in-list predicates for extracting a set of
Patch Set 7:
> The main reason for not using unordered_set was because the
> collection would still be storing pointers instead of actual values
> and I did not think I could get an O(1) lookup when i am searching
> for a pointer-to-a-value in a set-of-pointers. PredicateData doesnt
> know about the the type of the value and defers the comparison to
> TypeInfo. As a result, PredicateData only holds void* or a
> collection(vector) of void*.
> The other (not so critical) reason I went with a vector vs an
> unordered_set was because I do care about the order of the list
> (which is sorted/deduped) when I push Upper/Lower bound key
> predicates in key_util. I suppose, I could sort and pick the
> min/max values from the collection in key_util itself instead of
> using a vector. However, using a vector just made it convenient.
Your second point is a valid one; std::unordered_set would be inappropriate if
you need to find lower/upper bounds often (I think this happens in the
predicate pushdown code that I didn't review). But what about std::set? It's
implemented by a red-black tree and so it'd still be more efficient than
As to your first point, whichever container you use, you can pass a custom hash
and equal_to (for unordered_set) or comparator (for set) that workers properly
for KuduValue pointers. The implementation would dereference the pointer,
dereference the private Data pointer inside it, and, using the definition of
KuduValue::Data in value-internal.h, operate on the actual data.
To view, visit http://gerrit.cloudera.org:8080/2986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Sameer Abhyankar <abhyan...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sameer Abhyankar <abhyan...@gmail.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>