Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11722 )

Change subject: KUDU-2038: Support bitmap indexing
......................................................................


Patch Set 1:

(6 comments)

Didn't look through the code much yet, just left some comments on the commit 
message. For a feature of this size it might be nice to have a design document 
written out explaining the design and various trade-offs, along with some 
initial benchmarks showing the performance, memory usage on the write path, and 
disk space used by the indexes.

http://gerrit.cloudera.org:8080/#/c/11722/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11722/1//COMMIT_MSG@10
PS1, Line 10: has one bitmap info. One bitmap info is composed of two CFiles, 
one
            : for keys and the other for bitmap data. The bitmap info is 
generated
            : d
Are the bitmap info files generated every time the main column cfile is 
written? If so, why not just append it to that same cfile? We have some 
scalability limits on number of blocks, so consolidaitng these together into a 
single cfile (block) could potentially help alleviate increasing pressure.

(if that ends up being too difficult because they need separate footers, etc, 
that's OK)


http://gerrit.cloudera.org:8080/#/c/11722/1//COMMIT_MSG@14
PS1, Line 14: The index could be created when creating or altering table, and
            : it could be dropped later.
if it's created by altering the table, does that eagerly index all of the 
columns? If so, how?

Also, what are the memory effects of this feature on the write path? It seems 
like you'd need to buffer the bitmaps for each value in memory as you go, and 
write them all out at the end of the DRS flush. For high cardinality high 
entropy data this can be pretty substantial. Is there an upper bound on memory 
usage? Is there a cardinality at which the indexes become ineffective?


http://gerrit.cloudera.org:8080/#/c/11722/1//COMMIT_MSG@18
PS1, Line 18: would
nit: the commit message uses the word "would" which gives the sense that this 
is a possibility, not what is implemented. Are all of these features 
implemented in the current patch?


http://gerrit.cloudera.org:8080/#/c/11722/1//COMMIT_MSG@18
PS1, Line 18: invalidate
does the invalidation need to invalidate the index for the whole DRS? Could we 
not just invalidate it on a more fine-grained basis? (eg per batch?)


http://gerrit.cloudera.org:8080/#/c/11722/1//COMMIT_MSG@23
PS1, Line 23: The bitmap index supports Equality, InList, Range predicates.
do you have any benchmark info to share on this implementation, both on the 
write side and the read side? Would be great to see how this improves some 
benchmark like TPCDS or TPCH


http://gerrit.cloudera.org:8080/#/c/11722/1/build-support/iwyu.py
File build-support/iwyu.py:

http://gerrit.cloudera.org:8080/#/c/11722/1/build-support/iwyu.py@79
PS1, Line 79:   "src/kudu/tablet/cfile_set.cc",
why did you have to exclude this? can we fix the IWYU issue instead?



--
To view, visit http://gerrit.cloudera.org:8080/11722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0edaa0ef1dba2dbce85ebf15f0a731e4939a7860
Gerrit-Change-Number: 11722
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:19:54 +0000
Gerrit-HasComments: Yes

Reply via email to