Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )
Change subject: IMPALA-5789: Add always_false flag in bloom filter ...................................................................... Patch Set 1: (2 comments) Will wait for the new PS but had a couple of high-level bits of feedback first. It may be worth looking at Thomas' patches if you haven't already to make sure your changes are compatible. http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/exec/hdfs-scanner.cc@104 PS1, Line 104: if (FilterContext::HasAlwaysFalseInList(FilterStats::SPLITS_KEY, Not sure if this is relevant but there is a known problem with filtering out splits with some scanners: IMPALA-3804 http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/util/bloom-filter.h@106 PS1, Line 106: if (always_false_) return 0; I think this change will mess up some accounting in RuntimeFilterBank. My preference is to continue to allocate the memory when the BloomFilter is constructed instead of allocating it lazily. It would waste memory in the always_false case but I think we can live with that because it doesn't increase worst-case memory consumption. -- To view, visit http://gerrit.cloudera.org:8080/8170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d Gerrit-Change-Number: 8170 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 03 Oct 2017 23:32:29 +0000 Gerrit-HasComments: Yes