Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12113 )

Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu 
tables.
......................................................................


Patch Set 11:

(9 comments)

This is looking pretty good, I am at a +1 once these comments are addressed. 
I'll let Thomas give the +2 since he built out this code originally.

http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/runtime/coordinator.cc@972
PS11, Line 972: TExprNode
I don't think this cast is necessary, is it?


http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/util/min-max-filter.h@91
PS11, Line 91: , int precision
Maybe just pass in a const ColumnType&? That would be more generic. Feel free 
to ignore this one.


http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/util/min-max-filter.h@94
PS11, Line 94:   static void Or(const TMinMaxFilter& in, TMinMaxFilter* out) {
This overload only exists as a convenience for test functions right? It's 
confusing because product code doesn't call it. I think it would be better to 
remove this overload and update the tests to pass in a dummy argument.


http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/util/min-max-filter.h@247
PS11, Line 247: #define GET_MINMAX(MIN_OR_MAX)                              \
Let's stop this macro leaking out of the header. We use push_macro/pop_macro 
like below in some other places in the code.

  #pragma push_macro("GET_MINMAX")
  #define GET_MINMAX ....
  ....
  #pragma pop_macro("GET_MINMAX")


http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/12113/11/be/src/util/min-max-filter.cc@492
PS11, Line 492:   if (in.always_false)
nit: our style is to use braces around the if body (except if the whole if 
statement fits on a single line).


http://gerrit.cloudera.org:8080/#/c/12113/11/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/12113/11/testdata/data/README@293
PS11, Line 293: with decimal types on Kudu.
Can you mention how the data was generated, in case someone wants to reproduce 
something similar?


http://gerrit.cloudera.org:8080/#/c/12113/11/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/12113/11/testdata/datasets/functional/functional_schema_template.sql@1854
PS11, Line 1854: -- Small table with decimal values.  This is used for
I guess the handling of comments in these files is a bit broken - it's 
including the comment as part of the table name for some purposes, e.g. the 
below command wasn't working to load the table data:

./bin/load-data.py -w functional-query --table_names=decimal_rtf_tiny_tbl 
--table_formats=text/none,kudu/none -f

I could work around it by moving the comment to the DATASET section, although I 
don't think that fixes the underlying problem that comments aren't stripped. So 
maybe move the comment to be consistent with the rest of the file, but we can 
probably skip messing with the file parsing code as part of this patch.


http://gerrit.cloudera.org:8080/#/c/12113/11/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test:

PS11:
As a hack to validate the results, I modified the test code to run the same 
tests against text, ignoring the RUNTIME_PROFILE sections:

diff --git a/tests/common/impala_test_suite.py 
b/tests/common/impala_test_suite.py
index a4db757..765365b 100644
--- a/tests/common/impala_test_suite.py
+++ b/tests/common/impala_test_suite.py
@@ -512,11 +512,12 @@ class ImpalaTestSuite(BaseTestSuite):
         rt_profile_info = 'RUNTIME_PROFILE'

       if rt_profile_info is not None:
-        rt_profile = verify_runtime_profile(test_section[rt_profile_info],
-                               result.runtime_profile,
-                               
update_section=pytest.config.option.update_results)
-        if pytest.config.option.update_results:
-          test_section[rt_profile_info] = "".join(rt_profile)
+        #rt_profile = verify_runtime_profile(test_section[rt_profile_info],
+        #                       result.runtime_profile,
+        #                       
update_section=pytest.config.option.update_results)
+        #if pytest.config.option.update_results:
+        #  test_section[rt_profile_info] = "".join(rt_profile)
+        pass

       if 'DML_RESULTS' in test_section:
         assert 'ERRORS' not in test_section
diff --git a/tests/query_test/test_runtime_filters.py 
b/tests/query_test/test_runtime_filters.py
index f769224..f331da4 100644
--- a/tests/query_test/test_runtime_filters.py
+++ b/tests/query_test/test_runtime_filters.py
@@ -107,7 +107,7 @@ class TestMinMaxFilters(ImpalaTestSuite):
     super(TestMinMaxFilters, cls).add_test_dimensions()
     # Min-max filters are only implemented for Kudu.
     cls.ImpalaTestMatrix.add_constraint(
-        lambda v: v.get_value('table_format').file_format in ['kudu'])
+        lambda v: v.get_value('table_format').file_format in ['text'])

   def test_min_max_filters(self, vector):
     self.run_test_case('QueryTest/min_max_filters', vector)


http://gerrit.cloudera.org:8080/#/c/12113/11/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@2789
PS11, Line 2789: # Test case 9.1: DecimalMinMaxFilters with left semi join - 4 
bytes
I think it would be good to break out this sections into separate .test files 
called from different python functions - that would let them run in parallel 
and make it a bit easier to navigate the tests.



--
To view, visit http://gerrit.cloudera.org:8080/12113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Gerrit-Change-Number: 12113
Gerrit-PatchSet: 11
Gerrit-Owner: Janaki Lahorani <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 26 Dec 2018 16:02:29 +0000
Gerrit-HasComments: Yes

Reply via email to