[Impala-ASF-CR] IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables

2017-11-09 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8506 )

Change subject: IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables
..


Patch Set 1:

GVO failure was unrelated.


-- 
To view, visit http://gerrit.cloudera.org:8080/8506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20
Gerrit-Change-Number: 8506
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 09 Nov 2017 20:40:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-09 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 14:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc@200
PS13, Line 200: if (!filter->GetCastIntMinMax(col_type, _min, 
_max)) {
> Unfortunate that that every Kudu client has to do this.
This is something that's been discussed before (see KUDU-931), I'm not sure why 
they decided not to do it.


http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@139
PS13, Line 139: case TYPE_BIGINT:
> using std::numeric_limits; at the top if this file
Done


http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@165
PS13, Line 165:
> Brief comment especially about the return value would be good.
Done


http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@604
PS13, Line 604: // must be a SlotRef pointing to a column. We can allow 
implicit integer casts
> typo: We can allow implicit integer casts
Done


http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@605
PS13, Line 605: // by casting the min/max values before sending them to 
Kudu.
> min/max values
Done


http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@98
PS13, Line 98: where a.tinyint_col = b.int_col and b.int_col in (0, 1)
> Let's make the min/max filter selective, e.g. by adding where b.int_col in
Done


http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@103
PS13, Line 103: # The min/max values in the filter are both above the range of 
the target col so all rows
> Let's also add a non-selective case where the min/max values fall outside t
Done



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 14
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 09 Nov 2017 18:53:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-09 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#14).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8506


Change subject: IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables
..

IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables

IMPALA-5546 added the ability to create unpartitioned Kudu tables, but
when SHOW CREATE TABLE is run on it still prints 'PARTITION BY' just
without a partition clause. This patch removes the 'PARTITION BY' from
the output.

Testing:
- Added test that runs SHOW CREATE on an unpartitioned Kudu table.

Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M tests/query_test/test_kudu.py
2 files changed, 13 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/8506/1
--
To view, visit http://gerrit.cloudera.org:8080/8506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20
Gerrit-Change-Number: 8506
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#13).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8464 )

Change subject: IMPALA-4591: Bound Kudu client error mem usage
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105
PS1, Line 105: consumed
> nit: "consumed" to be consistent with the memtracker terminology.
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106
PS1, Line 106: client_tracked
> Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and i
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52
PS1, Line 52:
> Was this flag documented? Just wondering if we should consider what happens
No, its not documented, and of course it specifically says that it may be 
changed/removed.

Certainly happy to consider other options if you think removing the flag is too 
disruptive, eg. the error buffer size could be calculated as the difference 
between "kudu_sink_mem_required" and "kudu_mutation_buffer_size", that just 
seemed a little complicated.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39
PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, 
DEFAULT_KUDU_MUTATION_BUFFER_SIZE,
> Just throwing out ideas here, but did we think about pros/cons of making th
I think that's a reasonable idea, though like you say these probably don't need 
to be modified very often.

At the least, I think it makes sense to get this in and file a JIRA for 
followup.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124
PS1, Line 124:   int64_t required_mem = FLAGS_kudu_mutation_buffer_size + 
error_buffer_size;
> Is this equivalent to the following?
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132
PS1, Line 132:   
state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(), 
_));
> nit: long line.
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66
PS1, Line 66:   
@CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024")
> It might be faster to make this a regular query test but insert more data s
It takes a very large number of errors to hit the default limit (>10m "Key 
already present" errors), so I don't think it ends up being any faster.


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74
PS1, Line 74: presen
> present
Done



--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 08 Nov 2017 19:40:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8464

to look at the new patch set (#2).

Change subject: IMPALA-4591: Bound Kudu client error mem usage
..

IMPALA-4591: Bound Kudu client error mem usage

Previously, Kudu client errors could grow in size unbounded,
potentially causing the process to be killed. This patch sets a
bound on the mem that can be used for these error messages, with
the size determined by the flag 'kudu_error_buffer_size'.

If the errors for a Kudu client exceed this size, the query will fail,
as some errors will be dropped and we won't be able to tell if all of
the errors can be safely ignored.

Testing:
- Added a custom cluster test that verifies that a query that exceeds
  the limit fails.

Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M tests/custom_cluster/test_kudu.py
3 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/2
--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6171: Revert "IMPALA-1575: part 2: yield admission control resources"

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8499 )

Change subject: IMPALA-6171: Revert "IMPALA-1575: part 2: yield admission 
control resources"
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8499
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3eec4b5a6ff350933ffda0bb80949c5960ecdf25
Gerrit-Change-Number: 8499
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 08 Nov 2017 19:04:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 1520b39

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8492


Change subject: Bump Kudu version to 1520b39
..

Bump Kudu version to 1520b39

Change-Id: Ib3d5d88bd4d3347447a64fac75ca3e0427b11ec6
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/8492/1
--
To view, visit http://gerrit.cloudera.org:8080/8492
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib3d5d88bd4d3347447a64fac75ca3e0427b11ec6
Gerrit-Change-Number: 8492
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[native-toolchain-CR] Bump Kudu version to 1520b39

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8491 )

Change subject: Bump Kudu version to 1520b39
..

Bump Kudu version to 1520b39

Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Thomas Tauber-Marshall: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08
Gerrit-Change-Number: 8491
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[native-toolchain-CR] Bump Kudu version to 1520b39

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8491 )

Change subject: Bump Kudu version to 1520b39
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08
Gerrit-Change-Number: 8491
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 07 Nov 2017 18:38:30 +
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to 1520b39

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8491 )

Change subject: Bump Kudu version to 1520b39
..


Patch Set 1:

http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/482/


--
To view, visit http://gerrit.cloudera.org:8080/8491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08
Gerrit-Change-Number: 8491
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 07 Nov 2017 17:32:26 +
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to 1520b39

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8491


Change subject: Bump Kudu version to 1520b39
..

Bump Kudu version to 1520b39

Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/91/8491/1
--
To view, visit http://gerrit.cloudera.org:8080/8491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08
Gerrit-Change-Number: 8491
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8490 )

Change subject: IMPALA-2248: Make idle_session_timeout a query option
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc@204
PS3, Line 204: if (iequals(key, "idle_session_timeout")) {
Its unfortunate this is special cased here. Could you add a comment explaining 
why that's necessary.


http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc@213
PS3, Line 213: key, value,
 :   _->set_query_options,
single line


http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc@189
PS3, Line 189: DEFINE_int32(idle_session_timeout, 0, "The time, in seconds, 
that a session may be idle"
Note how this interacts with the query option.


http://gerrit.cloudera.org:8080/#/c/8490/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/8490/3/common/thrift/ImpalaService.thrift@292
PS3, Line 292:   // The time, in seconds, that a session may be idle for before 
it is closed (and all
Note how this interacts with the flag.



--
To view, visit http://gerrit.cloudera.org:8080/8490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e
Gerrit-Change-Number: 8490
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 07 Nov 2017 17:03:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27:
> Thanks for explaining.
https://docs.google.com/document/d/1G-SPZelateebNxzVb67urEVtjc5Itw-B_jjfS85bSCE/edit?usp=sharing


http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/exec/filter-context.h@118
PS11, Line 118: Materialize filter values.
> what does that mean?
Done


http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/runtime/runtime-filter-bank.h@78
PS11, Line 78:   /// may both be NULL, representing a filter that allows all 
rows to pass.
> is it the case that at most one of bloom_filter and min_max_filter should b
Done


http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/util/min-max-filter.h@62
PS11, Line 62: Materialize filter values
> what does that mean?
Done


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift@103
PS9, Line 103:   12: optional string kudu_col_name
> case sensitive?
Done


http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@359
PS11, Line 359: public Operator getJoinOp() { return exprCmpOp_; }
> getExprCmpOp()
Done


http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@602
PS11, Line 602: if (!(targetExpr instanceof SlotRef)
> I think only explicit casts are problematic. Implicit casts should be ok, o
Right, we should be able to support integer implicit casts.

The complication is that if, say, the calculated max is outside of the range 
for the type of the targeted column, we can't just pass that value into Kudu as 
it will complain.

In that case, we would need to convert the max we send to be the max for the 
type of the targeted column. I've added some code to the BE to do this 
conversion.


http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:

http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@144
PS11, Line 144: on a.month = cast(b.month + 1 as int);
> Was this your change? Why the change?
This was necessary if we didn't support implicit casts on the target. Removed 
now.



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 07 Nov 2017 15:06:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-07 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#12).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8464


Change subject: IMPALA-4591: Bound Kudu client error mem usage
..

IMPALA-4591: Bound Kudu client error mem usage

Previously, Kudu client errors could grow in size unbounded,
potentially causing the process to be killed. This patch sets a
bound on the mem that can be used for these error messages, with
the size determined by the flag 'kudu_error_buffer_size'.

If the errors for a Kudu client exceed this size, the query will fail,
as some errors will be dropped and we won't be able to tell if all of
the errors can be safely ignored.

Testing:
- Added a custom cluster test that verifies that a query that exceeds
  the limit fails.

Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M tests/custom_cluster/test_kudu.py
3 files changed, 39 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/1
--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 03 Nov 2017 19:07:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 11:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13
PS9, Line 13: In RuntimeFilterGenerator in the planner, each hash join node
> ... each hash join node generates a bloom and min-max filter for each equi-
Done


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27:
> Not specific to the code changes, and I don't expect a response here (proba
RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the 
time a scan node will wait for any filter of either type to arrive. Similarly, 
RUNTIME_FILTER_MODE works the same for min-max as its worked for bloom.

The size related params - RUNTIME_BLOOM_FILTER_SIZE, 
RUNTIME_FILTER_MIN/MAX_SIZE only apply to bloom filters as the mem used by 
min-max is small and bounded.

I've updated some comments to make the above clearer.

One tricky case is DISABLE_ROW_RUNTIME_FILTERING. Kudu applies the filters on 
both a per-partition and per-row basis. At the moment, there's no way to change 
this - the filters are applied through the same interface as eg. predicates 
that are pushed down, so Kudu assumes that they have to be applied for 
correctness, not just for performance.

So, I could see the argument either way - we could disable Kudu filters if 
DISABLE_ROW_RUNTIME_FILTERING is true, though it would also disable the 
partition filtering, or we could just not consider 
DISABLE_ROW_RUNTIME_FILTERING for Kudu filters.

The remaining filter related option is MAX_NUM_RUNTIME_FILTERS, which I've 
addressed in another comment.


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41
PS9, Line 41:
> Contrived extreme queries are good data points, but how about running the T
Those results are posted in the review comments. I can include them here as 
well, I just felt it made the commit message excessively large.


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44
PS9, Line 44:   timings are averages of 3 runs.
> does not eliminate
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h@164
PS8, Line 164: tvalue.ti
> I'm confused by this variable name.
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@27
PS8, Line 27:
> Can you maybe briefly describe what is tested for each filter type? There s
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@252
PS8, Line 252:   MinMaxFilter::Create(tFilter, string_type, _pool, 
_pool);
> Would TestEnv work? That is the semi-standard way to create an ExecEnv for
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@353
PS8, Line 353:   t3.ToTColumnValue();
> Remove?
Done


http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc@246
PS8, Line 246: null
> nullptr?
Done


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202
PS9, Line 202:   // Maximum number of bloom runtime filters allowed per query
> I think I understand why you did this, but it seems confusing from a user's
There's basically two reasons for this:
- We don't want to regress queries by eliminating bloom filters that used to be 
generated.
- The purpose of this param in the first place was to prevent the coordinator 
from being overwhelmed. Min-max filters are less heavy-weight than bloom 
filters so they don't put as much stress on the coordinator anyways.

Given that, I think that it would be fine to maintain this behavior even once 
HDFS can do min-max and Kudu can do bloom.

Another concern though is what to do if we add in-list filters, which are 
probably similarly heavy-weight to bloom filters.

One option would be to deprecate this and have separate max_num_bloom and 
max_num_in_list params, but that may be confusing and requires user action to 
keep things working as expected anyways.

Another option at that point would be to keep this param and make it "Maximum 
number of expensive filters (bloom or in-list)", and then there's just once we 
have to worry about regressing queries.

Or maybe we should just apply this to min-max filters as well, since its the 
least confusing option in the long run, possibly bumping up the default a 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#11).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate, but
only those filters that end up being assigned to a target make it into
the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#10).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each partitioned hash join
generates a bloom and min-max filter, but only those filters that end
up being assigned to a target make it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-01 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#9).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each partitioned hash join
generates a bloom and min-max filter, but only those filters that end
up being assigned to a target make it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-6127: Fix timeout in TestRuntimeFilter.test wait time

2017-10-31 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8427


Change subject: IMPALA-6127: Fix timeout in TestRuntimeFilter.test_wait_time
..

IMPALA-6127: Fix timeout in TestRuntimeFilter.test_wait_time

test_wait_time has been flaky recently on ASAN due to hitting a
timeout. The fix is to increase the timeout for ASAN builds.

Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de
---
M tests/query_test/test_runtime_filters.py
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/8427/1
--
To view, visit http://gerrit.cloudera.org:8080/8427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de
Gerrit-Change-Number: 8427
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 8:

TPCDS results:

 
+-++++-++++-+---+
 | Workload| Query  | File Format| 
Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | 
Iters |
 
+-++++-++++-+---+
 | TPCDS(_300) | TPCDS-Q73  | kudu / none / none | 
12.01  | 7.99| R +50.42%  |   0.31%|   1.04%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q68  | kudu / none / none | 
17.02  | 13.00   | R +30.97%  |   0.82%|   0.26%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q79  | kudu / none / none | 
16.61  | 12.95   | R +28.31%  |   0.56%|   0.54%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q1   | kudu / none / none | 2.36 
  | 1.99|   +18.19%  |   2.86%|   1.38%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q53  | kudu / none / none | 2.18 
  | 2.08|   +4.96%   |   0.32%|   1.98%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q8   | kudu / none / none | 4.03 
  | 3.93|   +2.38%   |   0.43%|   0.38%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q98  | kudu / none / none | 7.80 
  | 7.67|   +1.72%   |   0.66%|   0.81%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q7   | kudu / none / none | 3.75 
  | 3.71|   +1.15%   |   1.14%|   1.17%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q4   | kudu / none / none | 
29.59  | 29.27   |   +1.11%   |   4.13%|   3.67%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q65  | kudu / none / none | 
13.97  | 14.01   |   -0.28%   |   2.70%|   1.43%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q28  | kudu / none / none | 4.06 
  | 4.08|   -0.35%   |   0.84%|   0.48%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q89  | kudu / none / none | 2.46 
  | 2.48|   -0.62%   |   2.60%|   1.81%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q55  | kudu / none / none | 2.44 
  | 2.46|   -0.83%   |   1.08%|   5.41%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q42  | kudu / none / none | 2.42 
  | 2.47|   -1.89%   |   0.26%|   4.50%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q23  | kudu / none / none | 
90.96  | 95.30   |   -4.55%   |   1.06%|   2.21%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q43  | kudu / none / none | 5.05 
  | 5.34|   -5.49%   |   0.89%|   4.17%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q3   | kudu / none / none | 4.11 
  | 4.36|   -5.84%   |   2.48%|   1.15%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q61  | kudu / none / none | 8.56 
  | 9.93|   -13.83%  |   0.78%|   1.84%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q2   | kudu / none / none | 
12.19  | 15.68   | I -22.28%  | * 27.57% * |   1.11%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q47  | kudu / none / none | 
16.82  | 21.82   | I -22.91%  |   1.16%|   1.25%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q19  | kudu / none / none | 4.78 
  | 6.32| I -24.29%  |   1.14%|   0.97%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q46  | kudu / none / none | 6.58 
  | 8.77| I -24.88%  |   0.86%|   1.01%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q88  | kudu / none / none | 
14.89  | 19.84   | I -24.95%  |   0.17%|   3.79%| 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q59  | kudu / none / none | 
24.05  | 35.61   | I -32.45%  | * 11.97% * | * 20.61% * | 1   | 
4 |
 | TPCDS(_300) | TPCDS-Q34  | kudu / none / none | 3.94 
  | 6.82| I -42.24%  |   2.37%|   4.12%| 1   | 4
 |
 | TPCDS(_300) | TPCDS-Q27  | kudu / none / none | 3.97 
  | 7.91| I -49.85%  |   

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 8:

> > Patch Set 7:
 > >
 > > > > Patch Set 7:
 > >  > >
 > >  > > Perf results:
 > >  > > ...
 > >  >
 > >  > I'm surprised that only a few queries saw significant
 > speedups. Is
 > >  > this in line with what you saw with Parquet runtime filters on
 > >  > TPC-H? Or are we losing a lot by using min/max instead of
 > bloom or
 > >  > in-list style filters?
 > >
 > > Not sure about bloom filters perf, though I can run those numbers
 > for comparison.
 >
 > I haven't looked at this patch, but had a question about the
 > design:
 >
 > Are we still pushing blooms across a join to prevent shuffling of
 > data? Or are we now pushing _only_ min/max?
 >
 > It seems there is value in pushing both: the bloom for evaluation
 > on the other side of the join to prevent shuffling, and the min/max
 > to push all the way to the scanner to reduce I/O.
 >
 > Not sure if the patch is already doing this.

Impala only evaluates runtime filters in the scan. Even prior to this patch, 
the Kudu scanner was not evaluating bloom filters (and hash joins with Kudu 
scan targets don't build bloom filters).

It certainly could be useful to evaluate bloom filters on the Impala side of a 
Kudu scan, but I believe our thinking was that it wasn't worth it to implement 
that - better to just wait until bloom filters can be pushed all the way down 
into Kudu. If bloom filters in Kudu are a long way off, though, we should maybe 
reevaluate that.


--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:54:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 8:

(10 comments)

> I noticed that - BloomFilterBytes is always 0 in the query
 > profiles.
 > Can you please veirfy?

Yes. This line is always printed to the profile, even if there are no bloom 
filters being built. We don't expose the mem used by min-max filters in the 
profile as its negligible.

http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG@26
PS7, Line 26:
> Might be worth mentioning why the runtime filters were renumbered in all th
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h@106
PS7, Line 106:   Status CloneFrom(const FilterContext& from, ObjectPool* pool, 
RuntimeState* state,
> Long lines
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@140
PS7, Line 140:   virtual void* GetMax() override { return _; }
> Nit: not sure why this is using underscores while AlwaysTrue() is using cam
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@176
PS7, Line 176:   StringValue min_;
> Can you briefly mention what it means when these are empty - that the value
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291
PS6, Line 291: out->__isset.min = true;
> Oh ok, thanks for clarifying. It looked superficially like a last line copy
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@137
PS7, Line 137: DCHECK(thrift.__isset.max);
> Do we have end-to-end tests for these code paths? I think we could generall
I've now added e2e and unit tests for all of the truncation scenarios.


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143
PS7, Line 143: CopyToBuffer(_buffer_, _, max_.len);
> This has a fairly large number of edge cases - it would be good to unit tes
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143
PS7, Line 143: uff
> static_cast instead of int()?
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@150
PS7, Line 150:
> I would have expected that we would modify the trailing values that overflo
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@207
PS7, Line 207: out->min.__set_string_val(in.min.string_val);
> It seems tricky to test these various error paths in end-to-end tests. Coul
As you say, its difficult to test the oom scenario in an e2e test, since the 
max amount of mem being used here is so small, but its covered with a unit test 
now.



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:52:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#8).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each partitioned hash join
generates a bloom and min-max filter, but only those filters that end
up being assigned to a target make it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-6099: Fix crash in CheckForAlwaysFalse()

2017-10-24 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8369 )

Change subject: IMPALA-6099: Fix crash in CheckForAlwaysFalse()
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8369/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8369/1//COMMIT_MSG@14
PS1, Line 14:
brief note about testing, eg. "ran existing test ... which previously repro-ed 
this"



--
To view, visit http://gerrit.cloudera.org:8080/8369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3eda43845b78516c1e76e07e0d2dd9d862168e1d
Gerrit-Change-Number: 8369
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 24 Oct 2017 17:22:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN

2017-10-23 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8358 )

Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test:

http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@350
PS1, Line 350: SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS;
> Do you think it makes sense to have different timeouts for each different t
Its not really necessary. There's one case (14) that seems to legitimately need 
more time than the others, but I don't see why its a problem to just have all 
the tests run with enough time for the slowest.

(14 is currently set to 100s, but testing locally its only about 30% slower 
than the tests that are set to 30s, so I sort of split the difference with the 
new version of the patch).



--
To view, visit http://gerrit.cloudera.org:8080/8358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Gerrit-Change-Number: 8358
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 23 Oct 2017 20:48:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN

2017-10-23 Thread Thomas Tauber-Marshall (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8358

to look at the new patch set (#2).

Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN
..

IMPALA-6004: Fix test_row_filters failure on ASAN

'Test case 16' in test_row_filters has been failing occasionaly on
ASAN as the runtime filters are not generated within the specified
RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase
RUNTIME_FILTER_WAIT_TIME_MS.

This patch updates all of the tests in test_row_filters to use the
same timeout, which is set to a higher value for ASAN builds.

Change-Id: Ia098735594b36a72f02bf7edd051171689618051
---
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
M tests/query_test/test_runtime_filters.py
2 files changed, 28 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8358/2
--
To view, visit http://gerrit.cloudera.org:8080/8358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Gerrit-Change-Number: 8358
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN

2017-10-23 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8358


Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN
..

IMPALA-6004: Fix test_row_filters failure on ASAN

"Test case 16" in test_row_filters has been failing occasionaly on
ASAN as the runtime filters are not generated within the specified
RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase
RUNTIME_FILTER_WAIT_TIME_MS.

Change-Id: Ia098735594b36a72f02bf7edd051171689618051
---
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8358/1
--
To view, visit http://gerrit.cloudera.org:8080/8358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Gerrit-Change-Number: 8358
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 7:

Perf results:

 
++--+++-++---++-+---+
 | Workload   | Query| File Format| Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
 
++--+++-++---++-+---+
 | TPCH(_100) | TPCH-Q20 | kudu / none / none | 7.04   | 6.10|   
+15.45%  |   3.51%   |   3.28%| 1   | 4 |
 | TPCH(_100) | TPCH-Q8  | kudu / none / none | 8.09   | 7.38|   +9.67% 
  |   1.11%   |   2.57%| 1   | 4 |
 | TPCH(_100) | TPCH-Q17 | kudu / none / none | 19.13  | 17.50   |   +9.33% 
  |   3.91%   |   4.34%| 1   | 4 |
 | TPCH(_100) | TPCH-Q22 | kudu / none / none | 2.71   | 2.56|   +6.00% 
  |   0.90%   |   4.11%| 1   | 4 |
 | TPCH(_100) | TPCH-Q16 | kudu / none / none | 4.23   | 4.01|   +5.45% 
  |   4.13%   |   0.53%| 1   | 4 |
 | TPCH(_100) | TPCH-Q1  | kudu / none / none | 10.87  | 10.55   |   +3.05% 
  |   1.06%   |   4.61%| 1   | 4 |
 | TPCH(_100) | TPCH-Q18 | kudu / none / none | 24.93  | 24.64   |   +1.18% 
  |   3.44%   |   1.29%| 1   | 4 |
 | TPCH(_100) | TPCH-Q6  | kudu / none / none | 2.27   | 2.26|   +0.77% 
  |   0.87%   |   1.35%| 1   | 4 |
 | TPCH(_100) | TPCH-Q7  | kudu / none / none | 18.55  | 18.43   |   +0.66% 
  |   3.49%   |   1.40%| 1   | 4 |
 | TPCH(_100) | TPCH-Q3  | kudu / none / none | 10.48  | 10.43   |   +0.47% 
  |   0.98%   |   1.43%| 1   | 4 |
 | TPCH(_100) | TPCH-Q4  | kudu / none / none | 15.94  | 15.90   |   +0.27% 
  |   2.07%   |   1.26%| 1   | 4 |
 | TPCH(_100) | TPCH-Q15 | kudu / none / none | 6.01   | 6.00|   +0.13% 
  |   2.57%   |   1.76%| 1   | 4 |
 | TPCH(_100) | TPCH-Q19 | kudu / none / none | 8.02   | 8.07|   -0.63% 
  |   3.79%   |   2.33%| 1   | 4 |
 | TPCH(_100) | TPCH-Q2  | kudu / none / none | 4.07   | 4.10|   -0.66% 
  |   5.74%   |   1.90%| 1   | 4 |
 | TPCH(_100) | TPCH-Q9  | kudu / none / none | 18.00  | 18.36   |   -1.96% 
  |   5.78%   |   1.17%| 1   | 4 |
 | TPCH(_100) | TPCH-Q13 | kudu / none / none | 11.44  | 11.67   |   -1.99% 
  |   3.58%   |   1.12%| 1   | 4 |
 | TPCH(_100) | TPCH-Q12 | kudu / none / none | 6.40   | 6.56|   -2.37% 
  |   2.20%   |   2.28%| 1   | 4 |
 | TPCH(_100) | TPCH-Q10 | kudu / none / none | 5.42   | 5.60|   -3.17% 
  |   1.90%   |   2.60%| 1   | 4 |
 | TPCH(_100) | TPCH-Q5  | kudu / none / none | 10.53  | 11.07   |   -4.87% 
  |   5.59%   |   1.44%| 1   | 4 |
 | TPCH(_100) | TPCH-Q21 | kudu / none / none | 51.31  | 54.29   |   -5.48% 
  |   2.18%   |   2.39%| 1   | 4 |
 | TPCH(_100) | TPCH-Q14 | kudu / none / none | 4.76   | 7.27| I 
-34.49%  |   0.87%   |   0.54%| 1   | 4 |
 | TPCH(_100) | TPCH-Q11 | kudu / none / none | 1.27   | 2.56| I 
-50.18%  |   2.62%   |   4.17%| 1   | 4 |
 
++--+++-++---++-+---+

For the queries that regressed somewhat - Q20,8,17 - the filters simply aren't 
very selective. We might consider a path to disable a filter that we see in not 
selective the way we do with bloom filters once Kudu exposes stats are filter 
effectiveness.


--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Oct 2017 21:20:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro

2017-10-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8146 )

Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro
..


Patch Set 10: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
Gerrit-Change-Number: 8146
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Oct 2017 17:13:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro

2017-10-19 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8146 )

Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro
..


Patch Set 9:

(4 comments)

Looks good

http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/exec/hdfs-scanner.cc@543
PS9, Line 543: *init_tuple_fn =
 :   codegen->GetFunction(IRFunction::HDFS_SCANNER_INIT_TUPLE, 
true);
single line?


http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/descriptors.cc@86
PS9, Line 86: ConstantAggregateZero* zeroes =
:   ConstantAggregateZero::get(null_indicator_offset_type);
single line?


http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/tuple.h@182
PS9, Line 182: Status* status
Its unusual to have s status as an out parameter instead of returning it. Could 
you document why you did this?


http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/tuple.h@184
PS9, Line 184: Helper for MaterializeStrings()
If this isn't intended to be called outside this class, make it private?



--
To view, visit http://gerrit.cloudera.org:8080/8146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
Gerrit-Change-Number: 8146
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Oct 2017 17:09:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-18 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#7).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Functional Testing:
- Added new planner tests and updated the old ones.
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-18 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 6:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG@15
PS6, Line 15: them into the Kudu client. Because the Kudu client doesn't 
provide a
> Is there any documentation for Kudu about what ordering Kudu uses for compa
I asked the Kudu team, and its not documented anywhere, but they confirmed they 
use the same ordering as Impala does.


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@106
PS6, Line 106: bloom_filter
> has_bloom_filter() ? At callsites it reads like this should return a filter
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@109
PS6, Line 109: min_max_filter
> has_min_max_filter?
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc@1220
PS6, Line 1220:   MinMaxFilter::Or(src_type(), params.min_max_filter, 
min_max_filter_.get());
> I mentioned this in a later comment, but we should consider what happens if
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc@235
PS6, Line 235:   MinMaxFilter* min_max_filter = MinMaxFilter::Create(type, 
_pool_);
> (nit) - could combine into one line.
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc
File be/src/runtime/runtime-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc@27
PS6, Line 27:   // to a) the atomicity of / pointer assignments and b) the x86 
TSO memory model.
> Comment not relevant any more since we're using actual atomics?
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc@53
PS6, Line 53:   min_str = string(value->ptr, value->len);
> We should think through the behaviour with very large strings. It may make
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h@83
PS6, Line 83: #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE)
 \
> Yeah, the macro seems like it may be the least of the evils.
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@24
PS6, Line 24: "
> #include "runtime/timestamp-value.inline.h"
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@77
PS6, Line 77:   std::string NAME##MinMaxFilter::DebugString() const {   
 \
> nit: don't need std:: prefix in .cc files.
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@139
PS6, Line 139:   out->min.__set_string_val(std::min(in.min.string_val, 
out->min.string_val));
> I feel like we should use our StringValue::Compare() function instead of re
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291
PS6, Line 291:   BigIntMinMaxFilter::Or(in, out);
> Shouldn't this be calling the timestamp method?
TColumnData doesn't have a timestamp field, so I convert timestamps with 
UtcToUnixTimeMicros and pass them around in thrift as longs. (noted in a 
comment now)

Unless I'm missing a reason that comparing timestamps with way won't lead to 
the same results as comparing the unconverted TimestampValues.


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@327
PS6, Line 327:   }
> Timestamp?
Same as above.


http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@379
PS6, Line 379: if (slotRef == null || slotRef.getDesc().getColumn() == 
null
> Are there planner tests for all these cases?
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:


[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

2017-10-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 11 Oct 2017 19:22:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6027: Retry downloading toolchain components.

2017-10-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8258 )

Change subject: IMPALA-6027: Retry downloading toolchain components.
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
Gerrit-Change-Number: 8258
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 11 Oct 2017 16:48:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

2017-10-10 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
..


Patch Set 3:

(6 comments)

This is definitely going to conflict with my patch, but you should be able to 
get this in first so I'll probably have to do the painful rebasing.

http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG@13
PS3, Line 13: Testing
I assume you ran the existing runtime filter tests?


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc@129
PS3, Line 129: order to provide a regression test for IMPALA-3798 and a test 
for IMPALA-5789.
This is a little cluttered. I think its okay to just say "for testing purposes" 
and anyone who wants to know the exact JIRAs can easily grep for uses of it.


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h@129
PS3, Line 129: HasAlwaysFalseInList
I think this needs to be renamed for two reasons:
- We're eventually going to be adding 'in-list' filters (in addition to the 
existing "bloom" and soon "min-max" filters), so the "InList" here is 
potentially confusing.
- Its unusual for a function starting with "Has" to have side effects 
(incrementing the stats).

Maybe "CheckForAlwaysFalse"? Or something better you come up with.


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h
File be/src/runtime/runtime-filter.inline.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h@53
PS3, Line 53: return bloom_filter_ != BloomFilter::ALWAYS_TRUE_FILTER &&
:   bloom_filter_->AlwaysFalse();
single line?


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h@111
PS3, Line 111: hasn't been inserted any elements
hasn't had any elements inserted


http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py@34
PS3, Line 34: impalad = self.cluster.get_any_impalad()
: client = impalad.service.create_beeswax_client()
I think you can just add 'cursor' as a parameter to the 'test_' functions.



--
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 10 Oct 2017 18:11:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5529: [DOCS] New trunc() signatures

2017-10-06 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8189 )

Change subject: IMPALA-5529: [DOCS] New trunc() signatures
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice4753dee4f7b8e09c35508a9cad1e36f4ab2826
Gerrit-Change-Number: 8189
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 06 Oct 2017 21:59:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2758: Change BufferedTupleStream::GetRows to returning multi batches

2017-10-06 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Change BufferedTupleStream::GetRows to returning 
multi batches
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8226/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8226/1//COMMIT_MSG@14
PS1, Line 14:
Any testing?


http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/exec/partitioned-hash-join-node.cc@950
PS1, Line 950: break;
I think this changes the behavior here - this will only break out of the 
FOREACH_ROW loop, but what we want is to stop iterating over the entire set of 
build rows (that was previously one batch, but is now the same set of rows 
split into multiple batches).


http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h@339
PS1, Line 339: vector* batch
Can you move this to be after 'batch_size'? We generally keep out parameters as 
the trailing parameters for clarity.



--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:24:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-06 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 6:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG@9
PS5, Line 9: This patch implements min-max filters for runtime filters. Each
> I had a general terminology question that popped up as I read through. Does
My intention (which I probably wasn't super consistent about, I'll try to clean 
it up) was that a "runtime filter" corresponds to a single filter id and might 
contain and bloom and/or min-max (eventually and/or in-list) filters.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@80
PS5, Line 80: /// FilterContext contains all metadata for a single runtime 
filter, and allows the filter
> This seems to be an example of a place where "runtime filter" could potenti
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@107
PS5, Line 107:
> Can you add a blank line between methods?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@422
PS5, Line 422: col_type, filter_expr->type().ToIR(codegen), 
"expr_type_arg");
> Why is decimal special here?
Decimal isn't supported for Kudu, so I didn't implement min-max filters for it 
since they wouldn't be testable.

I special cased it here because you might have a decimal bloom filter and you 
would hit this code path, but a better solution is to just check if the filter 
desc says to build a min-max filter, which will never be the case if the type 
is decimal.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@434
PS5, Line 434: // Call Insert() on the bloom filter.
> Can we make this a lookup table? Seems like the only difference between the
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc@401
PS5, Line 401: if (filter->BloomAlwaysTrue()
> Can we phrase the method in a way that doesn't depend on the assumption tha
I changed this to highlight the fact that the hdfs scanner only deals with 
bloom filters for now.

The runtime filter here might have a min-max filter, if it also has a kudu scan 
target, and since min-max filters don't really have a concept of "AlwaysTrue" 
we would have to say the filter isn't always true.

That would make sense if we were actually applying the min-max filter here, but 
its sort of weird since we're only really applying the bloom filter.

It can then be changed to not be bloom specific when we add consideration of 
min-max filters here.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@174
PS5, Line 174: for (const FilterContext& ctx : scan_node_->filter_ctxs_) {
> How important is this and is it blocked by anything else? If it's impactful
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@175
PS5, Line 175: filt
> nullptr
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@177
PS5, Line 177: se()) {
> dynamic_cast seems unnecessary since the type should always be correct - ca
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@179
PS5, Line 179: eCurre
> const string&?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc@231
PS5, Line 231:   int64_t ts_micros;
> This timestamp code is a bit subtle. Is there a sane way to combine the con
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc@548
PS5, Line 548: IMER(repartition_ti
> Is this meant to double-count the filters if there is a bloom and min-max f
You're right, this needs to be counting "runtime filters" (corresponding to a 
single filter id/FilterContext).

This gets a little confusing, since you might have a bloom filter that gets 
disabled but its not counted as disabled here because the min-max filter isn't 
disabled, but it'll make a lot more sense in the long run goal here of having 
all scanners support all filter types (and of course the fact that the bloom 
filter is disabled in that situation will still show up in the runtime profile 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-06 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Tim Armstrong, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#6).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter and/or a min-max filter,
depending on if it has HDFS and/or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Testing:
- Updated planner tests.
- Ran existing runtime filter tests.
- Ran preliminary perf tests to demonstrate that it works. Will update
  with more specific results.
- Still needs more e2e tests.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M 
testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test
M tests/common/impala_test_suite.py
M tests/query_test/test_runtime_filters.py
M tests/util/test_file_parser.py
47 files changed, 1,581 insertions(+), 377 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/6
--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-06 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml@66
PS4, Line 66: ALTER TABLE name ALTER [COLUMN] 
column_name
> Just to confirm, no doc change needed as a result of this exchange?
Correct, no change needed.



--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 06 Oct 2017 14:56:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml@66
PS4, Line 66: ALTER TABLE name ALTER [COLUMN] 
column_name
> does Impala support multiple alter steps at once?
No, that requires some significant changes to how Impala does analysis of ALTER 
TABLE statements, so we opted to go for the simpler solution of just a single 
alter op for now.



--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 05 Oct 2017 21:54:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to bec2a24

2017-10-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8207 )

Change subject: Bump Kudu version to bec2a24
..


Patch Set 2: Code-Review+2

carrying forward


--
To view, visit http://gerrit.cloudera.org:8080/8207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5
Gerrit-Change-Number: 8207
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 05 Oct 2017 15:46:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8191/3/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/8191/3/docs/topics/impala_alter_table.xml@1014
PS3, Line 1014: or compacted
This is not correct. Only storage attribute changes get applied to compacted 
rows, not default value changes.



--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:38:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to bec2a24

2017-10-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8207


Change subject: Bump Kudu version to bec2a24
..

Bump Kudu version to bec2a24

Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8207/1
--
To view, visit http://gerrit.cloudera.org:8080/8207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5
Gerrit-Change-Number: 8207
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@67
PS2, Line 67: ]
extraneous?


http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@69
PS2, Line 69: COMMENT 'comment_text'
This does parse, but it will always give an error because Kudu doesn't support 
column comments.


http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@71
PS2, Line 71: {
missing the closing '}'


http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@1038
PS2, Line 1038: any newly inserted rows
These attributes may in some cases end up applied to existing rows as Kudu 
performs compactions. Not sure if that should be noted here.



--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 03 Oct 2017 21:57:32 +
Gerrit-HasComments: Yes


[native-toolchain-CR] Bump Kudu version to bec2a24

2017-10-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8192 )

Change subject: Bump Kudu version to bec2a24
..

Bump Kudu version to bec2a24

Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Thomas Tauber-Marshall: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd
Gerrit-Change-Number: 8192
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[native-toolchain-CR] Bump Kudu version to bec2a24

2017-10-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8192 )

Change subject: Bump Kudu version to bec2a24
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd
Gerrit-Change-Number: 8192
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 03 Oct 2017 18:30:55 +
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to bec2a24

2017-10-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8192 )

Change subject: Bump Kudu version to bec2a24
..


Patch Set 1:

http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/473/


--
To view, visit http://gerrit.cloudera.org:8080/8192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd
Gerrit-Change-Number: 8192
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 03 Oct 2017 01:38:33 +
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to bec2a24

2017-10-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8192


Change subject: Bump Kudu version to bec2a24
..

Bump Kudu version to bec2a24

Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/92/8192/1
--
To view, visit http://gerrit.cloudera.org:8080/8192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd
Gerrit-Change-Number: 8192
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5383: [DOCS] Document unpartitioned Kudu tables

2017-10-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8180 )

Change subject: IMPALA-5383: [DOCS] Document unpartitioned Kudu tables
..


Patch Set 3:

(1 comment)

> Thomas, can I tag you in for a few Kudu-related reviews for 5.13,
 > in place of MJ?

Yep, happy to take a look

http://gerrit.cloudera.org:8080/#/c/8180/3/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

http://gerrit.cloudera.org:8080/#/c/8180/3/docs/topics/impala_create_table.xml@483
PS3, Line 483: -- Single partition. Only for 
When I built this locally, these tags were present in the output.



--
To view, visit http://gerrit.cloudera.org:8080/8180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2b466e1e482d62de84253c0cb406668fd5ad5eb
Gerrit-Change-Number: 8180
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 02 Oct 2017 15:54:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8148 )

Change subject: IMPALA-4252: Move runtime filters to ScanNode
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h
File be/src/exec/scan-node.h:

http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h@195
PS1, Line 195: Scanner
> Comment needs an update - this is no longer an argument.
Done


http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h@198
PS1, Line 198: rs to arrive, check
> Both HdfsScanNodeBase and KuduScanNodeBase have a "RuntimeState* runtime_st
Done


http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.cc@79
PS1, Line 79: // TODO: Move this to Prepare()
> Does this comment still make sense? I wish whoever added it had mentioned w
Not sure, it was left by Michael (I'll ping him about it) in the review 
"Disentangle Expr and ExprContext"

You asked about it during that review too, and his response was "I am thinking 
of moving it when PlanNode is introduced." but I'm not sure what that means.



--
To view, visit http://gerrit.cloudera.org:8080/8148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
Gerrit-Change-Number: 8148
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 29 Sep 2017 22:34:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8148

to look at the new patch set (#2).

Change subject: IMPALA-4252: Move runtime filters to ScanNode
..

IMPALA-4252: Move runtime filters to ScanNode

As a preliminary step towards adding runtime filters for Kudu scans,
this patch moves runtime filter related code from HdfsScanNodeBase to
ScanNode so that it's available to KuduScanNodeBase.

The change was mechanical with no logic changes, except for moving the
calculation of the max wait time into WaitForRuntimeFilters().

Testing:
- Ran existing runtime filter tests.

Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
6 files changed, 131 insertions(+), 114 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8148/2
--
To view, visit http://gerrit.cloudera.org:8080/8148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
Gerrit-Change-Number: 8148
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8169 )

Change subject: IMPALA-5994: Lower case struct-field names
..


Patch Set 4: Code-Review+2

Carrying forward +2.

Gvo failed due to trivial test issue.


--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 29 Sep 2017 21:14:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Hello Lars Volker, Alex Behm, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8169

to look at the new patch set (#4).

Change subject: IMPALA-5994: Lower case struct-field names
..

IMPALA-5994: Lower case struct-field names

Impala tries to always store column names in lower case. As part of a
cleanup of issues related to upper case Kudu column names, a check was
added in Analyzer to enforce this.

The check fails when doing star expansion on a struct to select all
fields in the case where a table was created in Hive with upper case
letters in a struct field name. This happens because Hive does not
covert struct field names to all lower case in HMS.

The solution is to force StructField names to lower case.

Testing:
- Added a test in test_nested_types.py
- Fixed FE test that expected struct field to be output in upper case.

Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
---
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/query_test/test_nested_types.py
3 files changed, 19 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/4
--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5951: Remove flaky test catalogd timeout

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8154 )

Change subject: IMPALA-5951: Remove flaky test_catalogd_timeout
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test:

http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test@24
PS1, Line 24:
> Isn't this test also susceptible to the same timing issue? I don't see much
Done



--
To view, visit http://gerrit.cloudera.org:8080/8154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Gerrit-Change-Number: 8154
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 29 Sep 2017 18:55:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5951: Remove flaky test catalogd timeout

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8154

to look at the new patch set (#2).

Change subject: IMPALA-5951: Remove flaky test_catalogd_timeout
..

IMPALA-5951: Remove flaky test_catalogd_timeout

test_catalogd_timeout sets a Kudu operation timeout of 1ms and then
performs various Kudu operations which it expects to fail due to a
timeout.

Since the test was written, things have sped up - for example, Impala
used to create a new Kudu client for each operation, but that was
changed in IMPALA-5167, such that the operations now occasionally
complete quickly enough that they don't timeout.

There's not really any way to rewrite this test to ensure that it
won't be flaky, so the patch removes it.

Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
---
D 
testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test
M tests/custom_cluster/test_kudu.py
2 files changed, 0 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8154/2
--
To view, visit http://gerrit.cloudera.org:8080/8154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Gerrit-Change-Number: 8154
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8169 )

Change subject: IMPALA-5994: Lower case struct-field names
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8169/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8169/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5994: Lower case struct-field names
> Commit msg should describe the change/fix, not the symptom of the bug. For
Done


http://gerrit.cloudera.org:8080/#/c/8169/2/fe/src/main/java/org/apache/impala/catalog/StructField.java
File fe/src/main/java/org/apache/impala/catalog/StructField.java:

http://gerrit.cloudera.org:8080/#/c/8169/2/fe/src/main/java/org/apache/impala/catalog/StructField.java@37
PS2, Line 37: // Impala expects field names to be in lower case, but type 
strings stored in the HMS
> // Impala expects field names to be in lower case, but type strings stored
Done



--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 29 Sep 2017 16:16:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names

2017-09-29 Thread Thomas Tauber-Marshall (Code Review)
Hello Lars Volker, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8169

to look at the new patch set (#3).

Change subject: IMPALA-5994: Lower case struct-field names
..

IMPALA-5994: Lower case struct-field names

Impala tries to always store column names in lower case. As part of a
cleanup of issues related to upper case Kudu column names, a check was
added in Analyzer to enforce this.

The check fails when doing star expansion on a struct to select all
fields in the case where a table was created in Hive with upper case
letters in a struct field name. This happens because Hive does not
covert struct field names to all lower case in HMS.

The solution is to force StructField names to lower case.

Testing:
- Added a test in test_nested_types.py

Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
---
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M tests/query_test/test_nested_types.py
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/3
--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5994: Failure in star expansion on struct fields

2017-09-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8169 )

Change subject: IMPALA-5994: Failure in star expansion on struct fields
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8169/1/fe/src/main/java/org/apache/impala/catalog/StructField.java
File fe/src/main/java/org/apache/impala/catalog/StructField.java:

http://gerrit.cloudera.org:8080/#/c/8169/1/fe/src/main/java/org/apache/impala/catalog/StructField.java@37
PS1, Line 37: // Impala expects column names to be lower cased, but this is 
not always true for
> Can you add a brief comment why we need to do this?
Done


http://gerrit.cloudera.org:8080/#/c/8169/1/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/8169/1/tests/query_test/test_nested_types.py@96
PS1, Line 96: rCase
> nit: This seems easy to miss. Can you pick something more obvious, like cam
Done


http://gerrit.cloudera.org:8080/#/c/8169/1/tests/query_test/test_nested_types.py@100
PS1, Line 100: U
> Does the uppercase F matter here?
Not really. This query passed even before the fix, just adding a little extra 
coverage while I'm here.



--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 28 Sep 2017 22:27:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5994: Failure in star expansion on struct fields

2017-09-28 Thread Thomas Tauber-Marshall (Code Review)
Hello Lars Volker, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8169

to look at the new patch set (#2).

Change subject: IMPALA-5994: Failure in star expansion on struct fields
..

IMPALA-5994: Failure in star expansion on struct fields

Impala tries to always store column names in lower case. As part of a
cleanup of issues related to upper case Kudu column names, a check was
added in Analyzer to enforce this.

The check fails when doing star expansion on a struct to select all
fields in the case where a table was created in Hive with upper case
letters in a struct field name. This happens because Hive does not
covert struct field names to all lower case in HMS.

The solution is to force StructField names to lower case.

Testing:
- Added a test in test_nested_types.py

Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
---
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M tests/query_test/test_nested_types.py
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/2
--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5994: Failure in star expansion on struct fields

2017-09-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8169


Change subject: IMPALA-5994: Failure in star expansion on struct fields
..

IMPALA-5994: Failure in star expansion on struct fields

Impala tries to always store column names in lower case. As part of a
cleanup of issues related to mixed case Kudu column names, a check was
added in Analyzer to enforce this.

The check fails when doing star expansion on a struct to select all
fields in the case where a table was created in Hive with upper case
letters in a struct field name. This happens because Hive does not
covert struct field names to all lower case in HMS.

The solution is to force StructField names to lower case.

Testing:
- Added a test in test_nested_types.py

Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
---
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M tests/query_test/test_nested_types.py
2 files changed, 16 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/1
--
To view, visit http://gerrit.cloudera.org:8080/8169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e
Gerrit-Change-Number: 8169
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5951: test catalogd timeout fails to cause expected exception

2017-09-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8154


Change subject: IMPALA-5951: test_catalogd_timeout fails to cause expected 
exception
..

IMPALA-5951: test_catalogd_timeout fails to cause expected exception

test_catalogd_timeout sets a Kudu operation timeout of 1ms and then
performs various Kudu operations which it expects to fail due to a
timeout.

Since the test was written, things have sped up - for example, Impala
used to create a new Kudu client for each operation, but that was
changed in IMPALA-5167, such that the operations now occasionally
complete quickly enough that they don't timeout.

This patch disables all but the first test. The remaining tests can
likely be reactivated once we have a way of invalidating clients
(IMPALA-5685).

Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
---
M 
testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test
1 file changed, 13 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8154/1
--
To view, visit http://gerrit.cloudera.org:8080/8154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Gerrit-Change-Number: 8154
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 5:

(1 comment)

> The Parquet writer does something similar for the Parquet
 > statistics, can any of that code be reused with regards to codegen?

This was discussed in the original design thread, and we felt that there was 
too much parquet specific stuff current in/going to be added soon in the 
parquet statistics code, so we decided to just keep it separate.

http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift@740
PS2, Line 740:   4: optional i64 queue_timeout_ms;
 :
 :   // Default query options that are applied to requests mapped 
to this pool.
 :
> This could be a reasonable approach, or another idea is:
Done



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 27 Sep 2017 03:08:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#5).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter and/or a min-max filter,
depending on if it has HDFS and/or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Testing:
- Updated planner tests.
- Ran existing runtime filter tests.
- Ran preliminary perf tests to demonstrate that it works. Will update
  with more specific results.
- Still needs more e2e tests.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
37 files changed, 1,464 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/5
--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] DRAFT - IMPALA-4252: Min-max runtime filters for Kudu

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#4).

Change subject: DRAFT - IMPALA-4252: Min-max runtime filters for Kudu
..

DRAFT - IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter and/or a min-max filter,
depending on if it has HDFS and/or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Testing:
- Updated planner tests.
- Ran existing runtime filter tests.
- Ran preliminary perf tests to demonstrate that it works. Will update
  with more specific results.
- Still needs more e2e tests.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
37 files changed, 1,464 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/4
--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8148


Change subject: IMPALA-4252: Move runtime filters to ScanNode
..

IMPALA-4252: Move runtime filters to ScanNode

As a preliminary step towards adding runtime filters for Kudu scans,
this patch moves runtime filter related code from HdfsScanNodeBase to
ScanNode so that it's available to KuduScanNodeBase.

Testing:
- Ran existing runtime filter tests.

Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
4 files changed, 126 insertions(+), 107 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8148/1
--
To view, visit http://gerrit.cloudera.org:8080/8148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086
Gerrit-Change-Number: 8148
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 3:

The GVO failed because there was a kudu insert test that checked the profile 
for the 'SpilledRuns: 0' line. Since that is now gone, the test checks for 
'SortType: Partial' (the real point of the test was that the query wouldn't 
have run previously due to the mem limit, which still applies, so we're not 
particularly losing any coverage).


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Sep 2017 19:46:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8123

to look at the new patch set (#3).

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..

IMPALA-5870: Improve runtime profile for partial sort

A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.

This patch removes the counters 'SpilledRuns' and 'MergesPerformed'
since they will always be 0, and it renames the 'IntialRunsCreated'
counter to 'RunsCreated' since the 'Initial' refers to the fact that
in a regular sort those runs may be spilled or merged.

It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.

Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
 SortType: Partial
 ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of 
samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)

Testing:
- Manually ran several sorting queries and inspected their profiles
- Updated a kudu_insert test that relied on the 'SpilledRuns' counter
  to be 0 for a partial sort.

Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/sorter.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
5 files changed, 11 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/3
--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 7: Code-Review+2

GVO failed because of WARN_UNUSED_RESULT not being followed. Added a 
RETURN_IF_ERROR


--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 15:26:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, 
Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#7).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/7
--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14
PS1, Line 14: This patch removes the counters 'SpilledRuns' and 
'MergesPerformed'
> I'm skeptical about this part of the change. It definitely a wart but there
Sounds good to me. I removed the EXPLAIN related stuff from this patch, and 
filed IMPALA-5972


http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520
PS1, Line 1520: initial_runs_
> "runs_counter_" is a bit inaccurate for spilling sorts since it doesn't inc
Done



--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 23 Sep 2017 00:05:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-22 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8123

to look at the new patch set (#2).

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..

IMPALA-5870: Improve runtime profile for partial sort

A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.

This patch removes the counters 'SpilledRuns' and 'MergesPerformed'
since they will always be 0, and it renames the 'IntialRunsCreated'
counter to 'RunsCreated' since the 'Initial' refers to the fact that
in a regular sort those runs may be spilled or merged.

It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.

Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
 SortType: Partial
 ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of 
samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)

Testing:
- Manually ran several sorting queries and inspected their profiles

Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/sorter.cc
4 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/2
--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc@216
PS2, Line 216:
> Forgot in the previous pass: can you include an example of the IR it genera
Done


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc@217
PS2, Line 217: Status FilterContext::CodegenInsert(
> So I've been working on this, but it turns out to be tricky. I uploaded a n
After much discussion with Tim, we've decided not to bother trying to do the 
cross-compilation and to just stick with the ir builder version.

The reason basically is that the Function returned by 
ScalarExpr::GetCodegendComputeFn() can't quite be used as a drop-in replacement 
for ScalarExprEvaluator::GetValue() (the first returns the value directly while 
the second returns a pointer) and the work necessary to fix this without losing 
efficiency is beyond the scope of this patch.


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc@935
PS2, Line 935: Status PhjBuilder::CodegenInsertRuntimeFilters(
> It would be good to have an example of the IR here too.
Done


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc
File be/src/util/bloom-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc@24
PS2, Line 24: IR_ALWAYS_INLINE
> Is the always_inline needed here?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 22 Sep 2017 21:35:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-22 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, 
Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#6).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/6
--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort

2017-09-21 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8123


Change subject: IMPALA-5870: Improve explain/profile output for partial sort
..

IMPALA-5870: Improve explain/profile output for partial sort

A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.

For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for
partial sorts, since they can't spill. It does this by setting the
spillable buffer size in the resource profile to -1. Since the BE
relied on that number to determine the page size for sorts, it
now calculates the page size from the min reservation, which gives
an equivalent value.

For the runtime profile, it removes the counters 'SpilledRuns' and
'MergesPerformed' since they will always be 0, and it renames the
'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial'
refers to the fact that in a regular sort those runs may be spilled
or merged.

It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.

Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
 SortType: Partial
 ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of 
samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)

Testing:
- Manually ran several sorting queries and inspected their explain
  plans and profile

Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M fe/src/main/java/org/apache/impala/planner/SortNode.java
6 files changed, 22 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/1
--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 225:   Constant* expr_type_arg = codegen->ConstantToGVPtr(
> Yeah, I have no idea why it doesn't work. The types definitely line up or i
So I've been playing around with this more, and I have a theory. I noticed that 
it works if I only replace 'GetType' and not 'GetValue', so the problem may 
actually be with 'GetValue'.

I think that the value that's being returned by the function generated by 
GetCodegendComputePtrFn() is stack allocated (because ToNativePtr() is called 
on 'result' which then calls CreateEntryBlockAlloca, which allocates stack 
space).

Basically, GetCodegendComputePtrFn() is trying to create a codegend copy of 
ScalarExprEvaluator::GetValue() but its not currently doing the part where the 
value is stored in 'result_' so that a pointer to it can be passed back.

One solution would be to figure out how to write IRBuilder for storing the 
value in 'result_' (or else cross compile ScalarExprEvaluator::GetValue() and 
sub things in), but that's potentially complicated. It may also be worth just 
going back to the original version of this patch to eliminate the perf cost 
from having to store the value in 'result_'.

Or maybe you had something in mind with your other comment about not needing to 
create a whole new function that would solve this problem as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5949: fix test exchange small delay failure

2017-09-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5949: fix test_exchange_small_delay failure
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia75c42be2de600344de7af5a917d7843880ea6de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

Line 40:   filter-context-ir.cc
> Missing this file?
Done


http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 225:   Constant* expr_type_arg = codegen->ConstantToGVPtr(
> This does look like it should work to me. I think the types should line up 
Yeah, I have no idea why it doesn't work. The types definitely line up or it 
wouldn't pass verification, and as far as I can tell the type is being passed 
correctly, its just somehow corrupting/overwriting the actual value that's 
getting hashed.

I've been playing around with it, and the only thing I can figure out that 
makes it work is to change the cross-complied function to assign the ColumnType 
to a local variable rather than just handling a ref to it, but of course that 
somewhat defeats the point here.

Is there someone who knows more about llvm that can help me with this? I can 
post the generated IR somewhere for someone to look at. I've already spent 
quite a lot of time trying to get this to work, and I'm not sure what else to 
do since I'm finding llvm to be very poorly documented.


http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

Line 368:   // Set the pointer to NULL in case it evaluates to NULL.
> It seems like the main way this differs from other places is returning a nu
I don't know what you mean by this. If we're going to replace 
ScalarExprEval::GetValue() with ReplaceCallSites, then we need another function 
to replace it with.


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-20 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#5).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/filter-context-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
16 files changed, 278 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-20 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#4).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/filter-context-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
16 files changed, 279 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-18 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 217: 
> You're right, we should be able to cross-compile Insert() and substitute in
So I've been working on this, but it turns out to be tricky. I uploaded a new 
patch, but it doesn't fully work.

One issue is that the function returned by ScalarExpr::GetCodegendComputeFn() 
can't be used as a drop-in replacement for ScalarExprEvaluator::GetValue() as 
they have different return types (AnyVal and void* respectively). I solved this 
with ScalarExpr::GetCodegendComputePtrFn(), but it means keeping a lot of the 
IRBuilder code that's here.

The other issue is how to deal with making the type argument to GetHashValue() 
a constant. As far as I can tell, we don't already have any tools for directly 
replacing arguments to functions.

The latest version of the patch defines a function GetType(), called in 
FilterContext::Insert(), that I replace with ReplaceCallSitesWithValue(), but 
this doesn't work (in particular, the values to be hashed that are passed to 
GetHashValue() appear to be corrupted, and from dumping the IR the branching 
isn't eliminated anyways), and I'm not sure if its even reasonable to expect it 
to work since we don't use ReplaceCallSitesWithValue() in that way anywhere 
else.

I also considered making something like RawValue::CodegenGetHashValue() which 
would take the type and return a codegen'd function that can be used to replace 
GetHashValue() with ReplaceCallSites() (ie. the codegen'd function would still 
also take a type argument but would ignore it), but this would be a bunch more 
IRBuilder (more even than the original version of the patch, I think), so I 
wanted to see if that approach makes sense before doing it.


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 135:   // Same as Insert(), but skips the CPU check and assumes that AVX 
is not available.
> naming nit: We actually need AVX2 -- AVX won't cut it
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-18 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#3).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
15 files changed, 244 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 246 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 124:   static Status CodegenInsert(LlvmCodeGen* codegen, ScalarExpr* 
filter_expr,
> This is a bit tricky because it doesn't actually do exactly the same thing 
Done


http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 953:   for (int i = 0; i < num_filters; ++i) {
> Nit: it looks like the two branches are identical except for adding the NoI
Done


Line 958: Value* filter_context_ptr =
> Going forward, we want to avoid have this dependence between the codegen co
Done


http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/util/bloom-filter-ir.cc
File be/src/util/bloom-filter-ir.cc:

Line 32: }
> It would be nice to keep this in header file to avoid regressing the non-co
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5416: Fix an impala-shell command recursion bug

2017-09-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5416: Fix an impala-shell command recursion bug
..


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5416: Fix an impala-shell command recursion bug

2017-09-14 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5416: Fix an impala-shell command recursion bug
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8063/2//COMMIT_MSG
Commit Message:

PS2, Line 12: The cause is that there is a "cmdqueue" member in cmd library, 
which is
: used to execute commands not directly from user input. When 
impala-shell
: reads a line with multiple commands, it splits the line into 
multiple
: queries and insert them into the queue, and then gives control 
back to
: the eventloop in cmd library. The problem is that a source command
: calls execute_query_list(), which executes queries in the 
cmdqueue as
: well. So any query in the cmdqueue will be executed twice. And if 
there
: is unfortunately a source command in the cmdqueue, it will call
: execute_query_list() again, and there will be an infinite 
recursion.
: The original purpose of running queued queries in 
execute_query_list()
: is that in non-interactive mode, there is no event loop. And 
still,
: there are queries like "use database" queued by connection setup
: procedures, which need to be run before the user query.
: This patch avoids running queries from the queue in
: execute_query_list(), and for non-interactive mode, runs queued 
queries
: in execute_queries_non_interactive_mode() instead.
Thanks for the explanation.

Hate to be picky, but this is more technical detail than I was looking for. The 
idea here is to give the reader a high level idea of what's going on while 
keeping it simple.

Maybe something like (assuming I understand completely):
The cmdqueue member of cmd.Cmd is used to execute commands not directly from 
user input in an event loop. When a 'source' is run, execute_query_list() is 
called which also executes the commands in cmdqueue, causing them to be 
executed twice.

The fix is for execute_query_list() to not run the commands in cmdqueue. For 
the non-interactive case, where the event loop won't be run, we call 
execute_query_list() with cmdqueue so that the commands get run.


-- 
To view, visit http://gerrit.cloudera.org:8080/8063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5416: Fix an impala-shell command recursion bug

2017-09-14 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5416: Fix an impala-shell command recursion bug
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8063/1//COMMIT_MSG
Commit Message:

PS1, Line 11: This patch fixes these bugs.
Can you describe briefly what the problem with the existing code was?

Looking at the code change, my intuition is that the old and new code should be 
equivalent, but apparently that's not the case.


-- 
To view, visit http://gerrit.cloudera.org:8080/8063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 3f49724

2017-09-12 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8040

Change subject: Bump Kudu version to 3f49724
..

Bump Kudu version to 3f49724

Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8040/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8040
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.

2017-09-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, 
unknown.
..


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS3, Line 1583:   
whitespace


PS3, Line 3034:  
whitespace, here and below


http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java:

PS3, Line 26: Class describing a predicate that tests a boolean values using IS.
:  * Example: (1 < 1) IS TRUE
You should explain more thoroughly what this expr actually does, eg. the 
handling of NULLs.


PS3, Line 28: a
:  * CompoundPredicate
weird line wrapping here


PS3, Line 33: _
We don't use trailing underscores on constant values.


PS3, Line 94: assert
Preconditions.checkState


http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java:

PS3, Line 39: if (!(expr instanceof BoolTestPredicate))
:   return expr;
single line


PS3, Line 64: private BoolTestToCompoundRule() {
:   }
single line


http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

PS3, Line 164:   
whitespace


http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 1257: // BoolTestPredicate.
modify one of these tests to use "NOT"


-- 
To view, visit http://gerrit.cloudera.org:8080/8014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[native-toolchain-CR] Bump Kudu version to 3f49724

2017-09-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bump Kudu version to 3f49724
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to 3f49724

2017-09-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has submitted this change and it was merged.

Change subject: Bump Kudu version to 3f49724
..


Bump Kudu version to 3f49724

Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Thomas Tauber-Marshall: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/8028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build

2017-09-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8029

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), including skipping consideration of
FilterContext's that don't have a local_bloom_filter, and eliminates
the branch on type that happens in RawValue::GetHashValue().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 232 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[native-toolchain-CR] Bump Kudu version to 3f49724

2017-09-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8028

Change subject: Bump Kudu version to 3f49724
..

Bump Kudu version to 3f49724

Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/28/8028/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] Bump Kudu version to a71ecfd

2017-09-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bump Kudu version to a71ecfd
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie23d852f0d630f9484d8ae4f772af6bba13ea24f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to a71ecfd

2017-09-06 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bump Kudu version to a71ecfd
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4fb177b80f6c193af3f6b6bf9d7b205ab31d2f3e
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


  1   2   3   4   5   6   >