[native-toolchain-CR] Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply patches function to appropriately apply patches w.r.t host arch's
Valencia Edna Serrao has posted comments on this change. Change subject: Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply_patches function to appropriately apply patches w.r.t host arch's .. Patch Set 1: > > > It appears you have forgotten to squash your commits using git > > > rebase > > > > > > I ran the following commands before pushing the second patch set. > > git fetch asf-gerrit > > git rebase asf-gerrit/master > > and it showed my branch is up-to-date > > > > Please let me know if there is any other check I need to do. > > Please google "git rebase squash", then squash the two patches, > leaving only the Change-Id in the commit message from the first > one. Thanks, Jim! for guiding me on rectifying this. I have squashed all commit to one keeping the Change-Id of the first commit. Please let me know if it is fine. -- To view, visit http://gerrit.cloudera.org:8080/6739 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9450a8df6844f45f84f4859840a9fcab1b1ca526 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna SerraoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: No
[native-toolchain-CR] Ported native-toolchain to work on ppc64le
Valencia Edna Serrao has uploaded a new patch set (#3). Change subject: Ported native-toolchain to work on ppc64le .. Ported native-toolchain to work on ppc64le Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a --- M buildall.sh M functions.sh M init-compiler.sh M init.sh M source/autoconf/build.sh A source/breakpad/breakpad-88e5b2c8806bac3f2c80d2fe80094be5bd371601-patches/0003-Build-breakpad_88e5b2c-on-ppc64le.patch A source/breakpad/breakpad-88e5b2c_ppc.patch M source/crcutil/build.sh A source/crcutil/crcutil-440ba7babeff77ffad992df3a10c767f184e946e-patches/0001_crc_autogen.patch A source/crcutil/crcutil-440ba7babeff77ffad992df3a10c767f184e946e-patches/0002_crc_interface_cc.patch M source/cyrus-sasl/build.sh M source/glog/build.sh M source/kudu/build.sh M source/libevent/build.sh M source/libtool/build.sh M source/libunwind/build.sh M source/llvm/build-3.3.sh M source/llvm/build-source-tarball.sh M source/openldap/build.sh M source/openssl/build.sh M source/thrift/build.sh 21 files changed, 1,279 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/68/6468/3 -- To view, visit http://gerrit.cloudera.org:8080/6468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a Gerrit-PatchSet: 3 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna SerraoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Valencia Edna Serrao
[native-toolchain-CR] Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply patches function to appropriately apply patches w.r.t host arch's
Jim Apple has posted comments on this change. Change subject: Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply_patches function to appropriately apply patches w.r.t host arch's .. Patch Set 1: > > It appears you have forgotten to squash your commits using git > > rebase > > > I ran the following commands before pushing the second patch set. > git fetch asf-gerrit > git rebase asf-gerrit/master > and it showed my branch is up-to-date > > Please let me know if there is any other check I need to do. Please google "git rebase squash", then squash the two patches, leaving only the Change-Id in the commit message from the first one. -- To view, visit http://gerrit.cloudera.org:8080/6739 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9450a8df6844f45f84f4859840a9fcab1b1ca526 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna SerraoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: No
[native-toolchain-CR] Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply patches function to appropriately apply patches w.r.t host arch's
Valencia Edna Serrao has posted comments on this change. Change subject: Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply_patches function to appropriately apply patches w.r.t host arch's .. Patch Set 1: > It appears you have forgotten to squash your commits using git > rebase I ran the following commands before pushing the second patch set. git fetch asf-gerrit git rebase asf-gerrit/master and it showed my branch is up-to-date Please let me know if there is any other check I need to do. -- To view, visit http://gerrit.cloudera.org:8080/6739 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9450a8df6844f45f84f4859840a9fcab1b1ca526 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna SerraoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: No
[native-toolchain-CR] Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply patches function to appropriately apply patches w.r.t host arch's
Jim Apple has posted comments on this change. Change subject: Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply_patches function to appropriately apply patches w.r.t host arch's .. Patch Set 1: It appears you have forgotten to squash your commits using git rebase -- To view, visit http://gerrit.cloudera.org:8080/6739 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9450a8df6844f45f84f4859840a9fcab1b1ca526 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna SerraoGerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[native-toolchain-CR] Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply patches function to appropriately apply patches w.r.t host arch's
Valencia Edna Serrao has uploaded a new change for review. http://gerrit.cloudera.org:8080/6739 Change subject: Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply_patches function to appropriately apply patches w.r.t host arch's .. Aligned ppc64le-ported native-toolchain to be consistent with upstream style Also, updated apply_patches function to appropriately apply patches w.r.t host arch's Change-Id: I9450a8df6844f45f84f4859840a9fcab1b1ca526 --- M buildall.sh M functions.sh M init-compiler.sh M init.sh M source/autoconf/build.sh A source/breakpad/breakpad-88e5b2c8806bac3f2c80d2fe80094be5bd371601-patches/0003-Build-breakpad_88e5b2c-on-ppc64le.patch M source/breakpad/build.sh M source/crcutil/build.sh A source/crcutil/crcutil-440ba7babeff77ffad992df3a10c767f184e946e-patches/0001_crc_autogen.patch R source/crcutil/crcutil-440ba7babeff77ffad992df3a10c767f184e946e-patches/0002_crc_interface_cc.patch D source/crcutil/ppc-patches/crc_makefile_am.patch M source/cyrus-sasl/build.sh M source/glog/build.sh M source/kudu/build.sh D source/kudu/kudu-config.sh D source/kudu/setup_gcc493.sh M source/libevent/build.sh M source/libtool/build.sh M source/libunwind/build.sh M source/llvm/build-3.3.sh M source/llvm/build-source-tarball.sh M source/openldap/build.sh M source/openssl/build.sh M source/thrift/build.sh 24 files changed, 709 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/39/6739/1 -- To view, visit http://gerrit.cloudera.org:8080/6739 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9450a8df6844f45f84f4859840a9fcab1b1ca526 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna Serrao
[Impala-ASF-CR] IMPALA-5245: Disable buffer-allocator-test under ASAN
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5245: Disable buffer-allocator-test under ASAN .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/511/ -- To view, visit http://gerrit.cloudera.org:8080/6737 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e685d4482d548cc3e724c20a83ae14890ce56ec Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan HechtGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 5: (22 comments) http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 65: void* value, bool copy_strings = true); const void* value, to indicate it's not an output param http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 64: row_.reset(table->schema().NewRow()); can this not return an error? Line 71: int col = tkudu_partition_expr_.referenced_columns[i]; move to where it's being used leave todo somewhere that the schema/our metadata should record the partition cols, this should really be part of the table descriptor. Line 75: // the KuduTableSink generate the error message. what's the point of round-robin instead of always returning, say, 0? Line 79: Status s = WriteKuduRowValue(row_.get(), col, type, val, false); where is this declared? it might be nice to require some form of namespace qualification for functions that aren't in the impala namespace. i wish kudu hadn't introduced these cumbersome nested namespaces. Line 82: DCHECK(s.ok()); add some context output (print called function, type and col name?). Line 89: DCHECK(s.ok()); same here http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.h File be/src/exprs/kudu-partition-expr.h: Line 24: #include "runtime/descriptors.h" do you need this include, as opposed to forward decls? Line 46: TKuduPartitionExpr tkudu_partition_expr_; forward declare? Line 50: /// Used to call into Kudu to determine partitions. it's nice to indicate what gets set in prepare(), because that tells me it doesn't change during execution (despite the fact that it's not const). same for the other member vars. http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: Line 350: kudu_ = sink.output_partition.type == TPartitionType::KUDU; have the partition type as a member var, rather than 3 bools? Line 425: // to simplify this. i don't think that's practical, for broadcast and random we have different/more efficient logic that doesn't require evaluating exprs. Line 456: // hash-partition batch's rows across channels i'd leave a todo here to coalesce this with the kudu case, which would require the hash computation to be done via an expr (which probably requires codegen support, which would be worth having anyway). http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 109: bool kudu_; comment missing http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 115: private List partitionKeyIdxs_ = Lists.newArrayList(); > The primary keys are always the first n columns, but the partition keys can that would be good to explain in the code, it's a bit subtle http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 643: Set kuduPartitionByColumnNames = null; get rid of 'partitionby' here as well Line 682: // declaration, and store their indexes. We need those exprs in the original order to 'store their indexes' is a bit mysterious, 'column indexes' or even column positions is more informative. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 90: protected String toSqlImpl() { return "KuduPartition()"; } > Yes, it is included in the explain output. See the planner test files for e i think so, as it is it's pretty useless. http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 44: // Maps from this Epxrs children to column positions in the table, i.e. children_[i] since these are column positions, call the variable that? i have to keep reminding myself what 'key indexes' are. Line 48: private List partitionKeyTypes_; isn't this available in the table descriptor? http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: Line 82: return new DataPartition(TPartitionType.KUDU, expr); why not just call the list c'tor and create a list here? (and drop the new non-list c'tor)
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6389 to look at the new patch set (#25). Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. IMPALA-5003: Constant propagation in scan conjuncts Implements constant propagation within conjuncts and applies the optimization to scan conjuncts and collection conjuncts within Hdfs scan nodes. The optimization is applied during planning. At scan nodes in particular, we want to optimize to enable partition pruning. In certain cases, we might end up with a FALSE conditional, which now will convert to an EmptySet node. Testing: Expanded the test cases for the planner to achieve constant propagation. Added Kudu, datasource, Hdfs and HBase tests to validate we can create EmptySetNodes. Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test 19 files changed, 636 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/25 -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 25 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Dan Hecht has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS8, Line 601: . ... in 'scanner_status'. PS8, Line 604: Status& scanner_status please pass this as a 'Status* scanner_status', which is what we usually do for out parameters to make it more obvious at the callsite that it's modified. And nice to put out parameters last. http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: PS8, Line 124: local_date_time lt(temp, timezone); can this throw an exception? PS8, Line 125: *this = lt.local_time(); or that? http://gerrit.cloudera.org:8080/#/c/5939/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS8, Line 122: HDFS it's the same for other FileSystem interfaced things, like S3. So maybe remove "HDFS". -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5245: Disable buffer-allocator-test under ASAN
Alex Behm has posted comments on this change. Change subject: IMPALA-5245: Disable buffer-allocator-test under ASAN .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6737 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e685d4482d548cc3e724c20a83ae14890ce56ec Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan HechtGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: Disable buffer-allocator-test under ASAN
Dan Hecht has posted comments on this change. Change subject: IMPALA-5245: Disable buffer-allocator-test under ASAN .. Patch Set 1: Kicked off a private ASAN build of BE tests. Will wait for that to complete before pushing this. -- To view, visit http://gerrit.cloudera.org:8080/6737 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e685d4482d548cc3e724c20a83ae14890ce56ec Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan HechtGerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: Disable buffer-allocator-test under ASAN
Dan Hecht has uploaded a new change for review. http://gerrit.cloudera.org:8080/6737 Change subject: IMPALA-5245: Disable buffer-allocator-test under ASAN .. IMPALA-5245: Disable buffer-allocator-test under ASAN Until we figure out the root cause, disable this test (which exercises code that is under development and not yet enabled by default). Change-Id: I4e685d4482d548cc3e724c20a83ae14890ce56ec --- M be/src/runtime/bufferpool/buffer-allocator-test.cc 1 file changed, 6 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6737/1 -- To view, visit http://gerrit.cloudera.org:8080/6737 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4e685d4482d548cc3e724c20a83ae14890ce56ec Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg Since commit d2d3f4c (on asf-master), TAggregateExpr contains the logical input types of the Aggregate Expr. The reason they are included is that merging aggregate expressions will have input tyes of the intermediate values which aren't necessarily the same as the input types. For instance, NDV() uses a binary blob as its intermediate value and it's passed to its merge aggregate expressions as a StringVal but the input type of NDV() in the query could be DecimalVal. In this case, we consider DecimalVal as the logical input type while StringVal is the intermediate type. The logical input types are accessed by the BE via GetConstFnAttr() during interpretation and constant propagation during codegen. To handle distinct aggregate expressions (e.g. select count(distinct)), the FE uses 2-phase aggregation by introducing an extra phase of split/merge aggregation in which the distinct aggregate expressions' inputs are coverted and added to the group-by expressions in the first phase while the non-distinct aggregate expressions go through the normal split/merge treatement. The bug is that the existing code incorrectly propagates the intermediate types of the non-grouping aggregate expressions as the logical input types to the merging aggregate expressions in the second phase of aggregation. The input aggregate expressions for the non-distinct aggregate expressions in the second phase aggregation are already merging aggregate expressions (from phase one) in which case we should not treat its input types as logical input types. This change fixes the problem above by checking if the input aggregate expression passed to FunctionCallExpr.createMergeAggCall() is already a merging aggregate expression. If so, it will use the logical input types recorded in its 'mergeAggInputFn_' as references for its logical input types instead of the aggregate expression input types themselves. Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Reviewed-on: http://gerrit.cloudera.org:8080/6724 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M be/src/testutil/test-udas.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 5 files changed, 153 insertions(+), 52 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5192: Don't bake MemPool* into IR
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5192: Don't bake MemPool* into IR .. IMPALA-5192: Don't bake MemPool* into IR Tuple::CodegenMaterializeExprs() currently bakes the MemPool* provided by its caller into the generated IR. The MemPool* usually belongs to some exec nodes which owns the codegend function and it's used for allocating string buffer. With multi-threading, IR needs to be shared across multiple fragment instances so IR can no longer contain pointers not shared across fragment instances. This change fixes the problem above by using the MemPool* argument passed to the IR function. This also cleans up UnionNode by removing the field tuple_pool_ from it and the logic for transferring buffer from tuple_pool_ to the MemPool of the row batch. Change-Id: I09d620e48032351ab9805825a4afb6536bed2302 Reviewed-on: http://gerrit.cloudera.org:8080/6657 Reviewed-by: Michael HoTested-by: Impala Public Jenkins --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/union-node-ir.cc M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/tuple.cc M be/src/runtime/tuple.h 11 files changed, 107 insertions(+), 121 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6657 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I09d620e48032351ab9805825a4afb6536bed2302 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5192: Don't bake MemPool* into IR
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5192: Don't bake MemPool* into IR .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6657 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I09d620e48032351ab9805825a4afb6536bed2302 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 8: > (1 comment) Yes, I think we should do that. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
anujphadke has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. Patch Set 23: (1 comment) http://gerrit.cloudera.org:8080/#/c/6389/23//COMMIT_MSG Commit Message: PS23, Line 9: proopagation nit: typo -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Dan Hecht has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 123: local_date_time lt(temp, timezone); > I've investigated the history of UtcToLocal(): Great, thanks for checking that. So then should we consider rewriting UtcToLocal() in terms of FromUtc()? Not as part of this change and something we can sort out later. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/510/ -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6322 to look at the new patch set (#9). Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs Previously, exprs used in sorts were evaluated lazily. This can potentially be bad for performance if the exprs are expensive to evaluate, and it can lead to crashes if the exprs are non-deterministic, as this violates assumptions of our sorting algorithm. This patch addresses these issues by materializing ordering exprs. It does so when the expr is non-deterministic (including when it contains a UDF, which we cannot currently know if they are non-deterministic), or when its cost exceeds a threshold (or the cost is unknown). Testing: - Added e2e tests in test_sort.py. - Updated planner tests. Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test A testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M tests/query_test/test_sort.py 10 files changed, 340 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6322/9 -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. Patch Set 9: Code-Review+2 Rebased. -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Alex Behm has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. Patch Set 23: Code-Review+2 Let me know if you need help investigation that last issue. -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. Patch Set 5: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/509/ -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Zach Amsden has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. Patch Set 23: So this tripped up on an functional.alltypes_datasource test, and the first failure was obvious, but now there is a new failure mode which isn't obvious at all, and unclear how it relates to this change: [localhost:21000] > select string_col from alltypes_datasource where string_col = 'VALIDATE_PREDICATES##id NOT_DISTINCT 1 && id DISTINCT_FROM 1 && id NOT_DISTINCT 1' and 1 <=> id and 1 IS DISTINCT FROM id and 1 IS NOT DISTINCT FROM id; Query: select string_col from alltypes_datasource where string_col = 'VALIDATE_PREDICATES##id NOT_DISTINCT 1 && id DISTINCT_FROM 1 && id NOT_DISTINCT 1' and 1 <=> id and 1 IS DISTINCT FROM id and 1 IS NOT DISTINCT FROM id Query submitted at: 2017-04-26 17:20:43 (Coordinator: http://impala-dev:25000) ERROR: InternalException: Data source DataSource{name=alltypesdatasource, location=hdfs://localhost:20500/test-warehouse/data-sources/test-data-source.jar, className=org.apache.impala.extdatasource.AllTypesDataSource, apiVersion=V1} returned an error from prepare(): Error in data source (path=/tmp/test-data-source.29405.0.jar, class=org.apache.impala.extdatasource.AllTypesDataSource, version=V1) prepare(): No error message returned by data source. Check the impalad log for more information. -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6389 to look at the new patch set (#23). Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. IMPALA-5003: Constant propagation in scan conjuncts Implements constant proopagation within conjuncts and applies the optimization to scan conjuncts and collection conjuncts within Hdfs scan nodes. The optimization is applied during planning. At scan nodes in particular, we want to optimize to enable partition pruning. In certain cases, we might end up with a FALSE conditional, which now will convert to an EmptySet node. Testing: Expanded the test cases for the planner to achieve constant propagation. Added Kudu, datasource, Hdfs and HBase tests to validate we can create EmptySetNodes. Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test 19 files changed, 634 insertions(+), 92 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/23 -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Alex Behm has posted comments on this change. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6322 to look at the new patch set (#8). Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs Previously, exprs used in sorts were evaluated lazily. This can potentially be bad for performance if the exprs are expensive to evaluate, and it can lead to crashes if the exprs are non-deterministic, as this violates assumptions of our sorting algorithm. This patch addresses these issues by materializing ordering exprs. It does so when the expr is non-deterministic (including when it contains a UDF, which we cannot currently know if they are non-deterministic), or when its cost exceeds a threshold (or the cost is unknown). Testing: - Added e2e tests in test_sort.py. - Updated planner tests. Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M tests/query_test/test_sort.py 9 files changed, 323 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6322/8 -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Michael Ho has posted comments on this change. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6724/4/be/src/testutil/test-udas.cc File be/src/testutil/test-udas.cc: PS4, Line 59: nctionContex > why "Intermediate"? THis validates arg and return type as well. Renamed to ValidateFunctionContext* -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6724 to look at the new patch set (#5). Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg Since commit d2d3f4c (on asf-master), TAggregateExpr contains the logical input types of the Aggregate Expr. The reason they are included is that merging aggregate expressions will have input tyes of the intermediate values which aren't necessarily the same as the input types. For instance, NDV() uses a binary blob as its intermediate value and it's passed to its merge aggregate expressions as a StringVal but the input type of NDV() in the query could be DecimalVal. In this case, we consider DecimalVal as the logical input type while StringVal is the intermediate type. The logical input types are accessed by the BE via GetConstFnAttr() during interpretation and constant propagation during codegen. To handle distinct aggregate expressions (e.g. select count(distinct)), the FE uses 2-phase aggregation by introducing an extra phase of split/merge aggregation in which the distinct aggregate expressions' inputs are coverted and added to the group-by expressions in the first phase while the non-distinct aggregate expressions go through the normal split/merge treatement. The bug is that the existing code incorrectly propagates the intermediate types of the non-grouping aggregate expressions as the logical input types to the merging aggregate expressions in the second phase of aggregation. The input aggregate expressions for the non-distinct aggregate expressions in the second phase aggregation are already merging aggregate expressions (from phase one) in which case we should not treat its input types as logical input types. This change fixes the problem above by checking if the input aggregate expression passed to FunctionCallExpr.createMergeAggCall() is already a merging aggregate expression. If so, it will use the logical input types recorded in its 'mergeAggInputFn_' as references for its logical input types instead of the aggregate expression input types themselves. Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 --- M be/src/testutil/test-udas.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 5 files changed, 153 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/6724/5 -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5192: Don't bake MemPool* into IR
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5192: Don't bake MemPool* into IR .. Patch Set 5: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/508/ -- To view, visit http://gerrit.cloudera.org:8080/6657 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I09d620e48032351ab9805825a4afb6536bed2302 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 123: ToPtime(); > I seem to remember there also being a lock in the boost code related to loc I've investigated the history of UtcToLocal(): - UtcToLocal() was first introduced in IMPALA-1658 (87b9fac2adda). This version of UtcToLocal() already used localtime_r() for UTC -> local time conversion. It also used couple boost functions to convert back and forth between 'ptime' and 'struct tm' types (to_tm(), ptime_from_tm(), nanoseconds()) - The first version of UtcToLocal() made querying tables with timestamps a lot slower (up to 30x times slower). The root cause was a global lock in localtime_r(). This is discussed in IMPALA-3316 and IMPALA-2125. Both JIRAs are still open. - UtcToLocal() was made 50% faster by 4ddce7f97880. This change got rid of the boost functions (to_tm(), ptime_from_tm(), nanoseconds()) and added the comment at L77. The change kept localtime_r() though, so it didn't do anything to solve the global lock problem. I wrote a simple benchmark program to confirm that UtcToLocal()'s performance degrades if the number of threads running it in parallel is increased. The benchmark program also confirms that FromUtc() does not have this problem. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6563/4/common/thrift/parquet.thrift File common/thrift/parquet.thrift: Line 563: /** Empty structs to signal SIGNED or UNSIGNED sort orders */ this has now changed, please take a look at https://github.com/apache/parquet-format/commit/041708da1af52e7cb9288c331b542aa25b68a2b6 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Dan Hecht has posted comments on this change. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6724/4/be/src/testutil/test-udas.cc File be/src/testutil/test-udas.cc: PS4, Line 59: Intermediate why "Intermediate"? THis validates arg and return type as well. oh, from the next two Validate routines, I think you mean to distinguish them based on the type of intermediate, not the types you are validating? if that's the case, this name doesn't distinguish. -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. Patch Set 22: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/507/ -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() .. Patch Set 6: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/506/ -- To view, visit http://gerrit.cloudera.org:8080/6610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Michael Ho has posted comments on this change. Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/6724/2//COMMIT_MSG Commit Message: Line 7: IMPALA-5251: Fix propagation of input exprs' types in 2-phase aggregation > long line, can use "agg" instead of "aggregation" Done http://gerrit.cloudera.org:8080/#/c/6724/2/be/src/testutil/test-udas.cc File be/src/testutil/test-udas.cc: PS2, Line 148: assert(context->GetNumArgs() == 3); : assert(context->GetArgType(0)->type == FunctionContext::TYPE_DECIMAL); : assert(context->GetArgType(0)->precision == 20); : assert(context->GetArgType(0)->scale == 10); : assert(context->GetArgType(1)->type == FunctionContext::TYPE_BIGINT); : assert(context->GetArgType(2)->type == FunctionContext::TYPE_STRING); : assert(context->GetIntermediateType().type == FunctionContext::TYPE_STRING); : assert(context->GetReturnType().type == FunctionContext::TYPE_DECIMAL); : assert(context->GetReturnType().precision == 38); : assert(context->GetReturnType().scale == 21); > these are the same set of asserts in each function, right? (which is kind o Done http://gerrit.cloudera.org:8080/#/c/6724/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 107: // original input argument exprs are in 'agg.MergeAggInputFn_' so use it instead. > typo: agg.mergeAggInputFn_ Done -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Michael Ho has uploaded a new patch set (#4). Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg Since commit d2d3f4c (on asf-master), TAggregateExpr contains the logical input types of the Aggregate Expr. The reason they are included is that merging aggregate expressions will have input tyes of the intermediate values which aren't necessarily the same as the input types. For instance, NDV() uses a binary blob as its intermediate value and it's passed to its merge aggregate expressions as a StringVal but the input type of NDV() in the query could be DecimalVal. In this case, we consider DecimalVal as the logical input type while StringVal is the intermediate type. The logical input types are accessed by the BE via GetConstFnAttr() during interpretation and constant propagation during codegen. To handle distinct aggregate expressions (e.g. select count(distinct)), the FE uses 2-phase aggregation by introducing an extra phase of split/merge aggregation in which the distinct aggregate expressions' inputs are coverted and added to the group-by expressions in the first phase while the non-distinct aggregate expressions go through the normal split/merge treatement. The bug is that the existing code incorrectly propagates the intermediate types of the non-grouping aggregate expressions as the logical input types to the merging aggregate expressions in the second phase of aggregation. The input aggregate expressions for the non-distinct aggregate expressions in the second phase aggregation are already merging aggregate expressions (from phase one) in which case we should not treat its input types as logical input types. This change fixes the problem above by checking if the input aggregate expression passed to FunctionCallExpr.createMergeAggCall() is already a merging aggregate expression. If so, it will use the logical input types recorded in its 'mergeAggInputFn_' as references for its logical input types instead of the aggregate expression input types themselves. Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 --- M be/src/testutil/test-udas.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 5 files changed, 153 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/6724/4 -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg .. IMPALA-5251: Fix propagation of input exprs' types in 2-phase agg Since commit d2d3f4c (on asf-master), TAggregateExpr contains the logical input types of the Aggregate Expr. The reason they are included is that merging aggregate expressions will have input tyes of the intermediate values which aren't necessarily the same as the input types. For instance, NDV() uses a binary blob as its intermediate value and it's passed to its merge aggregate expressions as a StringVal but the input type of NDV() in the query could be DecimalVal. In this case, we consider DecimalVal as the logical input type while StringVal is the intermediate type. The logical input types are accessed by the BE via GetConstFnAttr() during interpretation and constant propagation during codegen. To handle distinct aggregate expressions (e.g. select count(distinct)), the FE uses 2-phase aggregation by introducing an extra phase of split/merge aggregation in which the distinct aggregate expressions' inputs are coverted and added to the group-by expressions in the first phase while the non-distinct aggregate expressions go through the normal split/merge treatement. The bug is that the existing code incorrectly propagates the intermediate types of the non-grouping aggregate expressions as the logical input types to the merging aggregate expressions in the second phase of aggregation. The input aggregate expressions for the non-distinct aggregate expressions in the second phase aggregation are already merging aggregate expressions (from phase one) in which case we should not treat its input types as logical input types. This change fixes the problem above by checking if the input aggregate expression passed to FunctionCallExpr.createMergeAggCall() is already a merging aggregate expression. If so, it will use the logical input types recorded in its 'mergeAggInputFn_' as references for its logical input types instead of the aggregate expression input types themselves. Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 --- M be/src/testutil/test-udas.cc M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 5 files changed, 151 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/6724/3 -- To view, visit http://gerrit.cloudera.org:8080/6724 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I158303b20d1afdff23c67f3338b9c4af2ad80691 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. Patch Set 22: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/507/ -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts
Alex Behm has posted comments on this change. Change subject: IMPALA-5003: Constant propagation in scan conjuncts .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4864 [DRAFT] Speed up single slot predicates with dictionaries
Zach Amsden has uploaded a new patch set (#3). Change subject: IMPALA-4864 [DRAFT] Speed up single slot predicates with dictionaries .. IMPALA-4864 [DRAFT] Speed up single slot predicates with dictionaries When dictionaries are present we can pre-evaluate conjuncts against the dictionary values and simply look up the result. Status of this diff: Compiles and starts. Bitmap tests for new functionality pass. Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 --- M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/parquet-scratch-tuple-batch.h M be/src/util/bitmap-test.cc M be/src/util/bitmap.h M be/src/util/dict-encoding.h M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 10 files changed, 421 insertions(+), 181 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/3 -- To view, visit http://gerrit.cloudera.org:8080/6726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden