[Impala-ASF-CR] IMPALA-5262: test_analytic_order_by_random fails with assert
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert .. IMPALA-5262: test_analytic_order_by_random fails with assert This was a poorly written test that relies on assumptions about the behavior of 'rand' and the order that rows get processed in a table that Impala doesn't actually guarantee. The new version is still sensitive to the precise behavior of 'rand()', but shouldn't be flaky unless that behavior is changed. Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 Reviewed-on: http://gerrit.cloudera.org:8080/6775 Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M tests/query_test/test_sort.py 1 file changed, 5 insertions(+), 7 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning
Alex Behm has posted comments on this change. Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Zach Amsden has uploaded a new patch set (#5). Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries .. IMPALA-4864 Speed up single slot predicates with dictionaries When dictionaries are present we can pre-evaluate conjuncts against the dictionary values and simply look up the result. Status of this diff: Compiles and starts. Bitmap tests for new functionality pass. FE and BE tests passing. Query test now passing (run not finished). Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 --- M be/src/common/global-flags.cc M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/parquet-scratch-tuple-batch.h M be/src/util/bitmap-test.cc M be/src/util/bitmap.h M be/src/util/dict-encoding.h M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test 12 files changed, 446 insertions(+), 185 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/5 -- To view, visit http://gerrit.cloudera.org:8080/6726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Joe McDonnell
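The idea in the commit message above (pre-evaluate conjuncts against the dictionary values, then look up the result per row) can be illustrated with a minimal sketch. This is not the code from the patch: the function names, the list-based dictionary, and the sample predicate are all hypothetical, and the real scanner works on Parquet column readers and bitmaps rather than Python lists.

```python
# Hypothetical sketch of dictionary-based predicate pre-evaluation.
# A dictionary-encoded column stores each distinct value once and
# represents every row as an integer code into that dictionary.

def build_dictionary_filter(dictionary, predicate):
    """Evaluate the predicate once per distinct value, not once per row."""
    return [bool(predicate(value)) for value in dictionary]

def filter_rows(codes, passes):
    """Return the row indices whose dictionary code passed the predicate."""
    return [row for row, code in enumerate(codes) if passes[code]]

# Illustrative data: three distinct values, six rows.
dictionary = ["apple", "banana", "cherry"]
codes = [0, 2, 1, 2, 0, 1]

# The predicate runs 3 times (once per dictionary entry) instead of 6 times.
passes = build_dictionary_filter(dictionary, lambda s: s[0] in "bc")
matching_rows = filter_rows(codes, passes)
```

The saving grows with the ratio of rows to distinct values, which is why the approach only applies when a dictionary is present.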
[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6575 to look at the new patch set (#10). Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning .. IMPALA-5180: Don't use non-deterministic exprs in partition pruning Non-deterministic exprs which evaluate as constant should not be used during HDFS partition pruning. We consider Exprs which have no SlotRefs as bound by default, and thus we end up trying to apply them indiscriminately. Constant propagation makes this situation easier to run into and the behavior is rather unexpected. The fix for now is to explicitly disallow non-deterministic Exprs in partition pruning. Change-Id: I91054c6bf017401242259a1eff5e859085285546 --- M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test 4 files changed, 40 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/10 -- To view, visit http://gerrit.cloudera.org:8080/6575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries .. Patch Set 4: (2 comments) A couple quick observations http://gerrit.cloudera.org:8080/#/c/6726/4/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS4, Line 1454: ); The front end orders conjuncts by selectivity and cost. When we pull them out and attach them to column materialization, the order is not preserved. If the conjunct is evaluated using the dictionary, this should be fine. If the conjunct is not evaluated from the dictionary, then it might result in a more expensive evaluation. To put numbers on it: Suppose there are two conjuncts A and B. A is expensive (cost = 10) and super selective (eliminates 0.99). B is cheap (cost = 1) and moderately selective (eliminates 0.50). The front end might put B first, so if B eliminates 50% of the rows, then A is called 50% of the time to eliminate the rest. This has an amortized cost of 1 + 0.50 * 10 = 6, which is cheaper than calling A 100% of the time. We can reorder the materialization of the columns at runtime using knowledge of which columns are dictionary encoded and which aren't. PS4, Line 1467: endif It should be possible to do this up in HdfsScanNode. As an example, see extractKuduConjuncts in KuduScanNode. This pulls out conjuncts that will be evaluated by Kudu. -- To view, visit http://gerrit.cloudera.org:8080/6726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
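The amortized-cost arithmetic in the comment above can be checked with a short sketch. It assumes that conjuncts short-circuit (a row eliminated by an earlier conjunct never reaches a later one) and that their selectivities are independent; neither assumption is stated in the review, and `expected_cost` is a hypothetical helper, not an Impala function.

```python
# Sketch of the per-row expected cost of evaluating conjuncts in a given order.
# Each conjunct is a (cost, fraction_of_rows_eliminated) pair.

def expected_cost(conjuncts):
    """Sum each conjunct's cost, weighted by the fraction of rows reaching it."""
    total = 0.0
    surviving = 1.0  # fraction of rows not yet eliminated
    for cost, eliminated in conjuncts:
        total += surviving * cost
        surviving *= 1.0 - eliminated
    return total

A = (10.0, 0.99)  # expensive, very selective
B = (1.0, 0.50)   # cheap, moderately selective

b_first = expected_cost([B, A])  # 1 + 0.50 * 10, the ordering from the comment
a_first = expected_cost([A, B])  # 10 + 0.01 * 1, i.e. A runs on every row
```

Under these assumptions the B-first order costs 6 per row and the A-first order costs roughly 10.01, which matches the reviewer's point that losing the front end's ordering can make evaluation more expensive.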
[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6610/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 529: } > let's verify (DCHECK) that all tuples are dense at this point. After implementing this DCHECK, it turns out we hit it quite often. The reason is we sometimes send materialized tuples without a mem layout to the BE (which are basically unusable) eg, when replacing plan trees with an EmptySetNode. There is a TODO in DescriptorTable.toThrift() regarding this. -- To view, visit http://gerrit.cloudera.org:8080/6610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning
Alex Behm has posted comments on this change. Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6575/9/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: Line 112: !conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) { > Since we ignore these predicates here, they still get attached to the plan Yes, I think so: * dropping the predicate would be wrong * evaluating the predicate against partitions would arguably change the query semantics based on a physical design choice * the same query against an unpartitioned table might give very different results So we are left with evaluating the predicate for every row. -- To view, visit http://gerrit.cloudera.org:8080/6575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
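The reasoning in the comment above, that evaluating a non-deterministic predicate against partitions changes the query semantics, can be made concrete with a toy sketch. Everything here is illustrative: the partition layout, the seeded generator, and the two helper functions are hypothetical stand-ins, not planner code.

```python
import random

def prune_partitions(partitions, predicate):
    """Evaluate the predicate once per partition, as pruning would."""
    kept = []
    for rows in partitions.values():
        if predicate():
            kept.extend(rows)
    return kept

def filter_rows(partitions, predicate):
    """Evaluate the predicate once per row, as the scan does after the fix."""
    kept = []
    for rows in partitions.values():
        kept.extend(row for row in rows if predicate())
    return kept

# Four partitions of 100 rows each; the predicate stands in for `rand() < 0.5`.
rng = random.Random(0)
partitions = {p: [(p, i) for i in range(100)] for p in range(4)}
coin_flip = lambda: rng.random() < 0.5

pruned = prune_partitions(partitions, coin_flip)
filtered = filter_rows(partitions, coin_flip)
```

With pruning, whole partitions are kept or dropped together, so the result size is always a multiple of 100 and depends on how the table happens to be partitioned. Row-level evaluation samples individual rows and behaves the same on an unpartitioned table.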
[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
Taras Bobrovytsky has uploaded a new patch set (#7). Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() .. IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() The slot descriptor vectors are not guaranteed to be sorted on the slot index within a tuple. As a result, TupleDescriptor::LayoutEquals() sometimes returned a wrong result. In this patch, we sort the vectors of slot descriptors on the slot index within the tuple before comparing the vectors. Testing: - ran EE tests locally. Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e --- M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-query/queries/QueryTest/union.test 5 files changed, 51 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6610/7 -- To view, visit http://gerrit.cloudera.org:8080/6610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
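The fix described above (sort both slot descriptor vectors on the slot index before comparing them) can be sketched as follows. The `Slot` record and its fields are hypothetical simplifications; the actual comparison in be/src/runtime/descriptors.cc is C++ and its exact fields are not shown in this message.

```python
from collections import namedtuple

# Hypothetical, simplified slot descriptor: index within the tuple,
# a type name, and the byte offset of the slot in the tuple.
Slot = namedtuple("Slot", ["slot_idx", "type", "tuple_offset"])

def layout_equals_unsorted(slots_a, slots_b):
    """Order-sensitive comparison: wrong if the vectors are ordered differently."""
    return list(slots_a) == list(slots_b)

def layout_equals(slots_a, slots_b):
    """Sort both vectors on the slot index, then compare element by element."""
    by_idx = lambda slot: slot.slot_idx
    return sorted(slots_a, key=by_idx) == sorted(slots_b, key=by_idx)

a = [Slot(0, "INT", 0), Slot(1, "STRING", 8)]
b = [Slot(1, "STRING", 8), Slot(0, "INT", 0)]   # same layout, different order
c = [Slot(0, "BIGINT", 0), Slot(1, "STRING", 8)]  # genuinely different layout

same_after_sort = layout_equals(a, b)
same_without_sort = layout_equals_unsorted(a, b)
```

Here `a` and `b` describe the same layout, but only the sorting version reports them as equal, while `c` is still rejected.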
[Impala-ASF-CR] IMPALA-5262: test_analytic_order_by_random fails with assert
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/527/ -- To view, visit http://gerrit.cloudera.org:8080/6775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5262: test_analytic_order_by_random fails with assert
Alex Behm has posted comments on this change. Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
Alex Behm has posted comments on this change. Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() .. Patch Set 7: FE changes lgtm. -- To view, visit http://gerrit.cloudera.org:8080/6610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
Jim Apple has posted comments on this change. Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG Commit Message: PS1, Line 17: memcmp is sse4.1-based > The nightly perf test suite. Where can I find information about the nightly perf test suite? -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
Dan Hecht has posted comments on this change. Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG Commit Message: PS1, Line 17: memcmp is sse4.1-based > OK. Can you be specific about the benchmarks you'd like to see run, to avoi The nightly perf test suite. -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
Jim Apple has posted comments on this change. Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG Commit Message: PS1, Line 17: memcmp is sse4.1-based > Oops, yeah cmp. OK. Can you be specific about the benchmarks you'd like to see run, to avoid too many iterations of this patch set, including any node configuration or scale factor or file type restrictions you will require? As a heads-up: I am not in a position to spend a lot of time setting up benchmarks that don't work out of the box with Apache Impala on a single node, so I may have to pass this off to someone else or just submit this change to the benchmark and let somebody else finish it if what you need requires more time than I have. -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
Dan Hecht has posted comments on this change. Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG Commit Message: PS1, Line 17: memcmp is sse4.1-based > I don't understand - what use of memcpy? I was hoping only to replace the / Oops, yeah cmp. There aren't many direct callers but the caller callers are used in various places (and in and out of the IR), and over various data sizes/patterns. Which is why it'd be nice to verify with the benchmarks that everything's good. -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
Dan Hecht has posted comments on this change. Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6610/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 529: } let's verify (DCHECK) that all tuples are dense at this point. -- To view, visit http://gerrit.cloudera.org:8080/6610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
Jim Apple has posted comments on this change. Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG Commit Message: PS1, Line 17: memcmp is sse4.1-based > The particular query test wouldn't likely cover all the places we use memcp I don't understand - what use of memcpy? I was hoping only to replace the /body/ of StringCompare with a call to mem/cmp/, and there aren't that many places where we call StringCompare. -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6610 to look at the new patch set (#7). Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() .. IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() The slot descriptor vectors are not guaranteed to be sorted on the slot index within a tuple. As a result, TupleDescriptor::LayoutEquals() sometimes returned a wrong result. In this patch, we sort the vectors of slot descriptors on the slot index within the tuple before comparing the vectors. Testing: - ran EE tests locally. Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e --- M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-query/queries/QueryTest/union.test 5 files changed, 51 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6610/7 -- To view, visit http://gerrit.cloudera.org:8080/6610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5031: remove undefined behavior: call to strncmp with nullptr
Dan Hecht has posted comments on this change. Change subject: IMPALA-5031: remove undefined behavior: call to strncmp with nullptr .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6767/1/be/src/runtime/string-value.inline.h File be/src/runtime/string-value.inline.h: Line 59: const int result = len > 0 ? strncmp(s1, s2, len) : 0; > Example backtrace: How about a short comment explaining strncmp() is undefined for nullptrs. Some day, we may change things at a lower level so that even when str.len == 0, we have a non-null str.ptr, so we can avoid these checks. And so it would be good to leave a hint that this branch can be removed at that time. -- To view, visit http://gerrit.cloudera.org:8080/6767 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id563e81720a0a4847664fa2828ecfdcad870da5b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
Dan Hecht has posted comments on this change. Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG Commit Message: PS1, Line 17: memcmp is sse4.1-based > Yes, it looks like it: The particular query test wouldn't likely cover all the places we use memcpy. Probably our targeted perf tests and tpcds/tpch in the cluster benchmarks would give good coverage. -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. 
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Reviewed-on: http://gerrit.cloudera.org:8080/5939 Reviewed-by: Dan HechtTested-by: Impala Public Jenkins --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) Approvals: Impala Public Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala 
Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
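The precedence rules in the IMPALA-2716 commit message above (table property first, legacy flag only when the property is absent, no adjustment for data written by Impala) can be summarized in a small sketch. The function name, its parameters, and the returned labels are hypothetical; the sketch only records which setting wins and deliberately does not model how either setting converts a timestamp.

```python
# Hypothetical decision helper mirroring the precedence described in the
# commit message. The property and flag names are taken from that message.

WRITE_ZONE_PROPERTY = "parquet.mr.int96.write.zone"

def choose_adjustment(table_properties, convert_legacy_flag, written_by_impala=False):
    """Return (source, zone) naming the setting that governs timestamp reads."""
    if written_by_impala:
        # No adjustment is applied to data written by Impala.
        return ("none", None)
    zone = table_properties.get(WRITE_ZONE_PROPERTY)
    if zone is not None:
        # The table property wins; the legacy flag is ignored.
        return ("table_property", zone)
    if convert_legacy_flag:
        # --convert_legacy_hive_parquet_utc_timestamps applies only here.
        return ("legacy_flag", None)
    return ("none", None)

with_property = choose_adjustment({WRITE_ZONE_PROPERTY: "UTC"}, True)
flag_only = choose_adjustment({}, True)
neither = choose_adjustment({}, False)
```

Returning the source of the decision alongside the zone keeps the precedence easy to test separately from any conversion logic.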
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly
Dan Hecht has posted comments on this change. Change subject: IMPALA-4866: Hash join node does not apply limits correctly .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6778/1/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 506: while (true) { there are a lot of other places that this loop can exit. Can't the out_batch contain too many rows for all of those exit paths as well? -- To view, visit http://gerrit.cloudera.org:8080/6778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5246: UDF's Close() should handle Expr's preparation failure
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5246: UDF's Close() should handle Expr's preparation failure .. Patch Set 1: Code-Review+1 (1 comment) This change makes sense to me. http://gerrit.cloudera.org:8080/#/c/6757/1//COMMIT_MSG Commit Message: PS1, Line 10: Put a period here -- To view, visit http://gerrit.cloudera.org:8080/6757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2c689246ed4f8dd38f104fa35904f3926a7039c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Zach Amsden has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 88: return (value >> bits) | (value << (64 - bits)); > Can't DCHECK in a constexpr; is documenting the limitation sufficient? Turns out none of this matters. clang now refuses to define lzcnt unless target("lzcnt") feature is set, which means we can't use the intrinsics. Also, gutil/bits seems to have been upgraded in capabilities since, so some of this functionality is unnecessary. -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Mention Kerberos and TLS for Kudu security
Impala Public Jenkins has submitted this change and it was merged. Change subject: [DOCS] Mention Kerberos and TLS for Kudu security .. [DOCS] Mention Kerberos and TLS for Kudu security Link to Apache Kudu docs for details of new security features. Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc Reviewed-on: http://gerrit.cloudera.org:8080/6634 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M docs/impala_keydefs.ditamap M docs/shared/impala_common.xml M docs/topics/impala_kudu.xml 3 files changed, 17 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6634 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] [DOCS] Mention Kerberos and TLS for Kudu security
Impala Public Jenkins has posted comments on this change. Change subject: [DOCS] Mention Kerberos and TLS for Kudu security .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6634 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Mention Kerberos and TLS for Kudu security
Impala Public Jenkins has posted comments on this change. Change subject: [DOCS] Mention Kerberos and TLS for Kudu security .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/112/ -- To view, visit http://gerrit.cloudera.org:8080/6634 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6775/1/tests/query_test/test_sort.py File tests/query_test/test_sort.py: Line 189 > How about running this query and asserting that the result is sorted? Done -- To view, visit http://gerrit.cloudera.org:8080/6775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
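The reviewer's suggestion in the message above, running the query and asserting that the result is sorted, can be sketched as follows. This is a minimal illustration and not the code in tests/query_test/test_sort.py: the helper name `assert_sorted` and the sample values are hypothetical, and a real test would take its values from the query result rather than from a literal list.

```python
def assert_sorted(values, descending=False):
    """Raise AssertionError if `values` is not in (non-strict) sorted order.

    Hypothetical helper for illustration; it is not part of Impala's test suite.
    """
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        in_order = prev >= cur if descending else prev <= cur
        assert in_order, "rows %d and %d are out of order: %r, %r" % (i - 1, i, prev, cur)
    return True


# Stand-in for the column the query orders by; a real test would read it from the result set.
sample_values = [0.01, 0.25, 0.25, 0.90]
assert_sorted(sample_values)
```

A check of this shape asserts a property of the output (its ordering) instead of comparing against a fixed list of expected rows.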
[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert .. IMPALA-5262: test_analytic_order_by_random fails with assert This was a poorly written test that relies on assumptions about the behavior of 'rand' and the order that rows get processed in a table that Impala doesn't actually guarantee. The new version is still sensitive to the precise behavior of 'rand()', but shouldn't be flaky unless that behavior is changed. Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 --- M tests/query_test/test_sort.py 1 file changed, 5 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6775/2 -- To view, visit http://gerrit.cloudera.org:8080/6775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Jim Apple has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 88: return (value >> bits) | (value << (64 - bits)); > Can't DCHECK in a constexpr; is documenting the limitation sufficient? WFM -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Zach Amsden has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 88: return (value >> bits) | (value << (64 - bits)); > DCHECK that this isn't the case? Can't DCHECK in a constexpr; is documenting the limitation sufficient? -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Jim Apple has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 88: return (value >> bits) | (value << (64 - bits)); > DCHECK that this isn't the case? With a comment explaining why. -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Zach Amsden has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 88: return (value >> bits) | (value << (64 - bits)); > This is undefined behavior when bits is 0 or 64: DCHECK that this isn't the case? -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
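To make the concern in this thread concrete: in C++, shifting a 64-bit value by 64 is undefined, and in the quoted expression `bits == 0` makes the left shift count 64 while `bits == 64` makes the right shift count 64. One common way to sidestep this is to reduce both shift counts to the range 0..63. The sketch below models that arithmetic in Python. Python integers are unbounded, so it cannot exhibit the undefined behavior itself and only illustrates the masked formula. The name `rotate_right64` is hypothetical, and this is not the code in be/src/util/bit-util.h.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, as a uint64_t would


def rotate_right64(value, bits):
    """Rotate a 64-bit unsigned value right by `bits`, for any non-negative `bits`.

    Both shift counts are reduced modulo 64, so neither is ever 64.
    """
    value &= MASK64
    right = bits & 63          # always in 0..63
    left = (64 - right) & 63   # always in 0..63; becomes 0 when right is 0
    return ((value >> right) | (value << left)) & MASK64
```

With both counts masked, `bits == 0` and `bits == 64` both return the input unchanged, which is the natural meaning of a rotation by a multiple of the word width.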
[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning
Zach Amsden has posted comments on this change. Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6575/9/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: Line 112: !conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) { Since we ignore these predicates here, they still get attached to the plan node - resulting in this: Failed tests: PlannerTest.testHdfs:96->PlannerTestBase.runPlannerTestFile:764->PlannerTestBase.runPlannerTestFile:759 Expected failure, but query produced PLAN. Query: select * from functional.alltypes where rand() > year PLAN: PLAN-ROOT SINK | 00:SCAN HDFS [functional.alltypes] partitions=24/24 files=24 size=478.45KB predicates: year < rand() So we can actually evaluate randomized predicates now, they just aren't used for partition filtering. Do we actually want to do that? -- To view, visit http://gerrit.cloudera.org:8080/6575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
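The behavior described in the comment above, where a non-deterministic predicate is skipped by the partition pruner but still ends up as a predicate on the scan node, can be modeled roughly as follows. This is a simplified Python sketch and not the Java in HdfsPartitionPruner.java: the `Conjunct` tuple, its fields, and `split_conjuncts` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a planner conjunct.
Conjunct = namedtuple("Conjunct", ["sql", "partition_cols_only", "nondeterministic"])


def split_conjuncts(conjuncts):
    """Split conjuncts into (used for pruning, left for the scan to evaluate per row)."""
    pruning, remaining = [], []
    for c in conjuncts:
        if c.partition_cols_only and not c.nondeterministic:
            pruning.append(c)
        else:
            # Skipped by the pruner, so it stays attached to the scan node.
            remaining.append(c)
    return pruning, remaining


conjuncts = [
    Conjunct("year = 2010", True, False),
    Conjunct("year < rand()", True, True),
]
pruning, remaining = split_conjuncts(conjuncts)
```

In this model `year < rand()` lands in `remaining`, which corresponds to the `predicates: year < rand()` line with `partitions=24/24` in the plan quoted above.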
[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4866: Hash join node does not apply limits correctly .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/6778/1//COMMIT_MSG Commit Message: PS1, Line 11: exhcnage typo http://gerrit.cloudera.org:8080/#/c/6778/1/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: PS1, Line 580: > >= ? Line 581: out_batch->CommitRows(limit_ - num_rows_returned_); I think you need to update num_rows_returned_ as well http://gerrit.cloudera.org:8080/#/c/6778/1/tests/common/test_dimensions.py File tests/common/test_dimensions.py: PS1, Line 114: # Don't run with NUM_NODES=1 due to IMPALA-561 : # ALL_CLUSTER_SIZES = [0, 1] update comment -- To view, visit http://gerrit.cloudera.org:8080/6778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
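The point of the review comment on line 581, that trimming a batch to the remaining limit should go together with updating the returned-row counter, can be illustrated with a small model. This is a Python sketch under stated assumptions and not the C++ in partitioned-hash-join-node.cc: the class `LimitTracker` and its method names are hypothetical, and batches are plain lists.

```python
class LimitTracker:
    """Hypothetical model of a node that must not return more than `limit` rows."""

    def __init__(self, limit):
        self.limit = limit
        self.num_rows_returned = 0

    def reached_limit(self):
        return self.num_rows_returned >= self.limit

    def emit(self, batch):
        """Return the rows of `batch` that may still be returned, and count them."""
        remaining = max(self.limit - self.num_rows_returned, 0)
        out = batch[:remaining]
        # Keep the counter in sync with what was actually returned; otherwise a
        # later batch would be trimmed against a stale count.
        self.num_rows_returned += len(out)
        return out
```

If the counter were not updated after trimming, a later call would compute `remaining` from the old count and could return rows past the limit.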
[Impala-ASF-CR] IMPALA-3973: optional 3rd and 4th arguments for instr().
Jim Apple has posted comments on this change. Change subject: IMPALA-3973: optional 3rd and 4th arguments for instr(). .. Patch Set 5: Any updates on this? The comment about line 335 was left on January 3rd. -- To view, visit http://gerrit.cloudera.org:8080/5589 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Reviewer: zi+z...@cloudera.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/526/ -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Dan Hecht has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 11: > Uploaded patch set 11. Fixed WARN_UNUSED_RESULT warning that failed clang-tidy job. -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#11). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE LIKE will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. HDFS tables created by Impala using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. 
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 850 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
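The precedence rules in the commit message above (table property first, legacy flag only when the property is absent, no adjustment for data written by Impala) can be summarized as a small decision function. This is a Python sketch under stated assumptions and not Impala's implementation: the function name and parameters are hypothetical, and the treatment of the legacy flag as "convert using the server's local zone" is an assumption about --convert_legacy_hive_parquet_utc_timestamps, not something the message states.

```python
WRITE_ZONE_PROP = "parquet.mr.int96.write.zone"  # table property named in the commit message


def read_conversion_zone(table_props, written_by_impala, legacy_utc_flag, local_zone):
    """Return the time zone name to adjust timestamps with, or None for no adjustment.

    Hypothetical helper; parameter names are illustrative only.
    """
    if written_by_impala:
        return None            # data written by Impala is not adjusted
    zone = table_props.get(WRITE_ZONE_PROP)
    if zone is not None:
        return zone            # the table property wins; the legacy flag is ignored
    if legacy_utc_flag:
        return local_zone      # assumption: the legacy flag uses the server-local zone
    return None                # property absent and flag off: no adjustment
```

For example, with the property set to `Europe/Budapest` the function returns that zone regardless of the legacy flag, and with neither the property nor the flag set it returns `None`.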
[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning .. Patch Set 9: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/525/ -- To view, visit http://gerrit.cloudera.org:8080/6575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#8). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the following: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNodes which utilize these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. 
The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an initialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M 
be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M