[Impala-ASF-CR] IMPALA-1616: Improve the Memory Limit Exceeded error report
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1616: Improve the Memory Limit Exceeded error report .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibb4e0c359d889938b4c351771ba539850bdb95ea Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 2: (5 comments) Thanks for switching back to unique_lock, it makes it a bit easier to reason about for me. Change looks good, just a few comments. http://gerrit.cloudera.org:8080/#/c/4350/2/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: Line 76: // Sleep with read lock held to block off other readers which cannot Won't this change the meaning of total_get_wait_time_ so that we're only counting the single thread that got the lock? This is ok since I don't think the counter is contractual, and the new definition may be more useful, but should update the comment. Line 111: write_lock.unlock(); Not your change but it'd be more consistent to use braced scoping to release the lock instead of this explicit unlock. Line 155: uint32_t GetSize() const { It looks like this is sometimes called with lock held or not. Seems like we should document this (i.e. caller should hold lock for non-racy read). Line 161: uint64_t total_get_wait_time() const { It doesn't look like total_get_wait_time and total_put_wait_time have any callers - could just get rid ofthe functions? Line 189: boost::mutex write_lock_; We might be able to further reduce contention if we align the locks and other data structures so that they're guaranteed to not share 64-bit cache lines. Maybe group read and write members together so that the can share cache lines, then align the first member of both groups to 64 bytes? They're only 40 bytes on my system, so with the current layout read_lock and write_lock may be on the same cache line: #include #include #include #include struct test { boost::mutex mutex1; boost::mutex mutex2; }; int main() { printf("sizeof(pthread_mutex_t)=%ld\n", sizeof(pthread_mutex_t)); printf("sizeof(boost::mutex)=%ld\n", sizeof(boost::mutex)); printf("sizeof(pthread_cond_t)=%ld\n", sizeof(pthread_cond_t)); printf("offsetof(...)=%ld\n", offsetof(struct test, mutex2)); return 0; } [~]$ g++ mutex.cc -lboost_system -o mutex&& ./mutex sizeof(pthread_mutex_t)=40 sizeof(boost::mutex)=40 sizeof(pthread_cond_t)=48 offsetof(...)=40 [~]$ -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Chen Huang Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter .. Patch Set 2: Code-Review+2 Verified+1 Have we considered added some basic sanity tests for the query generator? Might help prevent it bitrotting in future. -- To view, visit http://gerrit.cloudera.org:8080/4357 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: stak...@cloudera.com Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: stak...@cloudera.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter
Tim Armstrong has submitted this change and it was merged. Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter .. IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter IMPALA-4100: * Postgres and Impala support EXTRACT([field] from [date-type]), but Hive doesn't * Hive has other UDFs that perform the same function, but they have different names * This commit modifies the HiveSqlWriter to use the corresponding Hive functions IMPALA-4112: * Postgres and Impala support IS [NOT] DISTINCT FROM clauses as a null safe equals * Hive doesn't support this clause, but has a null safe equals operator: <=> * This commit modifies the HiveSqlWriter to use <=> instead of IS [NOT] DISTINCT FROM Testing: * This commit only modifies the HiveSqlWriter, so no testing against Impala was done * Tested locally against Hive Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11 Reviewed-on: http://gerrit.cloudera.org:8080/4357 Reviewed-by: Michael Brown Reviewed-by: Tim Armstrong Tested-by: Tim Armstrong --- M tests/comparison/model_translator.py 1 file changed, 27 insertions(+), 0 deletions(-) Approvals: Michael Brown: Looks good to me, but someone else must approve Tim Armstrong: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/4357 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: stak...@cloudera.com Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: stak...@cloudera.com
[Impala-ASF-CR] Fix typo in buildall.sh introduced in IMPALA-4006
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4382 Change subject: Fix typo in buildall.sh introduced in IMPALA-4006 .. Fix typo in buildall.sh introduced in IMPALA-4006 The typo resulted in a silent failure: an error message was printed in the middle of the buildall.sh output and the branch was never taken. Change-Id: I7a0f74b93bb31bd0c56fc4c20f42f8ab1fc6de78 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/4382/1 -- To view, visit http://gerrit.cloudera.org:8080/4382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7a0f74b93bb31bd0c56fc4c20f42f8ab1fc6de78 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4028: Trim sentry config file path spaces while impala start.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4028: Trim sentry config file path spaces while impala start. .. Patch Set 3: How did you come across this? Did you start Impala from the command line directly or use some other tool (e.g. cloudera manager)? -- To view, visit http://gerrit.cloudera.org:8080/4309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: davy...@163.com Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: davy...@163.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4111: backend death tests should not produce minidumps .. Patch Set 4: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4111: backend death tests should not produce minidumps .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4353/2/be/src/testutil/death-test-util.h File be/src/testutil/death-test-util.h: PS2, Line 41: setrlimit(RLIMIT_CORE, &limit); : : minidumps_enabled_before_ = EnableMinidumpsForTest(false); > I assume it's necessary to disable core dumps when we've disabled minidumps I'm not sure I understand the comment. The class is meant to disable both in tests, but I don't think there's a dependency between the two. The previous version of the class in promise-test.cc only disabled coredumps. http://gerrit.cloudera.org:8080/#/c/4353/3/be/src/util/minidump.h File be/src/util/minidump.h: PS3, Line 29: for death test > nit: maybe remove or reword like "for example during death test" or "during Done -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4353 to look at the new patch set (#4). Change subject: IMPALA-4111: backend death tests should not produce minidumps .. IMPALA-4111: backend death tests should not produce minidumps Move the existing core dump disabler into a shared utility header and also disable minidumps too. Add a macro to simplify use of the disabler. Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 --- A be/src/testutil/death-test-util.h M be/src/util/minidump.cc M be/src/util/minidump.h M be/src/util/promise-test.cc 4 files changed, 83 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4353/4 -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4350/2/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: PS2, Line 185: mutex > worth changing these to SpinLock, which I believe are a bit faster to lock/ Unfortunately we don't have a condition variable implementation that works with SpinLock (that would be very nice to have). -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Chen Huang Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/4350/2/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: Line 155: > It's always racy as we never hold read_lock when reading the size of read_l Ah, good point. http://gerrit.cloudera.org:8080/#/c/4350/3/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: PS3, Line 188: put_list shouldn't these members have _ at the end? Line 197: boost::mutex get_lock; Shouldn't we align get_lock? http://gerrit.cloudera.org:8080/#/c/4350/3/be/src/util/condition-variable.h File be/src/util/condition-variable.h: Line 28: /// Simple wrapper around POSIX pthread condition variable. Explain briefly how it differs from the boost cv? PS3, Line 29: struct class? PS3, Line 29: __attribute__ ((aligned(64))) What do you think about putting this in compiler-util.h as CACHE_ALIGN or something along those lines? PS3, Line 61: cv cv_ PS3, Line 62: ConditionVariable This and the typedef shouldn't be necessary in C++, it automatically lets you use the short name of the class/struct. -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Chen Huang Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4028: Trim sentry config file path spaces while impala start.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4028: Trim sentry config file path spaces while impala start. .. Patch Set 3: I don't think we should go ahead with the trimming solution for the reasons I mentioned earlier (problems with trying to "fix" misconfigurations and inconsistency with other command line options). However part of the problem seems to be that it's very hard to see the spaces in the error message. Maybe you could change the patch just to quote the file name in the error message? -- To view, visit http://gerrit.cloudera.org:8080/4309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: davy...@163.com Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: davy...@163.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4350/3/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: Line 99: bool BlockingPut(const T& val) { I think there's a pretty bad bug with how the signalling works. Consider a queue with a limit of N. If the queue gets into a state where get_list has N elements and put_list has 0 elements with all of the producer threads waiting on it, then the caller to BlockingGet() will drain get_list before signalling put_cv, even though it should wake up a producer thread once there is space in the queue. Worse, it will only wake up one producer thread each time BlockingGet() is called, which means that one only producer thread can run at a time. -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Chen Huang Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4111: backend death tests should not produce minidumps .. Patch Set 4: Somehow Lars' earlier comments got deleted. Reproducing from my email: Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/4353/3/be/src/util/minidump.h File be/src/util/minidump.h: Lars Volker has posted comments on this change. Change subject: IMPALA-4111: backend death tests should not produce minidumps .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/4353/3/be/src/util/minidump.h File be/src/util/minidump.h: PS3, Line 29: for death test nit: maybe remove or reword like "for example during death test" or "during tests" -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4326 to look at the new patch set (#4). Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. IMPALA-4008: don't bake in hash table and hash join pointers This fixes some of the cases where memory addresses are baked into codegen'd code. Testing: Ran exhaustive build. Perf: Ran a local perf run. No significant changes. I was able to see some small improvements on microbenchmarks. +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH(_20) | parquet / none / none | 9.07| +0.46% | 5.88 | +0.34% | +---+---+-++++ +---+--+---++-++++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +---+--+---++-++++-+---+ | TPCH(_20) | TPCH-Q2 | parquet / none / none | 2.12 | 1.89| +12.29% | * 10.85% * | * 20.30% * | 1 | 10| | TPCH(_20) | TPCH-Q13 | parquet / none / none | 9.84 | 9.34| +5.39% | 9.01%| 3.79%| 1 | 10| | TPCH(_20) | TPCH-Q17 | parquet / none / none | 14.61 | 14.19 | +2.97% | 2.15%| 1.52%| 1 | 10| | TPCH(_20) | TPCH-Q18 | parquet / none / none | 14.76 | 14.35 | +2.82% | 3.20%| 2.64%| 1 | 10| | TPCH(_20) | TPCH-Q9 | parquet / none / none | 13.72 | 13.54 | +1.30% | 1.75%| 0.70%| 1 | 10| | TPCH(_20) | TPCH-Q8 | parquet / none / none | 5.71 | 5.64| +1.30% | 1.21%| 1.23%| 1 | 10| | TPCH(_20) | TPCH-Q19 | parquet / none / none | 47.35 | 46.75 | +1.28% | 2.39%| 1.88%| 1 | 10| | TPCH(_20) | TPCH-Q5 | parquet / none / none | 4.57 | 4.52| +1.20% | 1.30%| 0.88%| 1 | 10| | TPCH(_20) | TPCH-Q16 | parquet / none / none | 2.07 | 2.05| +1.12% | 2.59%| 1.79%| 1 | 10| | TPCH(_20) | TPCH-Q11 | parquet / none / none | 1.45 | 1.45| +0.15% | 2.69%| 2.06%| 1 | 10| | TPCH(_20) | TPCH-Q3 | parquet / none / none | 4.65 | 4.65| -0.09% | 2.12%| 2.17%| 1 | 10| | TPCH(_20) | TPCH-Q4 | parquet / none / none | 3.22 | 3.23| -0.26% | 1.03%| 1.33%| 1 | 10| | TPCH(_20) | TPCH-Q7 | parquet / none / none | 15.84 | 15.92 | -0.50% | 0.91%| 1.15%| 1 | 10| | TPCH(_20) | TPCH-Q14 | parquet / none / none | 3.29 | 3.31| -0.59% | 3.31%| 1.58%| 1 | 10| | TPCH(_20) | TPCH-Q22 | parquet / none / none | 2.65 | 2.67| -0.78% | 3.03%| 1.46%| 1 | 10| | TPCH(_20) | TPCH-Q15 | parquet / none / none | 4.50 | 4.55| -1.19% | 2.87%| 2.45%| 1 | 10| | TPCH(_20) | TPCH-Q20 | parquet / none / none | 3.84 | 3.91| -1.76% | 2.20%| 1.94%| 1 | 10| | TPCH(_20) | TPCH-Q10 | parquet / none / none | 5.58 | 5.70| -2.00% | 1.01%| 1.79%| 1 | 10| | TPCH(_20) | TPCH-Q21 | parquet / none / none | 22.84 | 23.42 | -2.47% | 0.68%| 0.56%| 1 | 10| | TPCH(_20) | TPCH-Q1 | parquet / none / none | 11.25 | 11.60 | -3.06% | 0.48%| 1.74%| 1 | 10| | TPCH(_20) | TPCH-Q12 | parquet / none / none | 3.81 | 3.98| -4.38% | 1.62%| 1.14%| 1 | 10| | TPCH(_20) | TPCH-Q6 | parquet / none / none | 1.94 | 2.04| -4.85% | 2.40%| 1.58%| 1 | 10| +---+--+---++-++++-+---+ ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/4326/3/be/src/exec/hash-table.h File be/src/exec/hash-table.h: PS3, Line 300: ExprValuesHash > CurExprValuesHash() [see below for motiviation] Done PS3, Line 303: SetExprValuesHash > Let's rename this to more closely match cur_expr_values() rather than match Done PS3, Line 309: cur_expr_values_null > comment explaining that there is one byte per null value (i.e. these are us Done. I'm not actually sure why we need this for codegen, since bool and uint8_t should both be represented as int8 internally in LLVM, but I don't want to get distracted digging into that, so I left a TODO. There's some weirdness with vector where it uses bitfields but otherwise C++ bools should be 8 bits - maybe that was the reason. PS3, Line 403: in > the first time I read this, I read it as this functions stores the values t Makes sense - done. PS3, Line 403: in > same Done PS3, Line 413: in > same Done PS3, Line 431: This will be replaced by : /// codegen. > nit: move this sentence to end of the comment so it's standardized with sim Done PS3, Line 438: with 'cur_expr_values_' > "to compare against the current row." (since it also uses cur_expr_values_ Done -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 4: Code-Review+1 Carry Michael's +1 -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4122: qgen: fix bitrotted cluster unit tests
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4122: qgen: fix bitrotted cluster unit tests .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e855e265ae245ebe3691d077284ac5761909e00 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4326 to look at the new patch set (#5). Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. IMPALA-4008: don't bake in hash table and hash join pointers This fixes some of the cases where memory addresses are baked into codegen'd code. Testing: Ran exhaustive build. Perf: Ran a local perf run. No significant changes. I was able to see some small improvements on microbenchmarks. +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH(_20) | parquet / none / none | 9.07| +0.46% | 5.88 | +0.34% | +---+---+-++++ +---+--+---++-++++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +---+--+---++-++++-+---+ | TPCH(_20) | TPCH-Q2 | parquet / none / none | 2.12 | 1.89| +12.29% | * 10.85% * | * 20.30% * | 1 | 10| | TPCH(_20) | TPCH-Q13 | parquet / none / none | 9.84 | 9.34| +5.39% | 9.01%| 3.79%| 1 | 10| | TPCH(_20) | TPCH-Q17 | parquet / none / none | 14.61 | 14.19 | +2.97% | 2.15%| 1.52%| 1 | 10| | TPCH(_20) | TPCH-Q18 | parquet / none / none | 14.76 | 14.35 | +2.82% | 3.20%| 2.64%| 1 | 10| | TPCH(_20) | TPCH-Q9 | parquet / none / none | 13.72 | 13.54 | +1.30% | 1.75%| 0.70%| 1 | 10| | TPCH(_20) | TPCH-Q8 | parquet / none / none | 5.71 | 5.64| +1.30% | 1.21%| 1.23%| 1 | 10| | TPCH(_20) | TPCH-Q19 | parquet / none / none | 47.35 | 46.75 | +1.28% | 2.39%| 1.88%| 1 | 10| | TPCH(_20) | TPCH-Q5 | parquet / none / none | 4.57 | 4.52| +1.20% | 1.30%| 0.88%| 1 | 10| | TPCH(_20) | TPCH-Q16 | parquet / none / none | 2.07 | 2.05| +1.12% | 2.59%| 1.79%| 1 | 10| | TPCH(_20) | TPCH-Q11 | parquet / none / none | 1.45 | 1.45| +0.15% | 2.69%| 2.06%| 1 | 10| | TPCH(_20) | TPCH-Q3 | parquet / none / none | 4.65 | 4.65| -0.09% | 2.12%| 2.17%| 1 | 10| | TPCH(_20) | TPCH-Q4 | parquet / none / none | 3.22 | 3.23| -0.26% | 1.03%| 1.33%| 1 | 10| | TPCH(_20) | TPCH-Q7 | parquet / none / none | 15.84 | 15.92 | -0.50% | 0.91%| 1.15%| 1 | 10| | TPCH(_20) | TPCH-Q14 | parquet / none / none | 3.29 | 3.31| -0.59% | 3.31%| 1.58%| 1 | 10| | TPCH(_20) | TPCH-Q22 | parquet / none / none | 2.65 | 2.67| -0.78% | 3.03%| 1.46%| 1 | 10| | TPCH(_20) | TPCH-Q15 | parquet / none / none | 4.50 | 4.55| -1.19% | 2.87%| 2.45%| 1 | 10| | TPCH(_20) | TPCH-Q20 | parquet / none / none | 3.84 | 3.91| -1.76% | 2.20%| 1.94%| 1 | 10| | TPCH(_20) | TPCH-Q10 | parquet / none / none | 5.58 | 5.70| -2.00% | 1.01%| 1.79%| 1 | 10| | TPCH(_20) | TPCH-Q21 | parquet / none / none | 22.84 | 23.42 | -2.47% | 0.68%| 0.56%| 1 | 10| | TPCH(_20) | TPCH-Q1 | parquet / none / none | 11.25 | 11.60 | -3.06% | 0.48%| 1.74%| 1 | 10| | TPCH(_20) | TPCH-Q12 | parquet / none / none | 3.81 | 3.98| -4.38% | 1.62%| 1.14%| 1 | 10| | TPCH(_20) | TPCH-Q6 | parquet / none / none | 1.94 | 2.04| -4.85% | 2.40%| 1.58%| 1 | 10| +---+--+---++-++++-+---+ ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/4326/4/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 308: uint8_t* ALWAYS_INLINE cur_expr_values() const { return cur_expr_values_; } > let's add a short comment here too: Done PS4, Line 310: . > for the current row. Done PS4, Line 400: into > This one should have stayed "in" (or "of"), right? i.e. it doesn't write *e Done Line 427: uint32_t HashVariableLenRow(uint8_t* expr_values, uint8_t* expr_values_null) const; > how about making these const as well. Done Line 441: TupleRow* build_row, uint8_t* expr_values, uint8_t* expr_values_null) const; > these too could be const to clarify they are "in" params. Done. Also made the various TupleRow arguments const too for consistency. This all led to a few additional consts. -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 6: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4118: extract encryption utils from BufferedBlockMgr
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-4118: extract encryption utils from BufferedBlockMgr .. IMPALA-4118: extract encryption utils from BufferedBlockMgr As groundwork for IMPALA-4118, extract encryption and integrity verification routines into a separate module that can be used by the new BufferPool. Simplify the logic in BufferedBlockMgr by not distinguishing between the integrity check and encryption options, which are controlled by the same command line flag anyway. This patch changes how the OpenSSL RNG is seeded. I consulted with the original author of the code (Michael Yoder), who suggested that the new approach would be appropriate: "I'd pick whatever implementation is easiest for you. You could add entropy once per query, or once every X keys (100 keys seems reasonable), or once every "convenient place". 4kb is probably overkill, you could use 512b for example." Testing: Added some unit tests for the utilities. We already have coverage of the BufferedBlockMgr with encryption in buffered-block-mgr-test. Also reduce the number of iterations in the buffered-block-mgr-test to save some testing time. Change-Id: Ibebe431f69c3c569cbff68171beaa32ef2ab03b6 --- M be/src/common/init.cc M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h M be/src/util/CMakeLists.txt A be/src/util/openssl-util-test.cc A be/src/util/openssl-util.cc A be/src/util/openssl-util.h 8 files changed, 461 insertions(+), 209 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/4389/6 -- To view, visit http://gerrit.cloudera.org:8080/4389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibebe431f69c3c569cbff68171beaa32ef2ab03b6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3873 to look at the new patch set (#15). Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder The main outcome of this patch is to split out PhjBuilder from PartitionedHashJoinNode, which manages the build partitions for the join and implements the DataSink interface. A lot of this work is fairly mechanical refactoring: dividing the state between the two classes and duplicating it where appropriate. One major change is that probe partitions need to be separated from build partitions. This required some significant algorithmic changes to memory management for probe partition buffers: memory management for probe partitions was entangled with the build partitions: small buffers were allocated for the probe partitions even for non-spilling joins and the join could spill additional partitions during probing if the probe partitions needed to switch from small to I/O buffers. The changes made were: - Probe partitions are only initialized after the build is partitioned, and only for spilled build partitions. - Probe partitions never use small buffers: once the initial write buffer is allocated, appending to the probe partition never fails. - All probe partition allocation is done after partitioning the build and before processing the probe input during the same phase as hash table building. (Aside from NAAJ partitions which are allocated upfront). The probe partition changes necessitated a change in BufferedTupleStream: allocation of write blocks is now explicit via the PrepareForWrite() API. Testing: Ran exhaustive build and local stress test. Memory Usage: Ran stress test binary search locally for TPC-DS SF-1 and TPC-H SF-20. No regressions on TPC-DS. TPC-H either stayed the same or improved in min memory requirement without spilling, but the min memory requirement with spilling regressed in some cases. I investigated each of the significant regressions on TPC-H and determined that they were all due to exec nodes racing for spillable or non-spillable memory. None of them were cases where exec nodes got their minimum reservation and failed to execute the spilling algorithm correctly. Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb --- M .clang-format M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/analytic-eval-node.cc M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/hash-join-node.cc M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-aggregation-node.cc A be/src/exec/partitioned-hash-join-builder-ir.cc A be/src/exec/partitioned-hash-join-builder.cc A be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/stress/concurrent_select.py 27 files changed, 2,196 insertions(+), 1,478 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/3873/15 -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 14: (53 comments) http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 207: state, Substitute(PREPARE_FAILED_ERROR_MSG, "write")); > old code and agg uses state_->block_mgr()->MemLimitTooLowError(). Any reas Historical reasons it looks like: the PrepareForRead() version below diverged from the PAGG/PHJ versions. Switched to the block_mgr version for consistency. http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: PS14, Line 297: SCOPED_TIMER(build_timer_); > We used to have two different timers in PHJ for measuring the time to hash It's not clear to me that the original timers were well thought out. It didn't look like there was a timer for how long it took to build the hash table - that was lumped into 'build_timer_'. I added two new timers with clearer meanings that track the total hash table build timer and time spent partitioning rows. Also moved the remaining build-related timers into the Builder. Now 'built_timer_' is the initial partitioning and hash table build, and 'repartition_timer_' is the time spent repartitioning'. http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 179: largest_fraction = max(largest_fraction, > DCHECK_GT(num_build_rows, 0); I don't think that's valid, it's casting to double to avoid crashing on the divide-by-zero. I changed it to avoid the calculation if num_build_rows = 0 to avoid that problem. PS14, Line 318: a hash table > hash tables Done. Both alternatives seem grammatically correct to me though. Line 341: // building hash tables to avoid building hash tables for partitions we'll spill anyway. > That's an interesting comment. If I understand it correctly, it means that Done with some slight tweaks. This is a lot simpler than the previous behaviour, where spilling can happen even during probing. I didn't measure how often, but it's not at all improbable, since the probe buffers are 8MB each and we could allocate up to 16 of them. Line 401: RETURN_IF_ERROR(SpillPartition()); > May help to document that failure to find any partition to spill (e.g. all I added to the comment up the top of the loop to make the termination more explicit. PS14, Line 529: PhjBuilder::HashTableStoresNulls() > This seems to belong to PartitionedHashJoinNode conceptually. I feel like it makes sense here though since the builder owns the hash tables. Plumbing-wise it's also easier since PhjBuilder needs this info and starts executing before PartitionedHashJoinNode. Line 646: do { > Please see comments in BlockingJoinNode, it would be great to retain timer Done PS14, Line 782: process_build_batch_fn_level0 > 'process_build_batch_fn_level0_' I just copied this verbatim - it seems to be referring to the local variable though. http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: Line 425: > nit: unnecessary blank line ? Done http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: PS14, Line 151: : : > This may be important information to retain. Not sure why I removed it. May have had a good reason but I can't recall it. PS14, Line 87: Expr::CreateExprTree(pool_, eq_join_conjunct.right > Does this overlap with CreateExprTree() in PhjBuilder::Init() ? Same questi They are redundant currently, but the idea is that PhjBuilder and PhjNode will exist more independently in different plan fragments, so I wanted to encapsulate the state where possible. The expr contexts need to be independent to eventually support many probe sides : 1 build side for broadcast joins. I think documenting this explicitly may be confusing because it is sort-of explaining the current state of things relative to the previous way of doing things. Whereas I think with the separate build side, the default assumption is that builder data structures are private and not shared with the exec node. It may make sense to share Expr trees globally between fragment instances as part of the multithreading changes, but I don't think it's worth trying to share them here until we have the right infrastructure. Line 213: for (unique_ptr& partition : spilled_partitions_) > missing { } Ah, missed that clang-format wrapped it. PS14, Line 494: to output > outputting Done PS14, Line 585: hash_partitions_ > 'hash_partitions_' Done Line 594: continue; > Is there a reason why we cannot call OutputUnmatchedBuild() directly at thi I don't
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3873 to look at the new patch set (#16). Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder The main outcome of this patch is to split out PhjBuilder from PartitionedHashJoinNode, which manages the build partitions for the join and implements the DataSink interface. A lot of this work is fairly mechanical refactoring: dividing the state between the two classes and duplicating it where appropriate. One major change is that probe partitions need to be separated from build partitions. This required some significant algorithmic changes to memory management for probe partition buffers: memory management for probe partitions was entangled with the build partitions: small buffers were allocated for the probe partitions even for non-spilling joins and the join could spill additional partitions during probing if the probe partitions needed to switch from small to I/O buffers. The changes made were: - Probe partitions are only initialized after the build is partitioned, and only for spilled build partitions. - Probe partitions never use small buffers: once the initial write buffer is allocated, appending to the probe partition never fails. - All probe partition allocation is done after partitioning the build and before processing the probe input during the same phase as hash table building. (Aside from NAAJ partitions which are allocated upfront). The probe partition changes necessitated a change in BufferedTupleStream: allocation of write blocks is now explicit via the PrepareForWrite() API. Testing: Ran exhaustive build and local stress test. Memory Usage: Ran stress test binary search locally for TPC-DS SF-1 and TPC-H SF-20. No regressions on TPC-DS. TPC-H either stayed the same or improved in min memory requirement without spilling, but the min memory requirement with spilling regressed in some cases. I investigated each of the significant regressions on TPC-H and determined that they were all due to exec nodes racing for spillable or non-spillable memory. None of them were cases where exec nodes got their minimum reservation and failed to execute the spilling algorithm correctly. Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb --- M .clang-format M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/analytic-eval-node.cc M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/hash-join-node.cc M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-aggregation-node.cc A be/src/exec/partitioned-hash-join-builder-ir.cc A be/src/exec/partitioned-hash-join-builder.cc A be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/stress/concurrent_select.py 27 files changed, 2,196 insertions(+), 1,476 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/3873/16 -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 16: Rebased onto master -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 14: (2 comments) Before I responded to individual comments I wanted to point out that the coupling is now one way - the builder never refers to PhjNode aside from metadata provided via the constructor (you can verify there are no references to partitioned-hash-join-node.h or PartitionedHashJoin node in the builder code). At this point it should only require minor changes to this hash join code to create the builder in a separate fragment *before* the join node, provided you have a 1:1 mapping from builder to node. 1:n broadcast joins with no spilling would require some more changes, but is a lot closer. I chose this stopping point because there was a clear physical separation between the classes, even if the spilling algorithm has some logical coupling and assumes a 1:1 mapping from node to builder. Reducing the logical coupling would require significant changes to the spilling algorithm that is driven from PartitionedHashJoin::GetNext(), which I think should be deferred until there is some consensus about what spilling for 1:n broadcast joins with multithreading might look like. http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: Line 123: bool HashTableStoresNulls() const; > why is this here rather than the join node? especially given that the join Both classes already have their own hash table context. Line 128: inline const std::vector& is_not_distinct_from() const { > and then should this be here? is it to be closer to being able to break the There isn't a parent link - all the necessary metadata about the join is passed in when the builder is initialised. -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3308: Get expr-test passing on PPC64LE
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3308: Get expr-test passing on PPC64LE .. Patch Set 2: Any luck with this? -- To view, visit http://gerrit.cloudera.org:8080/4186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4504ee6a52a085f530aadfcfa009bacb83c64787 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: segelyang Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: segelyang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4350/3/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: Line 99: bool BlockingPut(const T& val) { > Yes, I have thought about it before. I have tested various locations for no We discussed this a little offline and I feel like I understand why option (3) works in practice now - when N producers are outpacing consumers, this switches to a mode where the steady state is a nearly empty queue with k producers sitting blocked and N - k producers working. Essentially there are some additional elements in the queue because the blocked producers also have some row batches ready to add to the queue immediately. I'm ok moving forward with it so long as we document the expected behaviour. Perhaps it's adequate to document the expect behaviour with a faster consumer, matched producer/consumer and fast producers. Have you benchmarked this with concurrent queries or workloads with many concurrent scans? One concern with the empty queue is that if the queue is empty and a producer isn't scheduled immediately, the consumer may end up waiting on the condition variable for every batch and potentially getting swapped it. We could perhaps work around that if the producer added its item to the queue *before* blocking, so that the the consumer can get the item without requiring a context switch. -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Chen Huang Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4074: Configuration items duplicate in template of YARN
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4074: Configuration items duplicate in template of YARN .. Patch Set 1: I'm going to run tests and merge this. -- To view, visit http://gerrit.cloudera.org:8080/4311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81d6d019d4982cb35932b1d45c376b215ec5bcc6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yuanhao Luo Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yuanhao Luo Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4074: Configuration items duplicate in template of YARN
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4074: Configuration items duplicate in template of YARN .. Patch Set 2: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/4311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81d6d019d4982cb35932b1d45c376b215ec5bcc6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yuanhao Luo Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yuanhao Luo Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4008: Don't bake ExprContext pointers into PAGG/AGG IR code
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: Don't bake ExprContext pointers into PAGG/AGG IR code .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 84: // Use NULL if the aggregate evaluator is not codegen'ed or if there is no It might be simpler just to always set to NULL if input_expr_ctxs is empty. We lose some cross-validation but I feel like the simple code would be nice. http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 82: std::vector agg_expr_ctxs_; Mention that the ExprContexts are owned by aggregate_evaluators_. http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 158: ExprContext* agg_expr_ctx = NULL; Again, I think it would be simpler to just check if it's empty. Or add a method to AggFnEvaluator like HasInputExprContexts() http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: Line 202: /// ExprContexts of 'aggregate_evaluators_'. Used in the codegen'ed version of Also document ownership here. Line 205: std::vector agg_expr_ctxs_; What do you think about making this vector>? Each pair is just a struct with two pointers so should be simple to access from the IR. This would better document the correspondence between the two vectors and result in one fewer argument being threaded through everywhere. I'm mostly ok with the current approach but the extra argument is a bit annoying. http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 192: std::vector input_expr_ctxs_; Can you add your comment here about when this is empty/non-empty? -- To view, visit http://gerrit.cloudera.org:8080/4390 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I42039eed803a39fa716b9ed647510b6440974ae5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/4428/1/be/src/runtime/row-batch-test.cc File be/src/runtime/row-batch-test.cc: Line 38: vector nullable_tuples(1, false); This is ok, but I find the c++11 initialisers for vectors a bit nicer. e.g. nullable_tuples({false}) PS1, Line 39: TTupleId This should be a static_cast according to our coding standard. Line 60: ASSERT_DEBUG_DEATH(bad_dest.AcquireState(&src), ""); > Will change to IMPALA_ASSERT_DEBUG_DEATH when related patch lands. We should make sure to merge that first, otherwise we will have a lot of core dumps. http://gerrit.cloudera.org:8080/#/c/4428/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: Line 319: capacity_ = tuple_ptrs_size_ / (num_tuples_per_row_ * sizeof(Tuple*)); We already know the initial capacity using this calculation - we should compute it the same way in both places. Maybe just factor this out into an InitialCapacity() function? -- To view, visit http://gerrit.cloudera.org:8080/4428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3873 to look at the new patch set (#17). Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder The main outcome of this patch is to split out PhjBuilder from PartitionedHashJoinNode, which manages the build partitions for the join and implements the DataSink interface. A lot of this work is fairly mechanical refactoring: dividing the state between the two classes and duplicating it where appropriate. One major change is that probe partitions need to be separated from build partitions. This required some significant algorithmic changes to memory management for probe partition buffers: memory management for probe partitions was entangled with the build partitions: small buffers were allocated for the probe partitions even for non-spilling joins and the join could spill additional partitions during probing if the probe partitions needed to switch from small to I/O buffers. The changes made were: - Probe partitions are only initialized after the build is partitioned, and only for spilled build partitions. - Probe partitions never use small buffers: once the initial write buffer is allocated, appending to the probe partition never fails. - All probe partition allocation is done after partitioning the build and before processing the probe input during the same phase as hash table building. (Aside from NAAJ partitions which are allocated upfront). The probe partition changes necessitated a change in BufferedTupleStream: allocation of write blocks is now explicit via the PrepareForWrite() API. Testing: Ran exhaustive build and local stress test. Memory Usage: Ran stress test binary search locally for TPC-DS SF-1 and TPC-H SF-20. No regressions on TPC-DS. TPC-H either stayed the same or improved in min memory requirement without spilling, but the min memory requirement with spilling regressed in some cases. I investigated each of the significant regressions on TPC-H and determined that they were all due to exec nodes racing for spillable or non-spillable memory. None of them were cases where exec nodes got their minimum reservation and failed to execute the spilling algorithm correctly. Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb --- M .clang-format M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/analytic-eval-node.cc M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/hash-join-node.cc M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-aggregation-node.cc A be/src/exec/partitioned-hash-join-builder-ir.cc A be/src/exec/partitioned-hash-join-builder.cc A be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/stress/concurrent_select.py 27 files changed, 2,244 insertions(+), 1,491 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/3873/17 -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 14: (16 comments) http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: PS14, Line 45: build row partitions > is this the same as the "hash partitions" referred to in the next paragraph really the streams, reworded. Line 86: > is it expected that the renaming public functions would only be used by PHJ Done PS14, Line 90: Acquire ownership > Doesn't the ownership pass from this class to the exec node? In which case It passes from the class to the caller. I guess I don't find this ambiguous given the return type and can't think of a non-awkward alternative. PS14, Line 95: Called after probing of the partitions : /// is done. The partitions are not closed or destroyed, since they may be spilled or : /// may contain unmatched build rows for certain join modes (e.g. right outer join). > The fact that we need some much explanation kinda makes this seem like it m The hash join node uses this to determine whether it's already cleaned up the hash partitions, and to keep the build/probe hash partition vectors in sync. Agree this isn't the interface we ultimately want but I wanted to avoid additional restructuring of GetNext()/CleanupHashPartitions(). Line 106: } > nit: missing blank line Done PS14, Line 109: When this call returns > Is this stuff by this method or the caller? i.e. if this method closes inp Done PS14, Line 112: so that the probe phase has enough buffers preallocated : /// execute successfully. > slightly garbled. seems like a word is missing or something. Done PS14, Line 117: /// Iterates over all the partitions in 'hash_partitions_' and returns the largest : /// number of build rows. > rather than say how and referring to private member, and leaving it to the Done Line 123: bool HashTableStoresNulls() const; > Both classes already have their own hash table context. Also, I think this makes sense to have here since the builder manages the hash tables. PS14, Line 182: partition side of this partition > not sure what this is saying. maybe just "for this build partition"? Done Line 188: /// false. > maybe clarify that an error is not returned in that case. or clarify what Done PS14, Line 195: unpin_all_build > why "build"? shouldn't it be unpin_all_blocks? or even better would be to Overall these bool arguments are really confusing, and the correctness of the spilling depends a lot on using the right value in the right place. I ended up changing this significantly to use an enum all the way down to the underlying BufferedTupleStream method, and adding this to SpillPartition(). It turns out there was a bug in BuildHashTablesAndPrepareProbeStreams() where it should have been unpinning all of the blocks in the partition it chose to spill but wasn't. I'm rerunning the memory experiment just to confirm this doesn't change anything. PS14, Line 234: build or probe > does this care about the probe side partitioning? Yes, since we share the reservation and hand off the probe buffers. PS14, Line 244: all hash partitions for partitioning level > this is kinda misleading. maybe qualify with ... for the current input, or Done PS14, Line 252: in > into Done PS14, Line 256: in memory > what does this mean? is it trying to say it fits in the current (pinned) bl Reworded it - I think it makes more sense to think of it as whether appending to the stream succeeds (which may advance the block) or whether we have to start spilling things. -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4428/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: Line 319: capacity_ = tuple_ptrs_size_ / (num_tuples_per_row_ * sizeof(Tuple*)); > What about just setting capacity_ = initial_capacity_ here? I don't feel strongly either way, but when this added I switched from the initial_capacity_ approach to the current approach to avoid redundant state: https://gerrit.cloudera.org/#/c/1399/8/be/src/runtime/row-batch.h@301 -- To view, visit http://gerrit.cloudera.org:8080/4428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 17: The memory usage experiment results look good (no major changes) -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/4428/2/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: PS2, Line 282: MarkCapacity MarkAtCapacity I think -- To view, visit http://gerrit.cloudera.org:8080/4428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Hello Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4353 to look at the new patch set (#5). Change subject: IMPALA-4111: backend death tests should not produce minidumps .. IMPALA-4111: backend death tests should not produce minidumps Move the existing core dump disabler into a shared utility header and also disable minidumps too. Add a macro to simplify use of the disabler. Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 --- A be/src/testutil/death-test-util.h M be/src/util/minidump.cc M be/src/util/minidump.h M be/src/util/promise-test.cc 4 files changed, 83 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4353/5 -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Hello Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4353 to look at the new patch set (#6). Change subject: IMPALA-4111: backend death tests should not produce minidumps .. IMPALA-4111: backend death tests should not produce minidumps Move the existing core dump disabler into a shared utility header and also disable minidumps too. Add a macro to simplify use of the disabler. Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 --- A be/src/testutil/death-test-util.h M be/src/util/minidump.cc M be/src/util/minidump.h M be/src/util/promise-test.cc 4 files changed, 83 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4353/6 -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4111: backend death tests should not produce minidumps .. Patch Set 6: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3308: Get expr-test passing on PPC64LE
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3308: Get expr-test passing on PPC64LE .. Patch Set 3: (2 comments) No problems, just wanted to make sure that this patch made it it. http://gerrit.cloudera.org:8080/#/c/4186/3/be/src/exprs/expr-value.h File be/src/exprs/expr-value.h: Line 70: /// Init for string values Can you remove this comment? It's not helpful (I know you didn't add it :)). Line 71: void Init(const std::string& str){ Can you fix the missing whitespace in this line and the one below. Probably the easiest is to run clang-format on the patch (we recently added a .clang-format file). This command will run clang-format on the lines changed in the most recent commit on your branch: git diff -U0 HEAD^ | $IMPALA_TOOLCHAIN/llvm-3.8.0-p1/share/clang/clang-format-diff.py -i -p1 -- To view, visit http://gerrit.cloudera.org:8080/4186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4504ee6a52a085f530aadfcfa009bacb83c64787 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: segelyang Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: segelyang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue .. Patch Set 4: (8 comments) I'm ok with this option if it generally provides better perf and avoids pathological behaviour. Generally looks good, just comment and style comments. http://gerrit.cloudera.org:8080/#/c/4350/4/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: Line 36: /// if the queue is empty or full, respectively. I think we should clarify when the unblocking happens. In BlockingGet() it is unblocked as soon as an item is added, but in BlockingPut() it is not guaranteed (since we're not calling put_cv_.NotifyOne() with the lock held). Line 95: put_cv_.NotifyOne(); Can you comment that this notification is racy: producers may not be notified if the queue is partially empty. Line 182: /// Maximum number of elements in 'get_list_' + 'put_list_'. clarify as "Maximum total number" - if a reader wasn't careful they might thing it meant the maximum per list. PS4, Line 190: > > nit: we're generally using >> now that it works in c++11 http://gerrit.cloudera.org:8080/#/c/4350/4/be/src/util/condition-variable.h File be/src/util/condition-variable.h: Line 29: /// This has lower overhead than boost's implementation as it Nit: lines are wrapped very short here PS4, Line 31: CACHE_ALIGNED Nit: here and in the other places: does it work to put this on the line before 'class'? That seems a little more readable to me. Line 43: bool inline TimedWait(boost::unique_lock& lock, What does the return value mean? Line 50: void inline NotifyOne() { pthread_cond_signal(&cv_); } Nit: here and above, we usually put 'inline' before the return type. -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Chen Huang Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3308: Get expr-test passing on PPC64LE
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3308: Get expr-test passing on PPC64LE .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4186/3/be/src/exprs/expr-value.h File be/src/exprs/expr-value.h: PS3, Line 73: string_val.ptr = const_cast(string_data.data()); > Does this have the possibility of creating Undefined Behavior? The c++11 standard guarantees that string storage is contiguous. Should be fine as long as the string isn't modified later. Let's change this to string_val.ptr = &string_data[0] to avoid the const_cast. -- To view, visit http://gerrit.cloudera.org:8080/4186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4504ee6a52a085f530aadfcfa009bacb83c64787 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: segelyang Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: segelyang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3308: Get expr-test passing on PPC64LE
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3308: Get expr-test passing on PPC64LE .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4186/3/be/src/exprs/expr-value.h File be/src/exprs/expr-value.h: PS3, Line 73: string_val.ptr = const_cast(string_data.data()); > I think string_data[0] has type const char&, so I'm not sure that will work There's a non-const char version http://www.cplusplus.com/reference/string/string/operator[]/. This function should be the only place that string_data is touched. -- To view, visit http://gerrit.cloudera.org:8080/4186 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4504ee6a52a085f530aadfcfa009bacb83c64787 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: segelyang Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: segelyang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 17: (6 comments) http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: PS14, Line 90: > I guess one term used elsewhere is "release", right? (e.g. C++ unique_ptr). I find release confusing, makes me think that the streams are being "released" from the join as a whole. I changed it to Transfer() since that seems more neutral about the directionality. http://gerrit.cloudera.org:8080/#/c/3873/15/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 372: { RETURN_IF_ERROR(build_partition->BuildHashTable(&built)); } > extraneous braces Done Line 395: } > same Done http://gerrit.cloudera.org:8080/#/c/3873/15/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS15, Line 81: build > probe Done PS15, Line 405: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/3873/15/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: Line 391: } > alternatively: initialize insert_pos to begin and then DCHECK_NE(insert_pos I think I'll stay with this approach. The alternative seems a little subtle (it depends on us adding + 1 to it above) -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 18: Code-Review+1 Carry +_1 -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3873 to look at the new patch set (#18). Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder The main outcome of this patch is to split out PhjBuilder from PartitionedHashJoinNode, which manages the build partitions for the join and implements the DataSink interface. A lot of this work is fairly mechanical refactoring: dividing the state between the two classes and duplicating it where appropriate. One major change is that probe partitions need to be separated from build partitions. This required some significant algorithmic changes to memory management for probe partition buffers: memory management for probe partitions was entangled with the build partitions: small buffers were allocated for the probe partitions even for non-spilling joins and the join could spill additional partitions during probing if the probe partitions needed to switch from small to I/O buffers. The changes made were: - Probe partitions are only initialized after the build is partitioned, and only for spilled build partitions. - Probe partitions never use small buffers: once the initial write buffer is allocated, appending to the probe partition never fails. - All probe partition allocation is done after partitioning the build and before processing the probe input during the same phase as hash table building. (Aside from NAAJ partitions which are allocated upfront). The probe partition changes necessitated a change in BufferedTupleStream: allocation of write blocks is now explicit via the PrepareForWrite() API. Testing: Ran exhaustive build and local stress test. Memory Usage: Ran stress test binary search locally for TPC-DS SF-1 and TPC-H SF-20. No regressions on TPC-DS. TPC-H either stayed the same or improved in min memory requirement without spilling, but the min memory requirement with spilling regressed in some cases. I investigated each of the significant regressions on TPC-H and determined that they were all due to exec nodes racing for spillable or non-spillable memory. None of them were cases where exec nodes got their minimum reservation and failed to execute the spilling algorithm correctly. Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb --- M .clang-format M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/analytic-eval-node.cc M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/hash-join-node.cc M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-aggregation-node.cc A be/src/exec/partitioned-hash-join-builder-ir.cc A be/src/exec/partitioned-hash-join-builder.cc A be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/stress/concurrent_select.py 27 files changed, 2,242 insertions(+), 1,491 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/3873/18 -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. Patch Set 19: (2 comments) http://gerrit.cloudera.org:8080/#/c/5161/18/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 57: private final ArrayList aggFnArgTypes_; > Is this change a nice add-on or is it required for this patch? The UDA codegen and FunctionContext::Get*Type() semantic fixes are somewhat independent but touch the same code so it was easiest to combine them. This is needed for GetArgType() to return the correct input expr time. We discussed the approach offline. It's the most sane way to plumb it through (otherwise the merge aggregation has to go and search through the plan tree for the original AggregateInfo when it is generating the thrift descriptors, which would violate multiple abstractions). Alex found an example where the types do indeed get out of sync: select typeof(sum(f + d)) from (select float_col f, 1.2 d from functional.alltypes) v; However, in this case (and all cases I can think of) , the intermediate and output tuple descriptors are also inconsistent, which is detected by a precondition: ERROR: IllegalStateException: Agg expr sum(float_col + 1.2) returns type DOUBLE but its output tuple slot has type DECIMAL(38,9) I added an additional precondition to check for the types getting out of sync. I verified that it's hit if I comment out the earlier preconditions: ERROR: IllegalStateException: Agg expr sum:merge(f + d) arg type DECIMAL(38,9) differs from input expr type DOUBLE in original expr sum(float_col + 1.2) So we need to fix the underlying bug still (which actually stems from the decimal type resolution algorithm) but the behaviour won't be a regression. Line 74: private FunctionCallExpr(FunctionName fnName, FunctionParams params, > don't need this anymore, right? Done -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Hello Impala Public Jenkins, Michael Ho, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5161 to look at the new patch set (#19). Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs This uses the existing infrastructure for codegening builtin UDAs and for codegening calls to UDFs. GetUdf() is refactored to support both UDFs and UDAs. IR UDAs are still not allowed by the frontend. It's unclear if we want to enable them going forward because of the difficulties in testing and supporting IR UDFs/UDAs. This also fixes some bugs with the Get*Type() methods of FunctionContext. GetArgType() and related methods now always return the logical input types of the aggregate function. Getting the tests to pass required fixing IMPALA-4878 because they called GetIntermediateType(). Testing: test_udfs.py tests UDAs with codegen enabled and disabled. Added some asserts to test UDAs to check that the correct types are passed in via the FunctionContext. Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/timestamp-functions.cc M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udas.h M be/src/udf/udf-internal.h M be/src/udf/udf-ir.cc M be/src/udf/udf.h M common/thrift/Exprs.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 26 files changed, 496 insertions(+), 237 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/5161/19 -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner .. IMPALA-3748: Part 1: Clean up resource estimation in planner This is in preparation to use this code to compute the minimum reservation. The cleanup restructures the code slightly so that it's clearer whether resource estimates are valid or invalid. It also removes the unused VCore estimates. Fixes various bugs and other issues: * computeCosts() was not called for unpartitioned fragments, so the per-operator memory estimate was not visible. * Nested loop join was not treated as a blocking join. * The TODO comment about union was misleading * The logic does not work for mt_dop > 1 because it conflates fragment instances with hosts. Fixing this requires identifying places where we want per-instance estimates instead of per-host. I left one bug unfixed because it is subtle and will be easier to review in isolation: IMPALA-4862. There is some remaining questionable behaviour I left untouched: * It's unclear why unpartitioned fragments are always excluded from total memory consumption. * Many operators do not have estimates or have questionable heuristics. Testing: Re-enabled the explain_level tests, which appear to be the only coverage for many of these estimates. Removed the complex and brittle test cases and replaced with a couple of much simpler end-to-end tests and a number of planner test cases for memory estimates in both the MT and non-MT cases. Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 --- M be/src/service/query-exec-state.cc M common/thrift/Frontend.thrift M docs/shared/impala_common.xml M docs/topics/impala_explain.xml M docs/topics/impala_explain_level.xml M docs/topics/impala_explain_plan.xml M docs/topics/impala_hbase.xml M docs/topics/impala_known_issues.xml M docs/topics/impala_optimize_partition_key_scans.xml M docs/topics/impala_perf_joins.xml M docs/topics/impala_perf_stats.xml M fe/src/main/java/org/apache/impala/common/PrintUtils.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test A testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M tests/custom_cluster/test_admission_controller.py M tests/metadata/test_explain.py 47 files changed, 1,544 insertions(+), 1,454 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/5847/5 -- To view, visit http://gerrit.cloudera.org:8080/5847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner .. Patch Set 4: (5 comments) Some minor cleanup and tests fixes http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 51: // Default per-host memory requirement used if no valid stats are available. > Need to fix comment. Done http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: Line 85: long perInstanceInputCardinality = Math.max(1L, inputNode.getCardinality() / numInstances); > Need to fix long lines Done Line 138: output.append(PrintUtils.printInstances(" ", fragment_.getNumInstances(queryOptions))); > Need to fix long line Done http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java File fe/src/main/java/org/apache/impala/planner/KuduTableSink.java: Line 62: output.append(PrintUtils.printInstances(" ", fragment_.getNumInstances(queryOptions))); > need to fix long line Done http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: Line 213: return dop * planRoot_.getNumNodes(); > Needs to handle when getNumNodes() is invalid, i.e. -1 Done -- To view, visit http://gerrit.cloudera.org:8080/5847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4842: BufferedBlockMgrTest.WriteError is flaky
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/5940 Change subject: IMPALA-4842: BufferedBlockMgrTest.WriteError is flaky .. IMPALA-4842: BufferedBlockMgrTest.WriteError is flaky The test should allow Unpin() to fail with a scratch allocation error to handle the case where the first write fails and blacklists the scratch disk around the same time that the second write starts. Usually either the second write succeeds because it started before the first write failed or it fails with CANCELLED because the BufferedBlockMgr::is_cancelled_ flag is set. There is a small window for a race after the disk is blacklisted in TmpFileMgr but before BufferedBlockMgr::WriteComplete() is called. Testing: I was able to reproduce the problem locally by adding some delays to the test. I added a variant of the WriteError test that more reliably reproduces the bug. Ran both WriteError tests in a loop locally to try to flush out flakiness. Change-Id: I9878d7000b03a64ee06c2088a8c30e318fe1d2a3 --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M common/thrift/generate_error_codes.py 3 files changed, 43 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5940/1 -- To view, visit http://gerrit.cloudera.org:8080/5940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9878d7000b03a64ee06c2088a8c30e318fe1d2a3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4895: Memory limit exceeded in test outer joins
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4895: Memory limit exceeded in test_outer_joins .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5941/1/testdata/workloads/tpch/queries/tpch-outer-joins.test File testdata/workloads/tpch/queries/tpch-outer-joins.test: Line 35: none Can we make it something that is unlikely to be a substring of a real error message? Just to avoid confusion. E.g. __NO_ERROR__ or something like that. -- To view, visit http://gerrit.cloudera.org:8080/5941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4731a3e83dd2142a1d83be963f83cd1847472295 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/5161/19/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 55: // If this is merge aggregation function, the input argument types of the aggregate > I think the behavior and limitations are going to be clearer if we keep a p Doesn't that somehow entangle the lifecycles of the two different sets of exprs? As a general design pattern storing pointers from one expr tree to a separate expr tree that can be mutated in-place seems like a bad idea - too hard to reason about. I could see just storing a clone of the source expression but I'm not sure that that's significantly better than the current approach. -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Hello Impala Public Jenkins, Michael Ho, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5161 to look at the new patch set (#20). Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs This uses the existing infrastructure for codegening builtin UDAs and for codegening calls to UDFs. GetUdf() is refactored to support both UDFs and UDAs. IR UDAs are still not allowed by the frontend. It's unclear if we want to enable them going forward because of the difficulties in testing and supporting IR UDFs/UDAs. This also fixes some bugs with the Get*Type() methods of FunctionContext. GetArgType() and related methods now always return the logical input types of the aggregate function. Getting the tests to pass required fixing IMPALA-4878 because they called GetIntermediateType(). Testing: test_udfs.py tests UDAs with codegen enabled and disabled. Added some asserts to test UDAs to check that the correct types are passed in via the FunctionContext. Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/timestamp-functions.cc M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udas.h M be/src/udf/udf-internal.h M be/src/udf/udf-ir.cc M be/src/udf/udf.h M common/thrift/Exprs.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 26 files changed, 504 insertions(+), 248 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/5161/20 -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/5161/19/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 55: // If this is merge aggregation function, the input argument types of the aggregate > I'm fine with either approach, but: Done -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Hello Impala Public Jenkins, Michael Ho, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5161 to look at the new patch set (#21). Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs This uses the existing infrastructure for codegening builtin UDAs and for codegening calls to UDFs. GetUdf() is refactored to support both UDFs and UDAs. IR UDAs are still not allowed by the frontend. It's unclear if we want to enable them going forward because of the difficulties in testing and supporting IR UDFs/UDAs. This also fixes some bugs with the Get*Type() methods of FunctionContext. GetArgType() and related methods now always return the logical input types of the aggregate function. Getting the tests to pass required fixing IMPALA-4878 because they called GetIntermediateType(). Testing: test_udfs.py tests UDAs with codegen enabled and disabled. Added some asserts to test UDAs to check that the correct types are passed in via the FunctionContext. Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/timestamp-functions.cc M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udas.h M be/src/udf/udf-internal.h M be/src/udf/udf-ir.cc M be/src/udf/udf.h M common/thrift/Exprs.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/query_test/test_udfs.py 26 files changed, 501 insertions(+), 248 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/5161/21 -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. Patch Set 20: (2 comments) http://gerrit.cloudera.org:8080/#/c/5161/20/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 106: ArrayList aggFnArgTypes = Lists.newArrayList(); > unused? Done Line 554: "in original expr %s", toSql(), copiedInputType.toString(), > also use toSql() for types Done -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4878,IMPALA-4879: codegen native UDAs .. Patch Set 22: Code-Review+2 carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2328: Address additional comments
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2328: Address additional comments .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6147/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 487: DCHECK(min_max_tuple_desc->slots().size() == min_max_conjuncts_ctxs_.size()); DCHECK_EQ -- To view, visit http://gerrit.cloudera.org:8080/6147 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I54c205fad7afc4a0b0a7d0f654859de76db29a02 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4995: Fix integer overflow in TopNNode::PrepareForOutput
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4995: Fix integer overflow in TopNNode::PrepareForOutput .. Patch Set 1: Code-Review+2 This fix looks good but see my comment on the JIRA - it may make sense to switch to the sort operator when limit + offset is large since there are other potential problems with the TopN node in those cases -- To view, visit http://gerrit.cloudera.org:8080/6171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5048ec67d8f086346220d56e027e6583fbb5ddad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 14: (7 comments) Flushing out some comments I made while in transit. I don't have any concerns about correctness but there were a few things that may be confusing for people reading the code. http://gerrit.cloudera.org:8080/#/c/5904/14//COMMIT_MSG Commit Message: Line 7: IMPALA-4624: Implement Parquet dictionary filtering Can you mention the query option in the commit message? PS14, Line 12: incides indices http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 704: return Status(Substitute("Could not allocate buffer of $0 bytes for Parquet " Can you use MemTracker::MemLimitExceeded() here to construct the status? It also does some logging that can be useful to diagnose the failure. http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 855 Thanks for fixing this. I was talking to Wes McKinney (who works on parquet-cpp) a month or so ago and he'd been confused about why Impala was writing out encodings it wasn't using. http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS14, Line 256: GetDictionary GetDictionaryDecoder() for consistency with the other APIs? Line 865: if (!stream_->ReadBytes(data_size, &data_, &status)) return status; Not your change, but can we just SkipBytes here? That would make the intent clearer. http://gerrit.cloudera.org:8080/#/c/5904/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 243: public boolean isDeterministic() { I think we should be careful about the naming and comments here, since this will return true for many non-deterministic functions - the state of things before this patch is pretty confusing. The definition is Expr.isConstant() is subtle - the comment on that function tries to define the current rules. E.g. UDFs can be non-deterministic but the fe treats them as deterministic (for now). Or now() isn't really deterministic, but we treat it as such because it shouldn't be re-evaluated within a query. This list of functions is really the builtin functions that have some kind of randomisation. Can you rename it to something like isRandomizedBuiltin() and update the comment to reflect that? -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 16: It looks like you'll have to do a follow-on patch to solve the bootstrapping problem with the Impala version so I'm ok if you want to address the comments in a follow-on. -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Fix parquet table writer dictionary leak
Tim Armstrong has posted comments on this change. Change subject: Fix parquet table writer dictionary leak .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/6181/1/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 334: OutputPartition* deletable_partition = current_clustered_partition_->first; Why not just delete it here and save the temp variable? Line 336: delete(deletable_partition); delete deletable_partition; - delete is a keyword instead of a function Line 579: OutputPartition* partition = new OutputPartition(); Could we use a unique_ptr here and in PartitionPair to make the ownership explicit and cleanup automatic? I think this would be a unique_ptr, and it would be move()d into the map . Line 587: delete(partition); delete partition; -- To view, visit http://gerrit.cloudera.org:8080/6181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06e354086ad24071d4fbf823f25f5df23933688f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix parquet table writer dictionary leak
Tim Armstrong has posted comments on this change. Change subject: Fix parquet table writer dictionary leak .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6181/2//COMMIT_MSG Commit Message: Line 7: Fix parquet table writer dictionary leak JIRA? http://gerrit.cloudera.org:8080/#/c/6181/1/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 579: OutputPartition* partition = new OutputPartition(); > The issue is that the OutputPartition is passed down to the table writers, The OutputPartition owns the HdfsTableWriter via a scoped_ptr so we can safely use a raw OutputPartition* pointer in HdfsTableWriter. This pattern of an owning unique_ptr/scoped_ptr in one direction and a raw pointer for the inverse shows up elsewhere in the code (e.g. for "parent" pointers). I'd probably add a comment on HdfsTableWriter.output_ to note that ownership relationship but I think that would be clear and safe. -- To view, visit http://gerrit.cloudera.org:8080/6181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06e354086ad24071d4fbf823f25f5df23933688f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4899: Fix parquet table writer dictionary leak
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4899: Fix parquet table writer dictionary leak .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6181/3/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 172: const std::unique_ptr& output_partition, Just passing the raw pointer seems ok to me - I think you can do everything with the const unique_ptr that you can do with a raw pointer, so the type doesn't convey much additional info. I don't feel that strongly about it though -- To view, visit http://gerrit.cloudera.org:8080/6181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06e354086ad24071d4fbf823f25f5df23933688f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4899: Fix parquet table writer dictionary leak
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4899: Fix parquet table writer dictionary leak .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I06e354086ad24071d4fbf823f25f5df23933688f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 5: (39 comments) http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: PS5, Line 181: write_page_->is_pinned() > when do we have an unpinned write page? what does that mean? I documented that they have to be pinned and added some consistency checks. If I set write_page_ = null earlier that keeps it in a consistent state. Line 189: *got_page = buffer_pool_client_->reservation()->IncreaseReservationToFit(page_len); > shouldn't this come with IMPALA-3208? or do we need this already? We need it already for the first page or if got_page was false on an earlier call and we're retrying. Line 198: write_end_ptr_ = new_page.data() + page_len; > would be slightly clearer to compute this based on write_page_ Done Line 234: DCHECK(&*read_page_ != write_page_); > here and above: DCHECK_NE/EQ It doesn't work for the first one since the values of iterators aren't printable. Line 246: bytes_in_mem_ -= read_page_->len(); > UnpinPage()? Done Line 256: if (!read_page_->is_pinned()) { > is this check sufficient? couldn't the page by pinned due to it also being I just kept this logic from the old BufferedTupleStream. Added a comment to 'write_page_' to explain. I think it's easiest to leave as-is for now. Having a pin per iterator makes a lot of sense when we have a separate iterator abstraction that manages its own reservation, but it interacts strangely with the current UnpinStream() API - if you UnpinStream() a read-write stream with a single page then it could actually use more reservation. The only current user -AnalyticEvalNode - shouldn't do that but it adds a weird edge case to be wary of. I'm open to changing it but it wasn't a strict improvement in my eyes. Line 300: bytes_in_mem_ += pages_.front().len(); > maybe factor this into PinPage() to match UnpinPage() Done PS5, Line 324: is_pinned > i guess this use is probably okay I think we'd need to change it if we started pinned pages multiple times - we might need to unpin a page here if it was pinned multiple times. http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: Line 65: // > /// Done PS5, Line 66: In the pinned mode all pages are : /// pinned and the data is fully in memory. > seems redundant with the next sentence Done PS5, Line 74: PrepareForWrite > shouldn't that be PrepareForRead()? Done PS5, Line 74: the > to Done Line 75: /// pass over the stream. > this should go away soon after switching to BufferPool, right? i.e. buffers I think the concept of a destructive read pass will continue to exist, but the pages would be destroyed and buffers attached instead of the current dance with NeedsDeepCopy(). There will probably be other related changes though, e.g. around buffer management for non-destructive read passes. To support non-destructive unpinned read passes we might actually need a new BufferPool API to extract a buffer from a page without destroying the page. Line 121: /// <---12b---> <---12b---> <-5b> > i guess these examples all assume non-nullable? Added a nulllable example Line 127: /// caller does not need to copy if it is streaming. > are these listed in order of precedence? i.e. if a stream is both delete on Reworked this and the memory lifetime section to avoid ambiguity. Line 139: /// or free up other memory before retrying. > nit:indentation of some of these paragraphs is inconsistent Done Line 152: /// layout described above. > maybe a TODO/JIRA for what we had discussed with unifying AddRow() and Allo Done Line 199: ///Returns false and sets 'status' to an error. > nit: indentations of these bullets are inconsistent Done Line 262: /// rows will remain valid until the stream is unpinned, destroyed, etc. > let's also put TODO IMPALA-4179 to make it easier to keep straight of where Done Line 267: /// stream remains pinned. > does this interface require the stream to be pinned? if so, let's be explic Done PS5, Line 370: offset > this isn't really the offset. Done Line 374: uint8_t* read_end_ptr_; > given we store number of rows in the page, what is this used for? oh, i see Done Line 380: uint8_t* write_end_ptr_; > same, and this one looks dead anyway. This one is used in AllocateRow() where the perf matters - it likely saves a few cycles per row. I ended up removing write_page_bytes_remaining() and just inlining the calculations. Line 382: /// Number of rows returned to the caller from GetNext(). > ... since PrepareForRead() was called? Done Line 386: int read_page_idx_; > this looks dead Done Line 393: int num_pinned_; > this doesn't looked
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool Add a copy of BufferedTupleStream that allocates memory from BufferPool. This will replace the original implementation when IMPALA-3200 is completed. The major changes are: * Terminology is updated from "blocks" to "pages" * No small buffer support is needed (hooray!). * BufferedTupleStream needs to do its own tracking of # of rows per page, etc instead of relying on BufferedBlockMgr to do it. A wrapper around PageHandle is used. * Profile counters (unpin, pin, get new block time) are removed - similar counters in the BufferPool client are more useful. * Changed the tuple null indicators so that they are allocated before each tuple, rather than in a block at the start of the page. This slightly reduces the memory density, but greatly simplifies much logic. In particular, it avoids problems with RowIdx and larger pages with offsets that don't fit in 16 bits. Testing this required some further changes. TestEnv was refactored so it can set up either BufferPool or BufferedBlockMgr. Some BufferPool-related state is added to ExecEnv, QueryState and RuntimeState, but is only created for backend tests that explicitly create a BufferPool. The following is left for future work: * IMPALA-3808 (large row support) is not added. I've added TODOs to the code to places that will require changes. * IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since it requires global changes to operators that accumulate memory. Testing: All of the BufferedTupleStream unit tests are ported to the new implementation, except for ones specifically testing the small buffer functionality. Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 --- M be/src/exec/hash-table-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-test.cc A be/src/runtime/buffered-tuple-stream-v2-test.cc A be/src/runtime/buffered-tuple-stream-v2.cc A be/src/runtime/buffered-tuple-stream-v2.h A be/src/runtime/buffered-tuple-stream-v2.inline.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc 20 files changed, 2,822 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/6 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[native-toolchain-CR] IMPALA-5002: Associate toolchain build scripts/flags with built packages
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5002: Associate toolchain build scripts/flags with built packages .. Patch Set 2: Code-Review+2 (1 comment) Seems like a good thing to track http://gerrit.cloudera.org:8080/#/c/6210/2/functions.sh File functions.sh: Line 351: git rev-parse HEAD > ${BUILD_DIR}/${PACKAGE_STRING}${PATCH_VERSION}/toolchain-build-hash.txt Long line -- To view, visit http://gerrit.cloudera.org:8080/6210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ida6f1eee0c20de7fbfdbb716e5c7ac8686b558b6 Gerrit-PatchSet: 2 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[native-toolchain-CR] Add a script to build Kudu using existing toolchain artifacts
Tim Armstrong has posted comments on this change. Change subject: Add a script to build Kudu using existing toolchain artifacts .. Patch Set 2: (5 comments) Looks pretty good, just had some minor comments http://gerrit.cloudera.org:8080/#/c/6167/2/build-kudu-single.sh File build-kudu-single.sh: Line 1: #!/usr/bin/env bash Doesn't matter too much but build-kudu-only seems clearer to me Line 15: Can you add a short comment up the top explaining how to use it and what environment variables influence it. Line 35: function download_toolchain_dependency() { Does this download all versions of the package? Line 110: pkg_name=${dir%*/} Can we call this something instead of pkg_name? Like pkg_string? pkg_name usually means the name without the version in these scripts. http://gerrit.cloudera.org:8080/#/c/6167/2/init-compiler.sh File init-compiler.sh: PS2, Line 20: trailing space -- To view, visit http://gerrit.cloudera.org:8080/6167 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I237580e1545033467a92285ca8bb8db1cf8c804e Gerrit-PatchSet: 2 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/5904/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 243: public boolean isDeterministic() { > Switched this over to isBuiltinRandom/isBuiltinRandomFn. I agree that it is We ran into a similar problem with non-deterministic UDFs when we added expression rewrites. We never provided any guarantees about how non-deterministic UDFs were evaluated, and it was already inconsistent. E.g. partitions have been statically pruned based on UDF output since many releases ago, so we don't guarantee that predicates with a non-deterministic UDF are evaluated against every row anyway. The solution we went before was to change the behaviour (since it wasn't contractual) but provide the enable_expr_rewrites safety valve in case someone was relying on the behaviour. I believe we have a JIRA somewhere to add metadata about whether a UDF is non-determistic. -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 18: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4946: fix hang in BufferPool
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6224 Change subject: IMPALA-4946: fix hang in BufferPool .. IMPALA-4946: fix hang in BufferPool Once the write is removed from the "in flight" list, both the Client and Page may be destroyed by a different thread. The fix is to signal condition variables before inside the critical section that removes the write from the in flight list. Also fix a potential pitfall with DiskIoMgr::CancelContext() where concurrent calls to the method, which can be called asynchronously with other methods, could result in a hang in DiskIoMgr::CancelContext(). I do not believe any Impala code calls it concurrently from multiple threads, so the bug was only latent. Testing: I was able to reproduce reliably by inserting a 1s sleep before the NotifyAll() calls. After the fix, the hang didn't reproduce with sleeps inside or outside the critical section. I could not come up with a unit test that had a higher reproduction rate than the current tests - the window for the race is very small. I considered adding a debug stress option to insert these delays, but with all the code moved into the critical section it wouldn't be useful. Change-Id: I13fc95b5a664544dee789c4107fccf81d2077347 --- M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/disk-io-mgr-internal.h 2 files changed, 5 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/6224/1 -- To view, visit http://gerrit.cloudera.org:8080/6224 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I13fc95b5a664544dee789c4107fccf81d2077347 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. IMPALA-4593,IMPALA-4635: fix some python build issues Build C/C++ packages with toolchain GCC to avoid ABI compatibility issues. This requires a multi-step bootstrapping process: 1. install basic non-C/C++ packages into the virtualenv 2. use Python 2.7 from the virtualenv to bootstrap the toolchain 3. use toolchain gcc to build C/C++ packages 4. build the kudu-python package with toolchain gcc and Cython To avoid potentially pulling in cached versions of packages built with a different compiler, this patch also disables pip's caching. This should not have a significant effect on performance since we've enabled ccache and cache downloaded packages in infra/python/deps. Improve bootstrapping time significantly by using ccache and by parallelising the numpy build - the most expensive part of the install process. On a system with a warmed-up ccache, bootstrapping after deleting infra/python/env takes 1m16s. Previously it could take over 5m. Testing: Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI problem mentioned in IMPALA-4593. Initially "import kudu" failed in my dev environment. After deleting infra/python/env and re-bootstrapping, "import kudu" succeeded. Also ran the standard test suite on CentOS 6 and built Impala on a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7; Ubuntu12.04,14.04,16.04) to make sure nothing broke. Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf --- M infra/python/bootstrap_virtualenv.py A infra/python/deps/compiled-requirements.txt M infra/python/deps/download_requirements A infra/python/deps/kudu-requirements.txt M infra/python/deps/pip_download.py M infra/python/deps/requirements.txt 6 files changed, 120 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6218/2 -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4946: fix hang in BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4946: fix hang in BufferPool .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6224/1//COMMIT_MSG Commit Message: PS1, Line 11: before remove -- To view, visit http://gerrit.cloudera.org:8080/6224 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13fc95b5a664544dee789c4107fccf81d2077347 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5008: Fix reading stats for TINYINT and SMALLINT
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5008: Fix reading stats for TINYINT and SMALLINT .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/6226/2/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 34: if (col_stats < std::numeric_limits::min() || col_stats isn't valid if ret is false, right? Line 39: DCHECK(false); I don't think we should DCHECK here - users do sometimes run DEBUG builds on their clusters (often their dev environment). Line 50: if (col_stats < std::numeric_limits::min() || See above comments. -- To view, visit http://gerrit.cloudera.org:8080/6226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b10508db53747e7b08c8bd9a69c763b82135a78 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[native-toolchain-CR] Add a script to build Kudu using existing toolchain artifacts
Tim Armstrong has posted comments on this change. Change subject: Add a script to build Kudu using existing toolchain artifacts .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6167 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I237580e1545033467a92285ca8bb8db1cf8c804e Gerrit-PatchSet: 3 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5008: Fix reading stats for TINYINT and SMALLINT
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5008: Fix reading stats for TINYINT and SMALLINT .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6226/3/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 39: return ret; return true? Line 51: return ret; return true? -- To view, visit http://gerrit.cloudera.org:8080/6226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b10508db53747e7b08c8bd9a69c763b82135a78 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-5025: upgrade to binutils 2.26.1
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6247 Change subject: IMPALA-5025: upgrade to binutils 2.26.1 .. IMPALA-5025: upgrade to binutils 2.26.1 This has two fixes that we care about: * The slow linking problem that the -p1 patch to 2.26 solved * A way to disable emitting relocations R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX that aren't supported by older linkers. We configure binutils to disable these by default to be as safe as possible. Testing: Built Impala locally with 2.26-p1 and 2.26.1 and inspected libImpalaUdf.a with objdump -x to make sure that the relocation types changed to ensure that the change had an effect. Change-Id: I0de3b30f6ade4a3fab77572caac7be3a8d018105 --- M init.sh M source/binutils/build.sh 2 files changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/47/6247/1 -- To view, visit http://gerrit.cloudera.org:8080/6247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0de3b30f6ade4a3fab77572caac7be3a8d018105 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5027: make udf headers buildable externally
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6251 Change subject: IMPALA-5027: make udf headers buildable externally .. IMPALA-5027: make udf headers buildable externally This fixes a number of issues that prevented building it: * Use of C++11 features * Use of internal headers that are only available in the Impala source tree Testing: Copied the headers and libImpalaUdf.a to /usr and confirmed I could build impala-udf-samples (with some modifications to udf-test-harness to use some modified arguments to the test harness). We really need a better solution for automatically testing this, e.g. IMPALA-5026, but I wanted to fix the known bugs first. Change-Id: Ibdcdcd96c953cbabc3de04ca66d7aa1774ada99e --- M be/src/udf/uda-test-harness-impl.h M be/src/udf/uda-test-harness.h M be/src/udf/udf-debug.h M be/src/udf/udf-test-harness.h M be/src/udf/udf.h 5 files changed, 45 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/6251/1 -- To view, visit http://gerrit.cloudera.org:8080/6251 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ibdcdcd96c953cbabc3de04ca66d7aa1774ada99e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5025: Update binutils to 2.26.1
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6252 Change subject: IMPALA-5025: Update binutils to 2.26.1 .. IMPALA-5025: Update binutils to 2.26.1 This release includes the fix for IMPALA-3507 and the fix for https://sourceware.org/bugzilla/show_bug.cgi?id=19520, which made libImpalaUdf.a unusable by older linkers. Change-Id: Ia7e82bfca288c8c5f20cdfd560680801fa43b90a --- M bin/impala-config.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/6252/1 -- To view, visit http://gerrit.cloudera.org:8080/6252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia7e82bfca288c8c5f20cdfd560680801fa43b90a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5025: Update binutils to 2.26.1
Tim Armstrong has abandoned this change. Change subject: IMPALA-5025: Update binutils to 2.26.1 .. Abandoned Didn't mean to push - need to wait until we push out a new toolchain version -- To view, visit http://gerrit.cloudera.org:8080/6252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ia7e82bfca288c8c5f20cdfd560680801fa43b90a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[native-toolchain-CR] Fix setup / download order in build scripts
Tim Armstrong has posted comments on this change. Change subject: Fix setup / download order in build scripts .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6249/3/source/avro/build.sh File source/avro/build.sh: Line 26: download_dependency $PACKAGE "avro-src-${PACKAGE_VERSION}.tar.gz" $THIS_DIR Maybe move all of these download_dependency calls into if needs_build_package for consistency. It's done in some places but not others. -- To view, visit http://gerrit.cloudera.org:8080/6249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I287f091c8e772c08c8a61644d08117145b1ef16d Gerrit-PatchSet: 3 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5027: make udf headers buildable externally
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5027: make udf headers buildable externally .. Patch Set 1: A lot of widely-used Linux distributions have default compilers that don't support C++11 or C++14. E.g. RHEL 6 doesn't support C++11, RHEL 7 ships gcc 4.8.x, which doesn't support C++14 https://access.redhat.com/solutions/19458 -- To view, visit http://gerrit.cloudera.org:8080/6251 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdcdcd96c953cbabc3de04ca66d7aa1774ada99e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. Patch Set 4: Can you make the design doc public and link to it? It's shared with me but other reviewers may want to look at it. -- To view, visit http://gerrit.cloudera.org:8080/5914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. Patch Set 4: (18 comments) http://gerrit.cloudera.org:8080/#/c/5914/4/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 257: // non-deterministic expression. I don't think we should trust users not to set this to false on non-deterministic expressions - they shouldn't be able to potentially crash an Impala daemon with a bad query option value. http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 60: private final static List NONDETERMINISTIC_BUILTINS = https://gerrit.cloudera.org/#/c/5904/ also factored out this list, so when you rebase on top of master we should be able to reuse that. http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 39: private static final int SORT_MATERIALIZATION_COST_THRESHOLD = 5; Comment on how it's derived? If it's somewhat arbitrary its good for readers to know. Line 225: analyzer.addWarning("Expression '" + orderingExpr.toSql() + " may be " + I mentioned in an earlier comment - this seems to be unsafe - we shouldn't trust the query option too much. Line 246: // LiteralExprs don't affect the sort order. TODO: consider removing the sort node Do we have a test that covers the case when all ordering exprs are eliminated? Line 263: orderingExprsIter.set(materializedRef); Is this needed? Is this handled by putting it in the substitution map? http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java: Line 152: if (mergeInfo_.getIsMaterialized().get(i)) { Also mentioned on the design doc - I think we should only show this at a higher explain levels. Any maybe spell out "MATERIALIZED" so that it's less cryptic. http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: Line 187: if (info_.getIsMaterialized().get(i)) { See previous comment http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 785: | order by: id ASC MAT I guess it's useful for these planner tests to show it, so maybe it is good to have it at this explain level. Would be interested what others think. http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test: Line 46: | order by: int_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST It's weird that there's no "MAT" here but there are some in the sort below - is that a bug? It seems like we should be showing all plain slotrefs as materialised. Line 98: | order by: int_col ASC Same here Line 454: | order by: tinyint_col + 1 ASC NULLS FIRST, double_col / 2 ASC NULLS FIRST, 4 - int_col ASC, 4 * smallint_col ASC Looks like the cost-based analysis is working here Line 789: | order by: double_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST, int_col ASC It seems like we're only materialising things in the bottom-most sort for some reason. http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/order.test File testdata/workloads/functional-planner/queries/PlannerTest/order.test: Line 1245: It would be good to also add some basic planner tests for the materialize_sort query option - just as as sanity check that it actually has the intended effect. http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test: Line 1921: # Tests 'order by' on a non-deterministic function, return order isn't guaranteed. It would also be good to test the materialize_sort option for analytics http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/sort.test File testdata/workloads/functional-query/queries/QueryTest/sort.test: Line 4146: select id from alltypestiny order by random() It would be good to add a couple of tests for the materialize_sort query option to make sure that it produces the correct results. Line 4150: select id from alltypestiny order by "AAA" I think this answers my earlier question about test coverage. http://gerrit.cloudera.org:8080/#/c/5914/4/tests/query_test/test_sort.py File
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/6218/2/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: Line 24: # instead. > It'd be helpful to have the nice description from the commit message here t Done PS2, Line 90: exec_cmd > does this really throw if the command doesn't exist? I couldn't find docs f It's a wrapper around popen defined on line 72 PS2, Line 104: "ccache " + cc > better to use format() Done Line 115: if cc is not None: env["CC"] = cc > Else bail with a proper error message? What's the consequence of possibly u We won't have toolchain gcc during the first phase of bootstrapping so we need to fail gracefully in that case. I changed this so that it sets CC to a bogus value if toolchain gcc is not available. This revealed a bunch of other packages that used a C compiler. Line 189: LOG.debug("Skipping compiled deps: cc not available") > What's the use case for continuing here as opposed to throwing an error. This code can execute before we've bootstrapped the toolchain, so it needs to fail gracefully. I added comments to explain the bootstrap process better. Line 326: if install_compiled_deps_if_possible(): > Are these only needed for Kudu? If so consider only doing this when KUDU_IS It turns out that a bunch of other packages have some compiled files. I don't see much advantage to skipping this in that special case and it seems simpler to explain a general compiled dependency phase instead of splitting out a compiled-kudu-dependencies phase. -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. IMPALA-4593,IMPALA-4635: fix some python build issues Build C/C++ packages with toolchain GCC to avoid ABI compatibility issues. This requires a multi-step bootstrapping process: 1. install basic non-C/C++ packages into the virtualenv 2. use Python 2.7 from the virtualenv to bootstrap the toolchain 3. use toolchain gcc to build C/C++ packages 4. build the kudu-python package with toolchain gcc and Cython To avoid potentially pulling in cached versions of packages built with a different compiler, this patch also disables pip's caching. This should not have a significant effect on performance since we've enabled ccache and cache downloaded packages in infra/python/deps. Improve bootstrapping time significantly by using ccache and by parallelising the numpy build - the most expensive part of the install process. On a system with a warmed-up ccache, bootstrapping after deleting infra/python/env takes 1m16s. Previously it could take over 5m. Testing: Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI problem mentioned in IMPALA-4593. Initially "import kudu" failed in my dev environment. After deleting infra/python/env and re-bootstrapping, "import kudu" succeeded. Also ran the standard test suite on CentOS 6 and built Impala on a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7; Ubuntu12.04,14.04,16.04) to make sure nothing broke. Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf --- M infra/python/bootstrap_virtualenv.py A infra/python/deps/compiled-requirements.txt M infra/python/deps/download_requirements A infra/python/deps/kudu-requirements.txt M infra/python/deps/pip_download.py M infra/python/deps/requirements.txt 6 files changed, 150 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6218/3 -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Hello Matthew Jacobs, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6218 to look at the new patch set (#4). Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. IMPALA-4593,IMPALA-4635: fix some python build issues Build C/C++ packages with toolchain GCC to avoid ABI compatibility issues. This requires a multi-step bootstrapping process: 1. install basic non-C/C++ packages into the virtualenv 2. use Python 2.7 from the virtualenv to bootstrap the toolchain 3. use toolchain gcc to build C/C++ packages 4. build the kudu-python package with toolchain gcc and Cython To avoid potentially pulling in cached versions of packages built with a different compiler, this patch also disables pip's caching. This should not have a significant effect on performance since we've enabled ccache and cache downloaded packages in infra/python/deps. Improve bootstrapping time significantly by using ccache and by parallelising the numpy build - the most expensive part of the install process. On a system with a warmed-up ccache, bootstrapping after deleting infra/python/env takes 1m16s. Previously it could take over 5m. Testing: Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI problem mentioned in IMPALA-4593. Initially "import kudu" failed in my dev environment. After deleting infra/python/env and re-bootstrapping, "import kudu" succeeded. Also ran the standard test suite on CentOS 6 and built Impala on a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7; Ubuntu12.04,14.04,16.04) to make sure nothing broke. Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf --- M infra/python/bootstrap_virtualenv.py A infra/python/deps/compiled-requirements.txt M infra/python/deps/download_requirements A infra/python/deps/kudu-requirements.txt M infra/python/deps/pip_download.py M infra/python/deps/requirements.txt 6 files changed, 161 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6218/4 -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6218/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: PS3, Line 28: Every time this script is run > nit: this makes it sound like this might be run multiple times It is run multiple times - every time impala-python is run. http://gerrit.cloudera.org:8080/#/c/6218/4/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: Line 113: if use_ccache(): cc = "ccache {}".format(cc) This format syntax isn't supported in older Python versions (this script needs to support them). Line 206: '''Installs the Kudu python module if possible, which depends on the toolchain and I ran into this issues when using toolchain gcc on CentOS5. Tested that this workaround works. -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. Patch Set 4: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6218/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: PS3, Line 28: Every time this script is run > I meant it sounds like it might be needed to run multiple times to complete Yes, that's true. -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6218/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: PS3, Line 28: Every time this script is run > Yes, that's true. (we need the virtualenv to bootstrap the toolchain, so we don't have the toolchain on the first run on a clean system) -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Hello Impala Public Jenkins, Matthew Jacobs, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6218 to look at the new patch set (#5). Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. IMPALA-4593,IMPALA-4635: fix some python build issues Build C/C++ packages with toolchain GCC to avoid ABI compatibility issues. This requires a multi-step bootstrapping process: 1. install basic non-C/C++ packages into the virtualenv 2. use Python 2.7 from the virtualenv to bootstrap the toolchain 3. use toolchain gcc to build C/C++ packages 4. build the kudu-python package with toolchain gcc and Cython To avoid potentially pulling in cached versions of packages built with a different compiler, this patch also disables pip's caching. This should not have a significant effect on performance since we've enabled ccache and cache downloaded packages in infra/python/deps. Improve bootstrapping time significantly by using ccache and by parallelising the numpy build - the most expensive part of the install process. On a system with a warmed-up ccache, bootstrapping after deleting infra/python/env takes 1m16s. Previously it could take over 5m. Testing: Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI problem mentioned in IMPALA-4593. Initially "import kudu" failed in my dev environment. After deleting infra/python/env and re-bootstrapping, "import kudu" succeeded. Also ran the standard test suite on CentOS 6 and built Impala on a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7; Ubuntu12.04,14.04,16.04) to make sure nothing broke. Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf --- M infra/python/README M infra/python/bootstrap_virtualenv.py A infra/python/deps/compiled-requirements.txt M infra/python/deps/download_requirements A infra/python/deps/kudu-requirements.txt M infra/python/deps/pip_download.py M infra/python/deps/requirements.txt 7 files changed, 197 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6218/5 -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4593,IMPALA-4635: fix some python build issues
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4593,IMPALA-4635: fix some python build issues .. Patch Set 5: Code-Review+2 Add missing Apache headers -- To view, visit http://gerrit.cloudera.org:8080/6218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool Add a copy of BufferedTupleStream that allocates memory from BufferPool. This will replace the original implementation when IMPALA-3200 is completed. The major changes are: * Terminology is updated from "blocks" to "pages" * No small buffer support is needed (hooray!). * BufferedTupleStream needs to do its own tracking of # of rows per page, etc instead of relying on BufferedBlockMgr to do it. A wrapper around PageHandle is used. * Profile counters (unpin, pin, get new block time) are removed - similar counters in the BufferPool client are more useful. * Changed the tuple null indicators so that they are allocated before each tuple, rather than in a block at the start of the page. This slightly reduces the memory density, but greatly simplifies much logic. In particular, it avoids problems with RowIdx and larger pages with offsets that don't fit in 16 bits. * Buffer management of read/write streams uses the new pin-counting support to separate pinning of the read and write pages. This means that the reservation usage of an unpinned read/write stream does not fluctuate and the read/write iterators can always advance without requiring additional reservation. Testing this required some further changes. TestEnv was refactored so it can set up either BufferPool or BufferedBlockMgr. Some BufferPool-related state is added to ExecEnv, QueryState and RuntimeState, but is only created for backend tests that explicitly create a BufferPool. The following is left for future work: * IMPALA-3808 (large row support) is not added. I've added TODOs to the code to places that will require changes. * IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since it requires global changes to operators that accumulate memory. Testing: All of the BufferedTupleStream unit tests are ported to the new implementation, except for ones specifically testing the small buffer functionality. Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 --- M be/src/exec/hash-table-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-test.cc A be/src/runtime/buffered-tuple-stream-v2-test.cc A be/src/runtime/buffered-tuple-stream-v2.cc A be/src/runtime/buffered-tuple-stream-v2.h A be/src/runtime/buffered-tuple-stream-v2.inline.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc 20 files changed, 2,935 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/7 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has uploaded a new patch set (#8). Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool Add a copy of BufferedTupleStream that allocates memory from BufferPool. This will replace the original implementation when IMPALA-3200 is completed. The major changes are: * Terminology is updated from "blocks" to "pages" * No small buffer support is needed (hooray!). * BufferedTupleStream needs to do its own tracking of # of rows per page, etc instead of relying on BufferedBlockMgr to do it. A wrapper around PageHandle is used. * Profile counters (unpin, pin, get new block time) are removed - similar counters in the BufferPool client are more useful. * Changed the tuple null indicators so that they are allocated before each tuple, rather than in a block at the start of the page. This slightly reduces the memory density, but greatly simplifies much logic. In particular, it avoids problems with RowIdx and larger pages with offsets that don't fit in 16 bits. * Buffer management of read/write streams uses the new pin-counting support to separate pinning of the read and write pages. This means that the reservation usage of an unpinned read/write stream does not fluctuate and the read/write iterators can always advance without requiring additional reservation. Testing this required some further changes. TestEnv was refactored so it can set up either BufferPool or BufferedBlockMgr. Some BufferPool-related state is added to ExecEnv, QueryState and RuntimeState, but is only created for backend tests that explicitly create a BufferPool. The following is left for future work: * IMPALA-3808 (large row support) is not added. I've added TODOs to the code to places that will require changes. * IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since it requires global changes to operators that accumulate memory. Testing: All of the BufferedTupleStream unit tests are ported to the new implementation, except for ones specifically testing the small buffer functionality. Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 --- M be/src/exec/hash-table-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-test.cc A be/src/runtime/buffered-tuple-stream-v2-test.cc A be/src/runtime/buffered-tuple-stream-v2.cc A be/src/runtime/buffered-tuple-stream-v2.h A be/src/runtime/buffered-tuple-stream-v2.inline.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc 20 files changed, 2,931 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/8 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/5811/6/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: PS6, Line 112: DCHECK_LE(read_ptr_, read_end_ptr); > would be nice to write this and line 106 the same to make it a little faste Done Line 252: if (!read_page_->is_pinned()) { > why does this not have to check if read and write page are the same? I ended up reworking the buffer management as discussed to take advantage of the new pin-counting support. There are still some tricky corner cases but I think the invariants are clearer now, and we can provide stronger guarantees about being able to iterate through unpinned streams. Line 276: if (!pinned_ && write_page_ != &pages_.front()) { > this could use a comment: Done Line 332: && (&page == write_page_ || (read_write_ && &page == &*read_page_))) { > why is this right to unpin the read_page_ if !read_write_? This got reworked with the other changes http://gerrit.cloudera.org:8080/#/c/5811/6/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS6, Line 517: CalcNumPinned > this looks dead Done -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes