Re: [VOTE] Graduate to a TLP
+1 On Tue, Oct 17, 2017 at 7:25 PM, Thomas Tauber-Marshall < tmarsh...@cloudera.com> wrote: > +1 > > On Tue, Oct 17, 2017 at 9:12 PM Bharath Vissapragada < > bhara...@cloudera.com> > wrote: > > > +1 > > > > On Tue, Oct 17, 2017 at 7:10 PM, Mostafa Mokhtar> > wrote: > > > > > +1 > > > > > > Thanks > > > Mostafa > > > > > > > On Oct 17, 2017, at 7:09 PM, Brock Noland wrote: > > > > > > > > +1 > > > > > > > >> On Tue, Oct 17, 2017 at 9:07 PM, Lars Volker > wrote: > > > >> +1 > > > >> > > > >>> On Oct 17, 2017 19:07, "Jim Apple" wrote: > > > >>> > > > >>> Following our discussion > > > >>> https://lists.apache.org/thread.html/ > 2f5db4788aff9b0557354b9106c032 > > > >>> 8a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E > > > >>> , I propose that we graduate to a TLP. According to > > > >>> https://incubator.apache.org/guides/graduation.html# > > > >>> community_graduation_vote > > > >>> this is not required, and https://impala.apache.org/bylaws.html > does > > > not > > > >>> say whose votes are "binding" in a graduation vote, so all > community > > > >>> members are welcome to vote. > > > >>> > > > >>> This will remain open 72 hours. I will be notifying > general@incubator > > > it > > > >>> is > > > >>> occurring. > > > >>> > > > >>> This is my +1. > > > >>> > > > > > > -- Thanks, Michael
Re: jenkins.impala.io maintenance
Jenkins plugins have been updated. Aborted GVD jobs were restarted. On Mon, Aug 7, 2017 at 5:25 PM, Michael Ho <k...@cloudera.com> wrote: > jenkins.impala.io need updates for some plugins to address a new security > advisory. > > It will be put into maintenance mode at 5:35pm PST so no new jobs can be > submitted. The upgrade will happen after all pending jobs complete. Once > the upgrade completes, another email will be sent out. > > Please speak up if you have any objection to the above. > > -- > Thanks, > Michael > -- Thanks, Michael
Re: incubator-impala github repo is gone ?
https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git is still around. On Fri, Apr 28, 2017 at 11:16 AM, Michael Ho <k...@cloudera.com> wrote: > https://github.com/apache/incubator-impala seems to be empty now. > > Just filed https://issues.apache.org/jira/browse/INFRA-14038 > > -- > Thanks, > Michael > -- Thanks, Michael
incubator-impala github repo is gone ?
https://github.com/apache/incubator-impala seems to be empty now. Just filed https://issues.apache.org/jira/browse/INFRA-14038 -- Thanks, Michael
Re: GVO request
Done. On Mon, Mar 13, 2017 at 2:35 PM, Zachary Amsdenwrote: > Looking to start a verification run for > https://gerrit.cloudera.org/#/c/6275/ > > Thanks in advance, > > - Zach > -- Thanks, Michael
Re: gerrit email notifications
I didn't get email notification for https://gerrit.cloudera.org/#/c/5950/ either. On Wed, Feb 8, 2017 at 10:40 PM, Bharath Vissapragadawrote: > https://gerrit.cloudera.org/#/c/5792/ > > For my last two patch sets, I didn't see an email. > > On Wed, Feb 8, 2017 at 10:38 PM, Alex Behm wrote: > > > Do you have a recent example? > > > > On Wed, Feb 8, 2017 at 10:33 PM, Bharath Vissapragada < > > bhara...@cloudera.com > > > wrote: > > > > > Is anyone else not receiving email notifications for new patch sets or > is > > > it just me? > > > > > > -- Thanks, Michael
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 2: Patch #2 reverts to using the boost library condition variable for now. Given the wide spread usage of it in the existing code base, it seems a bit too distracting to replace all of them in this patch. -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Chen Huang <paulhuan...@utexas.edu> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. IMPALA-4026: Implement double-buffering for BlockingQueue. With recent changes to improve the parquet scanner's efficency, row batches are produced more quickly, leading to higher contention in the blocking queue shared between scanner threads and the scan node. The contention happens between different producers (i.e. the scanner threads) and also to a lesser extent, between the scanner threads and the scan node. This change addresses the contention between the scanner threads and the scan node by splitting the queue into a 'read_list_' and a 'write_list_'. The consumers will consume from 'read_list_' until it's exhausted while the producers will enqueue into 'write_list_' until it's full. When 'read_list_' is exhausted, the consumer will atomically swap the 'read_list_' with 'write_list_'. This reduces the contention/overhead in two ways: (1) 'read_list_' and 'write_list_' are protected by two different locks so consumer only contends for the write lock when 'read_list_' is empty. (2) the consumer only signals the producer after an entire 'read_list_' is consumed instead of signalling it per entry in the 'read_list_'. With this change, the regression in primitive_filter_bigint_non_selective went from 1.6s to 0.8s, improving by 50%. Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 --- M be/src/util/blocking-queue.h 1 file changed, 89 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/4350/2 -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Chen Huang <paulhuan...@utexas.edu> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 1: (3 comments) It appears to me that primitive_conjunct_odering_5 and primitive_conjunct_ordering_1 have quite a bit of variance. I compared the result at the following two baseline runs: http://sandbox.jenkins.cloudera.com/job/impala-private-perf-16node/234/ at commit 1a83f8bbe871df0527923e6f131a295270e54d33 http://sandbox.jenkins.cloudera.com/job/impala-private-perf-16node/233/ at commit 4fd25114d2ca23205396af1c16dcde3d3bec69fb The result for conjunt_ordering_5 ranges from 27.16 to 44.45 while the reference baseline is 38.93. Similarly, conjunct_ordering_1 ranges from 10.95 to 11.08 while the reference baseline is 9.84. My baseline should be based on run 233: +-++---++-++---++-+---+ | Workload| Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +-++---++-++---++-+---+ | TARGETED-PERF(_300) | primitive_filter_string_selective | parquet / none / none | 1.09 | 0.99| +10.31% | 2.59% | 2.49%| 1 | 3 | | TARGETED-PERF(_300) | primitive_conjunct_ordering_2 | parquet / none / none | 73.94 | 70.22 | +5.29% | 3.50% | 0.29%| 1 | 3 | | TARGETED-PERF(_300) | primitive_broadcast_join_2 | parquet / none / none | 6.01 | 5.73| +5.04% | 0.07% | 0.44%| 1 | 3 | | TARGETED-PERF(_300) | primitive_broadcast_join_1 | parquet / none / none | 1.70 | 1.62| +4.83% | 1.26% | 3.04%| 1 | 3 | | TARGETED-PERF(_300) | primitive_exchange_broadcast | parquet / none / none | 84.30 | 81.13 | +3.90% | 0.67% | 0.35%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_string_non_selective | parquet / none / none | 1.04 | 1.01| +2.31% | 2.44% | 0.21%| 1 | 3 | | TARGETED-PERF(_300) | primitive_groupby_bigint_lowndv | parquet / none / none | 3.62 | 3.57| +1.44% | 2.10% | 0.71%| 1 | 3 | | TARGETED-PERF(_300) | primitive_groupby_bigint_pk | parquet / none / none | 89.36 | 88.84 | +0.59% | 2.64% | 1.25%| 1 | 3 | | TARGETED-PERF(_300) | primitive_broadcast_join_3 | parquet / none / none | 58.96 | 58.61 | +0.59% | 0.43% | 0.08%| 1 | 3 | | TARGETED-PERF(_300) | primitive_orderby_bigint | parquet / none / none | 30.72 | 30.64 | +0.26% | 0.00% | 0.00%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_decimal_selective | parquet / none / none | 0.92 | 0.92| +0.14% | 0.06% | 0.53%| 1 | 3 | | TARGETED-PERF(_300) | primitive_groupby_bigint_highndv | parquet / none / none | 23.84 | 23.82 | +0.08% | 0.67% | 0.31%| 1 | 3 | | TARGETED-PERF(_300) | primitive_empty_build_join_1 | parquet / none / none | 13.31 | 13.32 | -0.10% | 1.04% | 0.54%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_string_like | parquet / none / none | 5.77 | 5.84| -1.28% | 3.07% | 2.55%| 1 | 3 | | TARGETED-PERF(_300) | primitive_conjunct_ordering_4 | parquet / none / none | 1.17 | 1.18| -1.33% | 1.30% | 0.01%| 1 | 3 | | TARGETED-PERF(_300) | primitive_shuffle_join_one_to_many_string_with_groupby | parquet / none / none | 228.99 | 232.12 | -1.35% | 0.29% | 1.01%| 1 | 3 | | TARGETED-PERF(_300) | primitive_conjunct_ordering_3 | parquet / none / none | 1.53 | 1.55| -1.87% | 0.37% | 4.97%| 1 | 3 | | TARGETED-PERF(_300) | primitive_orderby_all | parquet / none / none | 108.38 | 110.56 | -1.97% | 0.73% | 1.21%| 1 | 3 | | TARGETED-PERF(_300
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4350/1//COMMIT_MSG Commit Message: Line 29: BlockingQueue to using POSIX pthread primitives instead of boost > I meant to say boost::condition_variable::wait() in the previous comment, n Btw, this overhead seems to be related to the thread interruption support in boost library (https://www.justsoftwaresolutions.co.uk/threading/thread-interruption-in-boost-thread-library.html). Given the way we shut down the blocking queue, I am not sure if we need this. May be we can compile it out of the boost library. -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Chen Huang <paulhuan...@utexas.edu> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4350/1//COMMIT_MSG Commit Message: Line 29: BlockingQueue to using POSIX pthread primitives instead of boost > In one of the backtrace from VTune, there was about 25% of the time of pthr I meant to say boost::condition_variable::wait() in the previous comment, not pthread_cond_wait(). -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has posted comments on this change. Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4350/1//COMMIT_MSG Commit Message: Line 29: BlockingQueue to using POSIX pthread primitives instead of boost > This seems strange. I looked at ./boost_1_57_0/boost/thread/pthread/mutex.h In one of the backtrace from VTune, there was about 25% of the time of pthread_cond_wait() spent in the destructor of interruption_checker(). Similarly, there is an extra mutex_lock() in notify_one() and notify_all(). It's unclear what these extra checks in boost library are buying us. Line 45: | TARGETED-PERF(_300) | primitive_conjunct_ordering_1 | parquet / none / none | 10.72 | 9.84| +8.95% | 2.57% | 0.53%| 1 | 3 | > Any idea why these regressed? Might be good to understand if there's someth Not sure if this is related to this patch. The nightly run from last night also shows similar (and in fact larger) level of regression: http://sandbox.jenkins.cloudera.com/job/impala-workload-runner-16node-cdh5/1763/ -- To view, visit http://gerrit.cloudera.org:8080/4350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1616: Improve the Memory Limit Exceeded error report
Michael Ho has posted comments on this change. Change subject: IMPALA-1616: Improve the Memory Limit Exceeded error report .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/4335/3/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: PS3, Line 250: Fragment fragment (no cap). Also space after ":" as Tim suggested. -- To view, visit http://gerrit.cloudera.org:8080/4335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibb4e0c359d889938b4c351771ba539850bdb95ea Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Michael Ho has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS2, Line 317: uint8_t* > The return value has to be const if the input argument is const though. I t I see. Makes sense. -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/4350 Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue. .. IMPALA-4026: Implement double-buffering for BlockingQueue. With recent changes to improve the parquet scanner's efficency, row batches are produced more quickly, leading to higher contention in the blocking queue shared between scanner threads and the scan node. The contention happens between different producers (i.e. the scanner threads) and also to a lesser extent, between scanner threads and scan node. This change addresses the contention between scanner threads and scan node by splitting the queue into a 'read_list_' and a 'write_list_'. The consumers will consume from 'read_list_' until it's exhausted while the producers will enqueue into 'write_list_' until it's full. When 'read_list_' is exhausted, the consumer will atomically swap the 'read_list_' with 'write_list_'. This reduces the contention/overhead in two ways: (1) 'read_list_' and 'write_list_' are protected by two different locks so consumer only contends for the write lock when 'read_list_' is empty. (2) the consumer only signals the producer after an entire 'read_list_' is consumed instead of signalling it per entry in the 'read_list_'. This change also converts BlockingQueue to using POSIX pthread primitives instead of boost library which introduces some unncessary overhead (as observed from VTune profiles). With this change, the regression in primitive_filter_bigint_non_selective went from 1.6s to 0.8s, improving by 50%. +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TARGETED-PERF(_300) | parquet / none / none | 34.74 | -4.56% | 9.75 | -4.50% | +-+---+-++++ +-++---++-++---++-+---+ | Workload| Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +-++---++-++---++-+---+ | TARGETED-PERF(_300) | primitive_conjunct_ordering_1 | parquet / none / none | 10.72 | 9.84| +8.95% | 2.57% | 0.53%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_string_selective | parquet / none / none | 1.09 | 1.01| +7.42% | 2.59% | 4.91%| 1 | 3 | | TARGETED-PERF(_300) | primitive_broadcast_join_2 | parquet / none / none | 6.01 | 5.71| +5.38% | 0.07% | 0.96%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_string_non_selective | parquet / none / none | 1.04 | 0.99| +4.89% | 2.44% | 2.46%| 1 | 3 | | TARGETED-PERF(_300) | primitive_conjunct_ordering_5 | parquet / none / none | 40.60 | 38.93 | +4.29% | 3.50% | 1.02%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_decimal_selective | parquet / none / none | 0.92 | 0.88| +3.70% | 0.06% | 2.69%| 1 | 3 | | TARGETED-PERF(_300) | primitive_broadcast_join_1 | parquet / none / none | 1.70 | 1.66| +2.74% | 1.26% | 1.43%| 1 | 3 | | TARGETED-PERF(_300) | primitive_empty_build_join_1 | parquet / none / none | 13.31 | 13.07 | +1.79% | 1.04% | 0.98%| 1 | 3 | | TARGETED-PERF(_300) | primitive_groupby_bigint_lowndv | parquet / none / none | 3.62 | 3.59| +0.73% | 2.10% | 0.04%| 1 | 3 | | TARGETED-PERF(_300) | primitive_filter_string_like | parquet / none / none | 5.77 | 5.74| +0.45% | 3.07% | 1.76%| 1 | 3 | | TARGETED-PERF(_300) | primitive_broadcast_join_3 | parquet / none / none | 58.96 | 58.73 | +0.39% | 0.43% | 0.27%| 1 | 3 | | TARGETED-PERF(_300) | primitive_groupby_bigint_highndv | parquet / none / none | 23.84 | 23.78
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Michael Ho has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 185: uint32_t HashTableCtx::HashVariableLenRow( > The whitespace decisions were made by clang-format. It does seem a little d Can we configure clang-format to not format the code this way ? I find such inconsistency very annoying but that's just me. PS2, Line 317: uint8_t* > Some of the callsites need it to return a non-const pointer, e.g. EvalRow() Hmm...I think I meant the first argument, not the return value. -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers
Michael Ho has posted comments on this change. Change subject: IMPALA-4008: don't bake in hash table and hash join pointers .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 185: uint32_t HashTableCtx::HashVariableLenRow( nit: this new line seems to make the formatting a bit inconsistent with say line 163. Same for other places in this file. PS2, Line 317: uint8_t* const uint8_t* PS2, Line 322: ExprValueNullPtr( This function seems to have only one caller (ExprValueNull()) left. I find the asymmetry between ExprValuePtr() and ExprValueNullPtr() misleading. How about inlining the logic directly into the only caller ? Line 967: return Status( Wondering why the extra new line here ? http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 158: return expr_values_cache_.ExprValuePtr(expr_values_cache_.cur_expr_values(), expr_idx); nit: long line PS2, Line 406: i missing space ? PS2, Line 407: extra space Line 419: bool IR_NO_INLINE EvalProbeRow( nit: this extra new line seems a bit inconsistent with the rest of the files (e.g. line 432). PS2, Line 425: rows a row ? Line 429: /// 'expr_values_null' Extra new line again. Why not continue the next line here ? PS2, Line 436: nit: extra space PS2, Line 446: IR_ALWAYS_INLINE ALWAYS_INLINE ? -- To view, visit http://gerrit.cloudera.org:8080/4326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3815: clean up cross-compiled comparator
Michael Ho has posted comments on this change. Change subject: IMPALA-3815: clean up cross-compiled comparator .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4307 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I058917da2c13ba41d6ff7fefbb761606344312ab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3637,IMPALA-3636: refactor codegen constant replacement
Michael Ho has posted comments on this change. Change subject: IMPALA-3637,IMPALA-3636: refactor codegen constant replacement .. Patch Set 2: (16 comments) The change looks good. Mostly comments are about clarification in functions' comments. http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/codegen/constant-replacement.h File be/src/codegen/constant-replacement.h: PS2, Line 47: we often want to ignore non-constant arguments. I.e. NO_ARG Can this be shortened to: Note that NO_ARG doesn't imply ... PS2, Line 75: replace be replaced PS2, Line 88: 10 a LLVM ConstantInt 10 ? Line 93: /// will then be called with name = "some_function" and arg = 24. If the argument is not It may help to point out that ReplaceOneArg() is usually used if replacement constant value is derived from the input constant arg. PS2, Line 104: constant value LLVM constant PS2, Line 106: ReplaceNoArgs Should ReplaceNoArgs() and ReplaceOneArgs() be protected ? PS2, Line 112: = NO_ARG != NO_ARG ? PS2, Line 113: value integral value PS2, Line 114: constant value LLVM constant. PS2, Line 116: Only integral constant args for now. Currently only support integral constant args. Line 150: std::vector result; Do we expect this to be usually called once ? It appears that we could have cached this in a set when AddReplacement() is called instead of computing it every time ? http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS2, Line 798: fn_map.find(callee->getName()) fn_map.find(callee->getName()) may be evaluated once in this statement: if (callee != NULL) { unordered_map<string, ReplaceableFunction>::const_iterator fn_map_entry = fn_map.find(callee->getName()); if (fn_map_entry == fn_map.end()) continue; ... } PS2, Line 1014: !fn.isDeclaration() !fn.isDeclaration() && !fn.isIntrinsic() ? http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: PS2, Line 644: int return_precision = \ : context->impl()->GetReturnPrecision(); \ one line ? http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/exprs/expr-constant-replacement.cc File be/src/exprs/expr-constant-replacement.cc: Line 41: const vector& arg_types) : nit: indentation http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/runtime/types.h File be/src/runtime/types.h: PS2, Line 238: inline int IR_ALWAYS_INLINE ALWAYS_INLINE int -- To view, visit http://gerrit.cloudera.org:8080/3843 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ba029ed8589698eb15dbfb0a20dd2a7ea752635 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4051: Avoid unnecessary copy of RowDescriptor into RowBatch
Michael Ho has posted comments on this change. Change subject: IMPALA-4051: Avoid unnecessary copy of RowDescriptor into RowBatch .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4181/2/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: PS2, Line 384: const RowDescriptor& row_desc_; Is this always safe ? -- To view, visit http://gerrit.cloudera.org:8080/4181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27c05090ee8b9dd9d650a5587a3ae42839aa0008 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has posted comments on this change. Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4174/4/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: Line 513: /// the query option 'NUM_SCANNER_THREADS' is not set. > explain that it is NUM_SCANNER_THREADS if that option is set, and explain w Done -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has posted comments on this change. Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. Patch Set 5: Code-Review+2 Carry +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4174 to look at the new patch set (#5). Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. IMPALA-2831: Bound the number of scanner threads per scan node. Our current code base allows a scan node to spin up as many as 3x the number of logical cpu cores of scanner threads. However, the scanner threads are cpu bound so there is diminishing return for starting more scanner threads than the number of logical cores. In fact, it may be detrimental due to context switching overhead. This change bounds the number of scanner threads spun up by a scan node to the number of logical cpu cores unless the query option 'num_scanner_threads' is set. The total number of available thread tokens is unchanged. With this change, the peak memory usage of the following query on a single node Impala cluster running on a machine with 8 logical cores reduces from 287MB to 101MB. select count(*) from tpch100_parquet.lineitem where l_orderkey > 20 The reduction comes mostly from the fewer outstanding IO buffers. The IO for scan ranges will be scheduled by the scanner threads which pick them up. There will be at least an IO buffer of 8 to 16MB associated with each scan range. So, more threads we start up, more memory will be consumed by the IO buffers, leading to the higher peak memory usages. Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f --- M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h 2 files changed, 23 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/5 -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has posted comments on this change. Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4174/1//COMMIT_MSG Commit Message: Line 20: with 8 logical cores reduces from 287MB to 101MB. > could you add a little more explanation about why the mem usage goes down ( Done http://gerrit.cloudera.org:8080/#/c/4174/1/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 906: if (!ranges.empty()) ThreadTokenAvailableCb(runtime_state_->resource_pool()); > and isn't this already handled by: As discussed offline, 'progress_' doesn't take into account of file format so it will be non-zero. This ends up causing a bunch of scanner threads to be started. In the worst case, those threads may just quit when they realize they cannot get a scan range. I find this behavior a bit confusing. -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has uploaded a new patch set (#4). Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. IMPALA-2831: Bound the number of scanner threads per scan node. Our current code base allows a scan node to spin up as many as 3x the number of logical cpu cores of scanner threads. However, the scanner threads are cpu bound so there is diminishing return for starting more scanner threads than the number of logical cores. In fact, it may be detrimental due to context switching overhead. This change bounds the number of scanner threads spun up by a scan node to the number of logical cpu cores unless the query option 'num_scanner_threads' is set. The total number of available thread tokens is unchanged. With this change, the peak memory usage of the following query on a single node Impala cluster running on a machine with 8 logical cores reduces from 287MB to 101MB. select count(*) from tpch100_parquet.lineitem where l_orderkey > 20 The reduction comes mostly from the fewer outstanding IO buffers. The IO for scan ranges will be scheduled by the scanner threads which pick them up. There will be at least an IO buffer of 8 to 16MB associated with each scan range. So, more threads we start up, more memory will be consumed by the IO buffers, leading to the higher peak memory usages. Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f --- M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h 2 files changed, 21 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/4 -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. IMPALA-2831: Bound the number of scanner threads per scan node. Our current code base allows a scan node to spin up as many as 3x the number of logical cpu cores of scanner threads. However, the scanner threads are cpu bound so there is diminishing return for starting more scanner threads than the number of logical cores. In fact, it may be detrimental due to context switching overhead. This change bounds the number of scanner threads spun up by a scan node to the number of logical cpu cores unless the query option 'num_scanner_threads' is set. The total number of available thread tokens is unchanged. With this change, the peak memory usage of the following query on a single node Impala cluster running on a machine with 8 logical cores reduces from 287MB to 101MB. select count(*) from tpch100_parquet.lineitem where l_orderkey > 20 The reduction comes mostly from the fewer outstanding IO buffers. The IO for scan ranges will be scheduled by the scanner threads which pick them up. There will be at least an IO buffer of 8 to 16MB associated with each scan range. So, more threads we start up, more memory will be consumed by the IO buffers, leading to the higher peak memory usages. Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f --- M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h 2 files changed, 23 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/3 -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has posted comments on this change. Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4174/1/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: PS1, Line 738: The scanner thread will yield after completing a scan range so the main thread : // also gets to run. > I don't understand what this is trying to say, given that you also have the What I was trying to convey was that the number of scanner threads being the same as the number of logical cores may starve the main thread. While it's true that OS' preemption will ensure some chances for the main thread to run, this comment is to point out that the scanner thread also active yields the cpu so other threads get to run too. Let me rephrase it (or remove it). Line 740: runtime_state_->resource_pool()->set_max_quota(CpuInfo::num_cores() + 1); > hmm doesn't this change the number of threads available for the join side h Yes, good point. So, the way we implemented 'NUM_SCANNER_THREADS' query option is broken all along. Let's not touch the max_quota at all for this query option and check it in another way. Please wait for the next update of this patch. -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. IMPALA-2831: Bound the number of scanner threads per scan node. Our current code base allows a scan node to spin up as many as 3x the number of logical cpu cores of scanner threads. However, the scanner threads are cpu bound so there is diminishing return for starting more scanner threads than the number of logical cores. In fact, it may be detrimental due to context switching overhead. This change bounds the number of scanner threads spun up by a scan node to the number of logical cpu cores unless the query option 'num_scanner_threads' is set. The total number of available thread tokens is unchanged. With this change, the peak memory usage of the following query on a single node impala cluster running on a machine with 8 logical cores reduces from 287MB to 101MB. select count(*) from tpch100_parquet.lineitem where l_orderkey > 20 Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f --- M be/src/exec/hdfs-scan-node.cc 1 file changed, 10 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/2 -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has posted comments on this change. Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4174/1/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 262: RETURN_IF_ERROR(HdfsParquetScanner::IssueInitialRanges(this, > Is the order important? Would be helpful to have a comment here if there is Yes, mostly for slight preference for parquet files. This mostly makes up for the difference in behavior due to the change in line 906. Previously, we would have spun up those scanner threads (unintentionally) when calling IssueInitalRanges() for Text file format. We also start issuing the initial IO ranges slightly earlier now for parquet files. Line 740: runtime_state_->resource_pool()->set_max_quota(CpuInfo::num_cores() + 1); > Nm, you are already doing that. It seems weird that the scanner threads quo Yes, I agree that we probably should have the + 1 for the query option too. -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/4174 Change subject: IMPALA-2831: Bound the number of scanner threads per scan node. .. IMPALA-2831: Bound the number of scanner threads per scan node. Our current code base allows a scan node to spin up as many as 3x the number of logical cpu cores of scanner threads. However, the scanner threads are cpu bound so there is diminishing return for starting more scanner threads than the number of logical cores. In fact, it may be detrimental due to context switching overhead. This change bounds the number of scanner threads spun up by a scan node to the number of logical cpu cores unless the query option 'num_scanner_threads' is set. The total number of available thread tokens is unchanged. With this change, the peak memory usage of the following query on a single node impala cluster running on a machine with 8 logical cores reduces from 287MB to 101MB. select count(*) from tpch100_parquet.lineitem where l_orderkey > 20 Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f --- M be/src/exec/hdfs-scan-node.cc 1 file changed, 9 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/1 -- To view, visit http://gerrit.cloudera.org:8080/4174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Michael Ho has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/4078/6/bin/run-all-tests.sh File bin/run-all-tests.sh: PS6, Line 37: "${CLUSTER_DIR}/admin Misplaced quote (similar to the one in clean.sh) ? -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Yonghyun Hwang <yongh...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Michael Ho has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4078/3/bin/impala-config.sh File bin/impala-config.sh: PS3, Line 45: $IMPALA_HOME/toolchain > I intentionally didn't touch assignments, because quoting is not strictly n I believe it'd be great to update them all the way. Thanks again for the clean up. -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Michael Ho has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4078/1/bin/clean.sh File bin/clean.sh: PS1, Line 49: $IMPALA_ALL_LOGS_DIRS > Yes and no. On one hand, this is supposed to contain multiple log, so havin May be I misunderstood your comment but for all practical purposes, we don't usually use multiple different log directories. -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Michael Ho has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 3: (5 comments) It appears that you are already fixing some of the places which are considered "non-dangerous". Can you please just bite the bullet and fix up the rest pointed out below ? http://gerrit.cloudera.org:8080/#/c/4078/3/bin/impala-config.sh File bin/impala-config.sh: PS3, Line 45: $IMPALA_HOME/toolchain missing quote ? http://gerrit.cloudera.org:8080/#/c/4078/3/bin/run-all-tests.sh File bin/run-all-tests.sh: PS3, Line 111: ${IMPALA_HOME}/testdata/bin/split-hbase.sh missing quote ? PS3, Line 137: ${IMPALA_HOME}/bin/run-workload.py missing quote ? PS3, Line 173: ${IMPALA_HOME}/bin/start-impala-cluster.py missing quote ? PS3, Line 177: ${IMPALA_HOME}/bin/mvn-quiet.sh missing quote ? -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] CDH-43354: Bump CDH components' versions.
Michael Ho has posted comments on this change. Change subject: CDH-43354: Bump CDH components' versions. .. Patch Set 1: A core run passed: http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/4002/ -- To view, visit http://gerrit.cloudera.org:8080/4084 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I217f2467628d0213afee1b8a531e48d035237212 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] CDH-43354: Bump CDH components' versions.
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/4084 Change subject: CDH-43354: Bump CDH components' versions. .. CDH-43354: Bump CDH components' versions. Change-Id: I217f2467628d0213afee1b8a531e48d035237212 --- M bin/impala-config.sh 1 file changed, 6 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/4084/1 -- To view, visit http://gerrit.cloudera.org:8080/4084 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I217f2467628d0213afee1b8a531e48d035237212 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-4006 impala-config.sh contains dangerous rm -rf statements
Michael Ho has posted comments on this change. Change subject: IMPALA-4006 impala-config.sh contains dangerous rm -rf statements .. Patch Set 1: (12 comments) Thanks for cleaning things up. Not sure if there is a easier way to fix this other than adding quote everywhere ? http://gerrit.cloudera.org:8080/#/c/4078/1/bin/clean.sh File bin/clean.sh: PS1, Line 49: $IMPALA_ALL_LOGS_DIRS Won't this line suffer from the same problem ? PS1, Line 53: pushd $IMPALA_HOME/be Same problem ? PS1, Line 59: pushd $IMPALA_HOME/shell Same problem ? PS1, Line 71: rm -f $IMPALA_HOME/llvm-ir/impala*.ll : rm -f $IMPALA_HOME/be/generated-sources/impala-ir/* Are these lines also unsafe ? http://gerrit.cloudera.org:8080/#/c/4078/1/bin/run-all-tests.sh File bin/run-all-tests.sh: PS1, Line 101: LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}" Is this variable still used ? PS1, Line 137: ${IMPALA_HOME} Same problem ? PS1, Line 173: {IMPALA_EE_TEST_LOGS_DIR} Same problem ? PS1, Line 173: ${IMPALA_HOME}/ Same problem ? PS1, Line 196: ${IMPALA_HOME} Same. http://gerrit.cloudera.org:8080/#/c/4078/1/buildall.sh File buildall.sh: Line 248: "$IMPALA_HOME/bin/clean.sh" Feel free to ignore but it should be sufficient to just have the quote around ${IMPALA_HOME}. Same for other places in the file. PS1, Line 321: $IMPALA_LZO Same problem ? PS1, Line 323: $IMPALA_LZO Same problem ? -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id
Michael Ho has posted comments on this change. Change subject: IMPALA-3988: Only use first 96 bits of query id .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id
Michael Ho has posted comments on this change. Change subject: IMPALA-3988: Only use first 96 bits of query id .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/4065/7/be/src/util/uid-util.h File be/src/util/uid-util.h: PS7, Line 61: (1L << 32) - 1 IMHO, defining this as a constant like the previous patch seems to be better as it gives a semantic meaning to this value (e.g. instance_idx_mask). PS7, Line 66: -1 nit: missing space. -- To view, visit http://gerrit.cloudera.org:8080/4065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4004: Don't access nested types in test failpoints.py
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/4074 Change subject: IMPALA-4004: Don't access nested types in test_failpoints.py .. IMPALA-4004: Don't access nested types in test_failpoints.py As part of fixing IMPALA-3692, the query in test_failpoints.py was updated to have a predicate on a string column in a parquet table. The update to the query was based on the failing query in the bug which happens to access nested columns. Apparently, this doesn't quite work with the legacy join and agg. This change fixes the test to also work with legacy join and agg. With the debug actions added in the fix of IMPALA-3692, it's not necessary to access nested types to reproduce the problem as long as there is a predicate on a string column. Change-Id: Idc5e67b9748a13fcd76ea5fe140e2e6b18e809b7 --- M tests/failure/test_failpoints.py 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4074/1 -- To view, visit http://gerrit.cloudera.org:8080/4074 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idc5e67b9748a13fcd76ea5fe140e2e6b18e809b7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id
Michael Ho has posted comments on this change. Change subject: IMPALA-3988: Only use first 96 bits of query id .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: PS6, Line 505: int Will there be a chance of overflow here ? Should we use int64_t instead ? Once intance_idx > (1 << 31) - 1, it will be negative so instance_idx > MAX_FRAGMENT_INSTANCE_IDX will always be false. We may end up wrapping and not catch the problem. PS6, Line 509: nit: wrong indentation http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/util/uid-util-test.cc File be/src/util/uid-util-test.cc: Line 42: } It may be worth having some negative test cases in which the instance_id exceeds the 32-bit width ? http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/util/uid-util.h File be/src/util/uid-util.h: PS6, Line 70: std::numeric_limits::max() Do we support up to (1L << 31) -2 number of instances ? Is there a reason why we cannot use all 32-bit for instance id ? PS6, Line 75: int Does this have the same overflow problem here ? -- To view, visit http://gerrit.cloudera.org:8080/4065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
Michael Ho has posted comments on this change. Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner .. Patch Set 3: Code-Review+2 Rebase. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/4064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 5: Code-Review+2 Carry +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/3991/4/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 442: "Debug Action: MEM_LIMIT_EXCEEDED"); > this is fine with me, but any reason we don't use mem_tracker() (i.e. exec- Done http://gerrit.cloudera.org:8080/#/c/3991/4/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: PS4, Line 576: if requested by scanner > I think we should delete this part because it sounds like the function itse Done -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3991 to look at the new patch set (#5). Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. IMPALA-3962: Clean up scratch tuple batch on scan failures The parquet scanner doesn't clean up 'scratch_batch_' properly which causes it to process a partially filled scratch_batch_ if any of the column reader fails. This change cleans up the scratch batch if any of the parquet column readers fails. The clean up involves freeing the mem_pool of scratch_batch_ and setting number of tuples in scratch_batch_ to 0. This change also extends debug action to emulate the behavior of exceeding the query's memory limit. Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 --- M be/src/exec/exec-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/failure/test_failpoints.py 10 files changed, 76 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/5 -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
Michael Ho has posted comments on this change. Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4064/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS1, Line 304: StartNewRowBatch > I think we should disambiguate so that it's clear which variant is being us Done Line 341: RETURN_IF_ERROR(CommitRows(row_batch, num_row_to_commit)); > Doesn't CommitRows() call HdfsScanner::StartNewRowBatch() anyway? I believe it's calling HdfsParquetScanner::CommitRows() which doesn't call StartNewRowBatch(). -- To view, visit http://gerrit.cloudera.org:8080/4064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet scanner .. IMPALA-3662: Don't double allocate tuples buffer in parquet scanner HdfsScanner::StartNewRowBatch() is called once per row batch by the parquet scanner to allocate a new row batch and tuple buffer. Similarly, a scratch batch is created for each row batch in HdfsParquetScanner::AssembleRows() which also contains the tuple buffer. In reality, only the tuple buffer in the scratch batch is used. So, the tuple buffer allocated by HdfsScanner::StartNewRowBatch() is unused memory for the parquet scanner. This change fixes the problem above by implementing HdfsParquetScanner::StartNewRowBatch() which creates a new row batch without allocating the tuple buffer. With this patch, the memory consumption when materializing very wide tuples is reduced by half. Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scanner.h 3 files changed, 14 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/4064/2 -- To view, visit http://gerrit.cloudera.org:8080/4064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/4064 Change subject: IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner .. IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner HdfsScanner::StartNewRowBatch() is called once per row batch by the parquet scanner to allocate a new row batch and tuple buffer. Similarly, a scratch batch is created for each row batch in HdfsParquetScanner::AssembleRows() which also contains the tuple buffer. In reality, only the tuple buffer in the scratch batch is used. So, the tuple buffer allocated by HdfsScanner::StartNewRowBatch() is unused memory for the parquet scanner. This change fixes the problem above by implementing HdfsParquetScanner::StartNewRowBatch() which creates a new row batch without allocating the tuple buffer. With this patch, the memory consumption when materializing very wide tuples is reduced by half. Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scanner.h 3 files changed, 14 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/4064/1 -- To view, visit http://gerrit.cloudera.org:8080/4064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has uploaded a new patch set (#4). Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. IMPALA-3962: Clean up scratch tuple batch on scan failures The parquet scanner doesn't clean up 'scratch_batch_' properly which causes it to process a partially filled scratch_batch_ if any of the column reader fails. This change cleans up the scratch batch if any of the parquet column readers fails. The clean up involves freeing the mem_pool of scratch_batch_ and setting number of tuples in scratch_batch_ to 0. This change also extends debug action to emulate the behavior of exceeding the query's memory limit. Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 --- M be/src/exec/exec-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/failure/test_failpoints.py 10 files changed, 77 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/4 -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/3991/3/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: PS3, Line 441: Status::MemLimitExceeded(); > shoudl this use mem_tracker()->SetMemLimitExceeded() since Tim is convertin Done http://gerrit.cloudera.org:8080/#/c/3991/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS3, Line 350: column_readers_[0]->RowGroupAtEnd() > couldn't we have hit column 0's end of rowgroup, but not column 1's, which Good point. Reverted to the check in the old code. http://gerrit.cloudera.org:8080/#/c/3991/3/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS3, Line 381: current > currently Done PS3, Line 383: nodes > node's Done -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3090: always log memory limit errors
Michael Ho has posted comments on this change. Change subject: IMPALA-3090: always log memory limit errors .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5ec5572b0e26898da352b7e6b11eb01c6edb2e5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/3991/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 343: context_->ClearStreams(); > We can also ask Alex about it, I believe he wrote this code. I believe it's always safe to do the two lines above even if advance_row_group_ == true. Line 373: if (context_->cancelled()) return Status::CANCELLED; > In case of parse error in AssembleRows(), we may not have called CommitRows Removed. If the scanner doesn't see the cancellation here, it's bound to see it at some point in CommitRows() if keeps producing row batches. Line 512: scratch_batch_->num_tuples)); > Need the same error handling in this path. Done Line 513: *skip_row_group = true; > Yes, missed that after rebase. Done -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3090: always log memory limit errors
Michael Ho has posted comments on this change. Change subject: IMPALA-3090: always log memory limit errors .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4049/1/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: PS1, Line 350: int64_t failed_allocation) Thanks for adding the comment. It may be worth setting the default value of 'failed_allocation' to 0 so some callsites of this function don't have to pass 0. -- To view, visit http://gerrit.cloudera.org:8080/4049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5ec5572b0e26898da352b7e6b11eb01c6edb2e5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Michael Ho has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder .. Patch Set 4: (16 comments) Still going through the changes and digesting them. Some comments for now. http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: PS4, Line 249: SwitchToIoBuffers(_buffer) Not your change but I always find the inconsistency between returning bool vs Status a bit confusing. For instance, this function return Status while AddRow() passes the Status as argument. PS4, Line 325: total_build_rows 'total_build_rows' http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: PS4, Line 57: methods interface ? PS4, Line 152: Partition { I suppose this is the build-side counterpart of ProbePartition, right ? Can you please add a brief class comment about it ? PS4, Line 163: in memory size in-memory data structures PS4, Line 163: size byte size PS4, Line 175: he the PS4, Line 180: is_closed() Why cannot this be the same as ProbePartition::IsClosed() and use { return build_rows_ == NULL; } ? PS4, Line 290: then the PS4, Line 291: needed to destruct the Status Is this still true ? I believe we have added noexcept to the Status class already. Line 294: /// This status should used directly only by ProcessBuildBatch(). be used http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: PS4, Line 523: join_op_ == TJoinOp::RIGHT_ANTI_JOIN || nit: long line http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS4, Line 449: is_closed() nit: IsClosed() PS4, Line 459: transferred. ... and the partition is considered closed. http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/runtime/buffered-tuple-stream.h File be/src/runtime/buffered-tuple-stream.h: PS4, Line 230: got_buffer 'got_buffer' Line 231: /// false if the block could not be pinned and no error was encountered. And it's unknown if error was encountered. In which case, an error status will be returned. -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Michael Ho has posted comments on this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. IMPALA-3962: Clean up scratch tuple batch on scan failures The parquet scanner doesn't clean up scratch_batch_ properly which causes it to process a partially filled scratch_batch_ if any of the column reader fails. This change cleans up the scratch batch if any of the parquet column readers fails. The clean up involves freeing the mem_pool of scratch_batch_ and setting number of tuples in scratch_batch_ to 0. This change also extends debug action to emulate the behavior of exceeding the query's memory limit. Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 --- M be/src/exec/exec-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/failure/test_failpoints.py 10 files changed, 77 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/3 -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. IMPALA-3962: Clean up scratch tuple batch on scan failures The parquet scanner doesn't clean up scratch_batch_ properly which causes it to process a partially filled scratch_batch_ if any of the column reader fails. This change cleans up the scratch batch if any of the parquet column readers fails. The clean up involves freeing the mem_pool of scratch_batch_ and setting number of tuples in scratch_batch_ to 0. Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 --- M be/src/exec/exec-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/failure/test_failpoints.py 10 files changed, 75 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/2 -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 1: To answer the question of why the stress test hit this: the query was cancelled (I noticed that while debugging it last week but couldn't quite complete the whole picture). The cancellation was noticed eventually by HdfsParquetScanner::AssembleCollection() which eventually bubbles 'continue_execution == false' up the stack to the loop in AssembleRows(), causing this bug. So, it may indeed be a good idea to add some debug actions there for debug builds. -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/3991/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 343: context_->ClearStreams(); > why is this under !advance_row_group_? If we did this here, then it seems Yes. I am not 100% sure it's side-effect free (e.g. are these calls idempotent ?). Let me think more about this. Line 373: if (context_->cancelled()) return Status::CANCELLED; > why is this needed? won't the next GetNextInternal() notice this (inside Co In case of parse error in AssembleRows(), we may not have called CommitRows(). However, it may be worth some more rethinking of how the code should be structured so we can avoid this redundant check here. Line 513: *skip_row_group = true; > doesn't this path have the same bug? Yes, missed that after rebase. -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has posted comments on this change. Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. Patch Set 1: We probably need to create a malformed table with bad data in some columns. -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
Michael Ho has posted comments on this change. Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested collection. .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing <cchann...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3201: buffer pool header only
Michael Ho has posted comments on this change. Change subject: IMPALA-3201: buffer pool header only .. Patch Set 1: Please feel free to address them in the next patch though. -- To view, visit http://gerrit.cloudera.org:8080/3992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id771dea2eb4c1aa13c30d59e8b184a7d1bca8d34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3201: buffer pool header only
Michael Ho has posted comments on this change. Change subject: IMPALA-3201: buffer pool header only .. Patch Set 1: Sorry to jump in late at this point but there are also some comments at https://gerrit.cloudera.org/#/c/2569 -- To view, visit http://gerrit.cloudera.org:8080/3992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id771dea2eb4c1aa13c30d59e8b184a7d1bca8d34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/3991 Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures .. IMPALA-3962: Clean up scratch tuple batch on scan failures The parquet scanner doesn't clean up scratch_batch_ properly which causes it to process a partially filled scratch_batch_ if any of the column reader fails. This change cleans up the scratch batch if any of the parquet column readers fails. The clean up involves freeing the mem_pool of scratch_batch_ and setting number of tuples in scratch_batch_ to 0. Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 --- M be/src/exec/hdfs-parquet-scanner.cc 1 file changed, 6 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/1 -- To view, visit http://gerrit.cloudera.org:8080/3991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
Michael Ho has posted comments on this change. Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested collection. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/3940/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 574: dst_batch->tuple_data_pool()->AcquireData(scratch_batch_->mem_pool(), false); > FreeAll() also works for both cases (I ran a private build to confirm), so FreeAll() sounds good. -- To view, visit http://gerrit.cloudera.org:8080/3940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing <cchann...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs
Michael Ho has posted comments on this change. Change subject: IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs .. Patch Set 2: Added a new test in free-pool-test.cc -- To view, visit http://gerrit.cloudera.org:8080/3807 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs
Hello Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3807 to look at the new patch set (#2). Change subject: IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs .. IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs This patch addresses a potential overflow in calculation FreePool::Rellocate() and its handling of zero-length allocations. This patch also adds code to gracefully handle malloc() failures when initializing/resizing hash tables. Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d --- M be/src/exec/hash-table.cc M be/src/runtime/free-pool-test.cc M be/src/runtime/free-pool.h 3 files changed, 16 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/3807/2 -- To view, visit http://gerrit.cloudera.org:8080/3807 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Michael Ho has posted comments on this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS7, Line 162: if (scan_node_->runtime_state()->codegen_enabled()) { Is this check redundant as GetCodegenFn() should return NULL if codegen is disabled, right ? PS7, Line 585: int tuple_byte_size, bool has_filters Did you mean to use these to replace some constants ? They don't seem to be used. http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: PS7, Line 558: *write_complete_tuple_fn = codegen->FinalizeFunction(fn); : return Status::OK(); Doesn't this trigger the DCHECK in some code which check the returned function is not NULL if status.ok() ? In theory, FinalizeFunction() can return NULL if there is any error. http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: PS7, Line 63: nit: wrong indentation http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: PS7, Line 699: DCHECK(*write_aligned_tuples_fn != NULL); Please see comments in CodegenWriteAlignedTuples() about this DCHECK(). -- To view, visit http://gerrit.cloudera.org:8080/3774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys
Michael Ho has submitted this change and it was merged. Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration keys .. IMPALA-3829: OpenSession() logs errors on valid configuration keys Refactored OpenSession() to process the supplied configuration map in one loop. Call SetQueryOption() on normal configuration keys only. Other changes: - Compare config keys to "impala.doas.user" in case-insensitive manner. - New E2E test to check that setting query options still works after the change. Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025 Reviewed-on: http://gerrit.cloudera.org:8080/3808 Tested-by: Internal Jenkins Reviewed-by: Michael Ho <k...@cloudera.com> --- M be/src/service/impala-hs2-server.cc M tests/hs2/test_hs2.py 2 files changed, 39 insertions(+), 33 deletions(-) Approvals: Michael Ho: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3808 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
Michael Ho has posted comments on this change. Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested collection. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/3940/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 574: dst_batch->tuple_data_pool()->AcquireData(scratch_batch_->mem_pool(), false); Sorry for the late review. If we are materializing a collection with empty tuples, do we actually need to transfer the memory pool of the scratch batch or can we free it at this point ? -- To view, visit http://gerrit.cloudera.org:8080/3940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing <cchann...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys
Michael Ho has posted comments on this change. Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration keys .. Patch Set 3: Code-Review+2 Carry Henry's +2 forward. Attila cannot yet +2 at this point. -- To view, visit http://gerrit.cloudera.org:8080/3808 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3946: fix MemPool integrity issues with empty chunks
Michael Ho has posted comments on this change. Change subject: IMPALA-3946: fix MemPool integrity issues with empty chunks .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/3838/2/be/src/runtime/mem-pool.cc File be/src/runtime/mem-pool.cc: PS2, Line 265: if (current_chunk_empty) DCHECK_EQ(chunks_[i].allocated_bytes, 0); Please add a comment stating that !current_chunk_empty doesn't imply the current chunk is not empty. It just means the caller isn't sure it's empty. May be it's better to rename it to 'check_current_chunk_empty' if it's not too verbose. -- To view, visit http://gerrit.cloudera.org:8080/3838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03ad12e5b2b63cbb55e5c52562416d73a4bd9346 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation
Michael Ho has posted comments on this change. Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation .. Patch Set 4: Code-Review+2 Carry +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/3793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3946: fix MemPool integrity issues with empty chunks
Michael Ho has posted comments on this change. Change subject: IMPALA-3946: fix MemPool integrity issues with empty chunks .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/3838/1/be/src/runtime/mem-pool-test.cc File be/src/runtime/mem-pool-test.cc: PS1, Line 435: Empty chunk May be I misunderstood it but I think you meant the chunk from src isn't transferred. Whether the chunk is transferred should have nothing to do whether the chunk is empty or not, right ? http://gerrit.cloudera.org:8080/#/c/3838/1/be/src/runtime/mem-pool.cc File be/src/runtime/mem-pool.cc: PS1, Line 105: int first_free_idx = 0; IMHO, It may be slightly clearer to make the assignment of this variable more explicit: if (UNLIKELY(current_chunk_idx_ == -1)) { first_free_idx = 0; } else { DCHECK_GE(current_chunk_idx, 0); first_free_idx = current_chunk_idx_ + (chunks_[current_chunk_idx_] > 0); // Always start at current_chunk_idx + 1 as we know the current chunk cannot fit. for (int idx = current_chunk_idx_ + 1; idx < chunks_.size(); ++idx) { } } It would be great to also document that the reason there can be a list of free chunks after the current chunk is a result of calling Clear() in the past. That always confuses me when I read this function without reading other functions at the same time. PS1, Line 215: false Is there a reason why this cannot be !keep_current || src->GetFreeOffset() > 0 ? PS1, Line 260: if (current_chunk_empty) Can we keep the stricter check we had before if we modify line 215 ? -- To view, visit http://gerrit.cloudera.org:8080/3838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03ad12e5b2b63cbb55e5c52562416d73a4bd9346 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Michael Ho has posted comments on this change. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/3791/3/be/src/util/string-parser.h File be/src/util/string-parser.h: PS3, Line 360: int i = 3; : if (i + 5 <= len && strncasecmp(s + i, "inity", 5) == 0) { : i += 5; : } Feel free to ignore as the compiler will have figured it out anyway. These lines may be simplified as: int i =3; if (len >= 8 && strncasecmp(s, "infinity", 8) == 0) i = 8; -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3637,IMPALA-3636: refactor codegen constant replacement
Michael Ho has posted comments on this change. Change subject: IMPALA-3637,IMPALA-3636: refactor codegen constant replacement .. Patch Set 1: (12 comments) http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/constant-replacement.cc File be/src/codegen/constant-replacement.cc: PS1, Line 37: Value Do you think Constant* will be more appropriate ? Value* may be a bit generic. http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/constant-replacement.h File be/src/codegen/constant-replacement.h: Line 1: // Copyright 2016 Cloudera Inc. This header needs updating. http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS1, Line 829: /// nit: "//". It should be fine to declare this in the header file too. Not much harm in exposing it but up to you. PS1, Line 898: /// nit: "//" PS1, Line 900: fn_pass_manager->add(new ConstantReplacementPass(this)); Did you measure how costly this extra optimization pass is ? http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS1, Line 376: llvm::Value* GetBoolConstant nice. http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/exec/hash-table-constant-replacer.cc File be/src/exec/hash-table-constant-replacer.cc: Line 1: // Copyright 2016 Cloudera Inc. This header needs updating. http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: PS1, Line 271: context->impl()->GetReturnPrecision() An observation when I went through the patch: if we can somehow generate FunctionContext as an LLVM constant, a lot of these call sites replacement won't be needed. That said, I am not advocating we should switch to that approach (at least not now). It's just an alternative. http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/exprs/expr-constant-replacement.cc File be/src/exprs/expr-constant-replacement.cc: Line 1: // Copyright 2016 Cloudera Inc. This header needs updating. PS1, Line 41: ArgType It seems unnecessary to have a separate class for this. May be we can forgo the need of a separate class called ExprConstantReplacement and put everything in this file under the class ExprConstantReplacer ? Line 82: const vector& arg_types, LlvmCodeGen* codegen, nit: wrong indentation. Line 84: ArgTypeConstantReplacer replacer(return_type, arg_types); Referring to the previous comment about combining this with the LLVM constant pass: On the first glance, the code seems to still have two different patterns for doing constant replacement. One is this function and the other one is the LLVM constant pass. However, upon closer look, they are actually different. The replacement values used in this function depend on 'return_type' which may differ from functions to functions. On the other hand, the LLVM constant pass you added is for CpuInfo() whose replacement values will be the same for all functions given the same input. In some sense, this function is a local replacement pass while the LLVM pass is closer to a global replacement pass. It may be helpful to document the difference here as having these two patterns at the same time may be a bit confusing to decide which one to use when. Not sure if it's possible to DCHECK that this type of constant replacer is not used in the LLVM constant replacement pass as it doesn't seem to make sense ? I guess same thing can be said about hash_table_constant_replacer. -- To view, visit http://gerrit.cloudera.org:8080/3843 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ba029ed8589698eb15dbfb0a20dd2a7ea752635 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys
Michael Ho has posted comments on this change. Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration keys .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3808 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys
Michael Ho has posted comments on this change. Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration keys .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/3808/1/tests/hs2/test_hs2.py File tests/hs2/test_hs2.py: PS1, Line 81: open_session_req.configuration = {"impala.doas.user": ""} Is it worth adding a case sensitive test for impala.doas.user ? -- To view, visit http://gerrit.cloudera.org:8080/3808 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Michael Ho has posted comments on this change. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3791/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS2, Line 2796: TestValue("CAST(' inf ' AS FLOAT)", TYPE_FLOAT, : numeric_limits::infinity()); : TestValue("CAST(' inf ' AS DOUBLE)", TYPE_DOUBLE, : numeric_limits::infinity()); > This doesn't seem to match what you wrote in the comment. Isn't leading whi Nevermind. Looks like the leading whitespace is removed before calling StringToFloatInternal(). -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Michael Ho has posted comments on this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/3774/3/be/src/exec/hdfs-parquet-scanner-ir.cc File be/src/exec/hdfs-parquet-scanner-ir.cc: PS3, Line 44: if (tuple_size == 0) { Given that if (tuple_size == 0) { ...} is handling an entire scratch patch in one shot, it seems that we won't be benefiting a lot from it by eliminating the if statement. The gain we get may be offset by the codegen time to optimize/compile this function at runtime. I would suggest cross-compiling the while loop below only. PS3, Line 61: while (scratch_tuple != scratch_tuple_end) { Can we factor the while loop (or the loop body only) out as a separate function and cross-compile it only ? http://gerrit.cloudera.org:8080/#/c/3774/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS3, Line 388: if (!node->runtime_state()->GetCodegen().ok()) { : return Status("Failed to get codegen"); : } RETURN_IF_ERROR(node->runtime_state()->GetCodegen()); PS3, Line 408: llvm::ConstantInt::get( : llvm::Type::getInt32Ty(codegen->context()), tuple_byte_size) codegen->GetIntConstant(sizeof(tuple_byte_size), tuple_byte_size) ? -- To view, visit http://gerrit.cloudera.org:8080/3774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/3807 Change subject: IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs .. IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs This patch addresses a potential overflow in calculation FreePool::Rellocate() and its handling of zero-length allocations. This patch also adds code to gracefully handle malloc() failures when initializing/resizing hash tables. Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d --- M be/src/exec/hash-table.cc M be/src/runtime/free-pool.h 2 files changed, 11 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/3807/1 -- To view, visit http://gerrit.cloudera.org:8080/3807 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation
Michael Ho has posted comments on this change. Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/3793/2//COMMIT_MSG Commit Message: Line 16: cross-compilation as they won't benefit from it. > Also the Uuid() function. Done http://gerrit.cloudera.org:8080/#/c/3793/2/be/src/exprs/utility-functions.cc File be/src/exprs/utility-functions.cc: Line 39: StringVal UtilityFunctions::GenUuid(FunctionContext* ctx) { > Maybe it's worth commenting why this isn't cross-compiled. I don't think it Done -- To view, visit http://gerrit.cloudera.org:8080/3793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3793 to look at the new patch set (#3). Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation .. IMPALA-1112: Remove some unncessary code from cross-compilation This change stops including some boost library header files which pulls in other unnecessary boost library header files. This reduces the amount of cross-compiled code which needs to be materialized during codegen. This change also removes some UDF's Prepare() and Close() functions and UDF functions fromUtc(), toUtc() and uuid() from cross-compilation as they won't benefit from it. With this change, the bitcode module reduces from 2.12 MB to 1.86MB. Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af --- M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/common/init.cc M be/src/exprs/CMakeLists.txt M be/src/exprs/expr.cc M be/src/exprs/timestamp-functions-ir.cc A be/src/exprs/timestamp-functions.cc M be/src/exprs/timestamp-functions.h M be/src/exprs/timezone_db.cc A be/src/exprs/timezone_db.h M be/src/exprs/utility-functions-ir.cc A be/src/exprs/utility-functions.cc M be/src/exprs/utility-functions.h M be/src/runtime/mem-tracker.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/impala-http-handler.cc M be/src/statestore/statestore.cc M be/src/util/debug-util.cc 18 files changed, 331 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/3793/3 -- To view, visit http://gerrit.cloudera.org:8080/3793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation .. IMPALA-1112: Remove some unncessary code from cross-compilation This change stops including some boost library header files which pulls in other unnecessary boost library header files. This reduces the amount of cross-compiled code which needs to be materialized during codegen. This change also removes some UDF's Prepare() and Close() functions and two UDF functions fromUtc() and toUtc() from cross-compilation as they won't benefit from it. With this change, the bitcode module reduces from 2.12 MB to 1.86MB. Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af --- M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/common/init.cc M be/src/exprs/CMakeLists.txt M be/src/exprs/expr.cc M be/src/exprs/timestamp-functions-ir.cc A be/src/exprs/timestamp-functions.cc M be/src/exprs/timestamp-functions.h M be/src/exprs/timezone_db.cc A be/src/exprs/timezone_db.h M be/src/exprs/utility-functions-ir.cc A be/src/exprs/utility-functions.cc M be/src/exprs/utility-functions.h M be/src/runtime/mem-tracker.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/impala-http-handler.cc M be/src/statestore/statestore.cc M be/src/util/debug-util.cc 18 files changed, 327 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/3793/2 -- To view, visit http://gerrit.cloudera.org:8080/3793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has abandoned this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Abandoned Review is moved over to https://gerrit.cloudera.org/#/c/3792/ -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/3793 Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation .. IMPALA-1112: Remove some unncessary code from cross-compilation This change stops including some boost library header files which pulls in other unnecessary boost library header files. This reduces the amount of cross-compiled code which needs to be materialized during codegen. This change also removes some UDF's Prepare() and Close() functions and two UDF functions fromUtc() and toUtc() from cross-compilation as they won't benefit from it. With this change, the bitcode module reduces from 2.12 MB to 1.86MB. Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af --- M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/common/init.cc M be/src/exprs/CMakeLists.txt M be/src/exprs/expr.cc M be/src/exprs/timestamp-functions-ir.cc A be/src/exprs/timestamp-functions.cc M be/src/exprs/timestamp-functions.h M be/src/exprs/timezone_db.cc A be/src/exprs/timezone_db.h M be/src/exprs/utility-functions-ir.cc A be/src/exprs/utility-functions.cc M be/src/exprs/utility-functions.h M be/src/runtime/mem-tracker.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/impala-http-handler.cc M be/src/statestore/statestore.cc M be/src/util/debug-util.cc 18 files changed, 318 insertions(+), 212 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/3793/1 -- To view, visit http://gerrit.cloudera.org:8080/3793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has posted comments on this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/3740/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS4, Line 553: referencd > referenced Done PS4, Line 554: so LLVM : /// cannot resolve references to them if they are not materialized. > this now seems redundant with the last sentence. delete. Done -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has posted comments on this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Patch Set 1: Code-Review+2 Review is at https://gerrit.cloudera.org/#/c/3740/. Carry +2 over. -- To view, visit http://gerrit.cloudera.org:8080/3792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I147eb332adc3f272b7a7a4fb4415842432c05f77 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/3792 Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. IMPALA-3906: Materialize implicitly referenced IR functions With lazy materialization of IR functions in the LLVM module, there is an assumption that functions not referenced by IR functions used in the query don't have to be materialized and can have their linkage types changed to being externally defined. These unmaterialized functions may be referenced by global variables in the LLVM module and LLVM resolves these references to their definitions in the native Impalad binary. These global variables are mostly arrays containing references to other global constants or boost and Impala functions included in the cross-compiled code. When compiling Impalad with optimization, gcc may actually inline some of the functions which the global variables in the LLVM modules reference and LLVM may have linking error if these referenced IR functions are not materialized as it can no longer find their definitions in the Impalad binary. This patch fixes the problem by parsing all the global variables in the LLVM module during Impalad's initialization and recording all the referenced functions which aren't defined in the Impalad binary and make sure they are always materialized. A second problem which this patch fixes is that linking in external LLVM module (e.g. UDF IR created by a user) may implicitly materialize some functions in the external module. Normally, we would expect functions to be materialized through the GetFunction() interface and LinkModule() seems to be an exception. This patch updates LinkModule() to also parse the functions from the external module just like we do for unmaterialized functions in GetFunction(). Change-Id: I147eb332adc3f272b7a7a4fb4415842432c05f77 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/util/symbols-util-test.cc 3 files changed, 152 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/3792/1 -- To view, visit http://gerrit.cloudera.org:8080/3792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I147eb332adc3f272b7a7a4fb4415842432c05f77 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com>
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3740 to look at the new patch set (#4). Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. IMPALA-3906: Materialize implicitly referenced IR functions With lazy materialization of IR functions in the LLVM module, there is an assumption that functions not referenced by IR functions used in the query don't have to be materialized and can have their linkage types changed to being externally defined. These unmaterialized functions may be referenced by global variables in the LLVM module and LLVM resolves these references to their definitions in the native Impalad binary. These global variables are mostly arrays containing references to other global constants or boost and Impala functions included in the cross-compiled code. When compiling Impalad with optimization, gcc may actually inline some of the functions which the global variables in the LLVM modules reference and LLVM may have linking error if these referenced IR functions are not materialized as it can no longer find their definitions in the Impalad binary. This patch fixes the problem by parsing all the global variables in the LLVM module during Impalad's initialization and recording all the referenced functions which aren't defined in the Impalad binary and make sure they are always materialized. A second problem which this patch fixes is that linking in external LLVM module (e.g. UDF IR created by a user) may implicitly materialize some functions in the external module. Normally, we would expect functions to be materialized through the GetFunction() interface and LinkModule() seems to be an exception. This patch updates LinkModule() to also parse the functions from the external module just like we do for unmaterialized functions in GetFunction(). Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/util/symbols-util-test.cc 3 files changed, 153 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/40/3740/4 -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has posted comments on this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/3740/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS2, Line 144: Also ignore other global : // variables as InitializeLlvm() will go through all of them. > Not sure what this is saying. What does "other" mean here? Rephrased the comments. It means other global variables in the same module. The reason they should be avoided is that we will parse all global variable one-by-one any way so there is no need to parse other global variables referenced by the current global variable. PS2, Line 320: fn.isMaterializable() > at this point, what functions in new_module are !isMaterializable()? when m This is to avoid built-in functions and declarations. LLVM functions sometimes have bad names. Line 337: } > can there be global variables in the module that need to be inspected for f Good point. Code is updated to also parse the global variables in the source module. http://gerrit.cloudera.org:8080/#/c/3740/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS2, Line 546: . > ... that aren't defined in the Impalad native code (they may have been inli Done PS2, Line 546: implicitly > what's implicit? aren't they explicitly referenced by the global variables? It's implicit wrt to GetFunction() but yes, it's explicitly referenced by the global variables. PS2, Line 548: so LLVM cannot resolve references to them if they are not materialized. : /// These functions are always materialized each time the module is loaded. > These functions are always materialized each time the module is loaded to e Done PS2, Line 550: gv_ref_functions_ > since these are a subset of global var referenced functions that we need in Done -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has posted comments on this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3740/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: Line 436: static bool IsDefinedInImpalad(const std::string fn); > Missing & to pass by reference Oops. Done. -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3740 to look at the new patch set (#3). Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. IMPALA-3906: Materialize implicitly referenced IR functions With lazy materialization of IR functions in the LLVM module, there is an assumption that functions not referenced by IR functions used in the query don't have to be materialized and can have their linkage types changed to being externally defined. These unmaterialized functions may be referenced by global variables in the LLVM module and LLVM resolves these references to their definitions in the native Impalad binary. These global variables are mostly arrays containing references to other global constants or boost and Impala functions included in the cross-compiled code. When compiling Impalad with optimization, gcc may actually inline some of the functions which the global variables in the LLVM modules reference and LLVM may have linking error if these referenced IR functions are not materialized as it can no longer find their definitions in the Impalad binary. This patch fixes the problem by parsing all the global variables in the LLVM module during Impalad's initialization and recording all the referenced functions which aren't defined in the Impalad binary and make sure they are always materialized. A second problem which this patch fixes is that linking in external LLVM module (e.g. UDF IR created by a user) may implicitly materialize some functions in the external module. Normally, we would expect functions to be materialized through the GetFunction() interface and LinkModule() seems to be an exception. This patch updates LinkModule() to also parse the functions from the external module just like we do for unmaterialized functions in GetFunction(). Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/util/symbols-util-test.cc 3 files changed, 127 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/40/3740/3 -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has posted comments on this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/3740/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 106: bool LlvmCodeGen::IsDefinedInImpalad(const Function* fn) { > Maybe the argument should just be the function name? Done Line 109: Status status = LibCache::instance()->GetSoFunctionPtr("", fn_name, _ptr, NULL, true); > Long line. Done PS1, Line 122: dyn_cast > I think you want cast<> instead of dyn_cast<> here and below since you don' Done Line 138: } else if (isa(val) || isa(val)) { > Maybe DCHECK that this is true? I think we want to know if another type of I added a DCHECK in a new else clause after this one. Please see if it makes sense to you. Line 147: Function* fn = fn_list[i]; > I don't think we need the index, so we can simplify to: Done PS1, Line 326: "picked" > "merged" Done Line 328: for (int i = 0; i < fn_names.size(); ++i) { > for (const string& fn_name: fn_names) { Done Line 372: codegen->MaterializeFunction(); > Do you have an idea of how many additional functions we're materialising? I It's adding 12 functions. There is slightly more overhead based on rough estimate from the profile.They will be removed in a patch for IMPALA-1112. _ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE23set_iso_extended_formatEv _ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE14set_iso_formatEv _ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED0Ev _ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED2Ev _ZN5boost6detail15sp_counted_baseD0Ev _ZNK5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE9do_put_tmES7_RSt8ios_basecRK2tmSs _ZNK5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE14do_put_specialES7_RSt8ios_basecNS0_14special_valuesE _ZN5boost6detail15sp_counted_baseD2Ev _ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE14set_iso_formatEv _ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE23set_iso_extended_formatEv _ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED0Ev _ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED2Ev http://gerrit.cloudera.org:8080/#/c/3740/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS1, Line 550: set > unordered_set? I believe std::set is a balanced binary tree that isn't very Done -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has posted comments on this change. Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/3740/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS1, Line 326: "picked" "merged" -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/3740 Change subject: IMPALA-3906: Materialize implicitly referenced IR functions .. IMPALA-3906: Materialize implicitly referenced IR functions With lazy materialization of IR functions in the LLVM module, there is an assumption that functions not referenced by IR functions used in the query don't have to be materialized and can have their linkage types changed to being externally defined. These unmaterialized functions may be referenced by global variables in the LLVM module and LLVM resolves these references to their definitions in the native Impalad binary. These global variables are mostly arrays containing references to other global constants or boost and Impala functions included in the cross-compiled code. When compiling Impalad with optimization, gcc may actually inline some of the functions which the global variables in the LLVM modules reference and LLVM may have linking error if these referenced IR functions are not materialized as it can no longer find their definitions in the Impalad binary. This patch fixes the problem by parsing all the global variables in the LLVM module during Impalad's initialization and recording all the referenced functions which aren't defined in the Impalad binary and make sure they are always materialized. A second problem which this patch fixes is that linking in external LLVM module (e.g. UDF IR created by a user) may implicitly materialize some functions in the external module. Normally, we would expect functions to be materialized through the GetFunction() interface and LinkModule() seems to be an exception. This patch updates LinkModule() to also parse the functions from the external module just like we do for unmaterialized functions in GetFunction(). Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/util/symbols-util-test.cc 3 files changed, 121 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/40/3740/1 -- To view, visit http://gerrit.cloudera.org:8080/3740 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com>
Re: [VOTE] Apache Impala Bylaws
+1 (binding) On Fri, Jul 22, 2016 at 12:46 PM, Jim Applewrote: > This is a vote on the following proposal for bylaws: > > https://gerrit.cloudera.org/#/c/3669/2 > > The vote is to be done by "Lazy Consensus". Active PMC members, > according to http://incubator.apache.org/projects/impala.html, may > vote. The vote will be open 72 hours and will pass if there are "3 > binding +1 votes and more binding +1 votes than -1 votes." > > + > > I am not on the PPMC, so my vote is non-binding. Here it is anyway, as > according to our draft bylaws, "Non binding votes are still useful for > those with binding votes to understand the perception of an action in > the wider Impala community." > > (Non-binding) +1. > > My reasoning is that these bylaws are probably not utterly bonkers, > since they are mostly what Hadoop uses, and they are easy to change if > anyone finds something problematic. Additionally, since many of us in > the Impala community are new to The Apache Way, having a document that > spells things out (like how voting works) will, I hope, serve as a > helpful foundation. > > +++ > > Here is a plain-text copy of the patch for mailing-list archival purposes: > > +++ > > Apache Impala (incubating) Project Bylaws > > Introduction > > This document defines the bylaws under which the Apache Impala > (incubating) project operates. It defines the roles and > responsibilities of the project, who may vote, how voting works, how > conflicts are resolved, etc. > > Impala is a project of the Apache Software Foundation. The foundation > holds the trademark on the name "Impala" and copyright on Apache code > including the code in the Impala codebase. The foundation FAQ explains > the operation and background of the foundation. > > Impala is typical of Apache projects in that it operates under a set > of principles, known collectively as the "Apache Way". If you are new > to Apache development, please refer to the Incubator project for more > information on how Apache projects operate. > > Roles and Responsibilities > > Apache projects define a set of roles with associated rights and > responsibilities. These roles govern what tasks an individual may > perform within the project. The roles are defined in the following > sections > > Users > The most important participants in the project are people who use our > software. > > Users contribute to the Apache projects by providing feedback to > developers in the form of bug reports and feature suggestions. As > well, users participate in the Apache community by helping other users > on mailing lists and user support forums. > > Contributors > All of the volunteers who are contributing time, code, documentation, > or resources to the Impala Project. A contributor that makes > sustained, welcome contributions to the project may be invited to > become a Committer, though the exact timing of such invitations > depends on many factors. > > Committers > The project's Committers are responsible for the project's technical > management. Committers have write access to the project's version > control repositories. Committers may cast binding votes on any > technical discussion. > > Committer access is by invitation only and must be approved by > consensus approval of the active PMC members. A Committer is > considered emeritus by their own declaration or by not contributing in > any form to the project for over six months. An emeritus committer may > request reinstatement of commit access from the PMC. Such > reinstatement is subject to consensus approval of active PMC members. > > Significant, pervasive features may be developed in a speculative > branch of the repository. The PMC may grant commit rights on the > branch to its consistent contributors for the duration of the > initiative. Branch committers are responsible for shepherding their > feature into an active release and do not cast binding votes or vetoes > in the project. > > All Apache committers are required to have a signed Contributor > License Agreement (CLA) on file with the Apache Software Foundation. > There is a Committer FAQ which provides more details on the > requirements for Committers > > A committer who makes a sustained contribution to the project may be > invited to become a member of the PMC. The form of contribution is not > limited to code. It can also include code review, helping out users on > the mailing lists, documentation, testing, etc. > > Release Manager > A Release Manager (RM) is a committer who volunteers to produce a > Release Candidate according to HowToRelease. The RM shall publish a > Release Plan on the dev@ list stating the branch from which they > intend to make a Release Candidate, at least one week before they do > so. The RM is responsible for building consensus around the content of > the Release Candidate, in order to achieve a successful Product > Release vote. > > Project Management Committee > The Project Management Committee (PMC) is responsible
[Impala-CR](cdh5-trunk) IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
Michael Ho has posted comments on this change. Change subject: IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close() .. Patch Set 5: Code-Review+2 http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/3754/ Verified the new patch is fine with the private ASAN run above. Carry +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/3630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
Michael Ho has posted comments on this change. Change subject: IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close() .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/3630/4//COMMIT_MSG Commit Message: PS4, Line 23: ClearlStreams > typo Fixed. http://gerrit.cloudera.org:8080/#/c/3630/4/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: PS4, Line 194: decompression_type_ > it looks like this is only set in ProcessSplit() if we found a tuple starti Actually, that's a good point. The default value of decompression_type_ is THdfsCompression::NONE so in the case in which no tuple is found in the scan range, we may mistakenly account it towards the wrong compression type. Given that we will keep the stream object alive till the end now, we can go ahead and restore the old code here. -- To view, visit http://gerrit.cloudera.org:8080/3630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3630 to look at the new patch set (#5). Change subject: IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close() .. IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close() A recent commit changed the ownership of the Stream object to its owning ScannerContext. After that change, a Stream object will be destroyed in ScannerContext::ReleaseCompletedResources() if the parameter 'done' is true. That usually happens in the Close() function of a scanner. However, for the text scanner, the Stream object can be destroyed after handling compressed data before Close() is called. In that case, the cached handle to the Stream object is invalid when it's referenced in Close() to access the compression codec of the Stream object. This change fixes the above problem by not deleting the stream objects in ScannerContext::ReleaseCompletedResources(). Instead a new function ScannerContext::ClearStreams() is added for that purpose and it's invoked in HdfsScanner::Close() to release all the stream objects. This avoids other use-after-free problems in the code. Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h 6 files changed, 29 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/30/3630/5 -- To view, visit http://gerrit.cloudera.org:8080/3630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com>