Re: [VOTE] Graduate to a TLP

2017-10-17 Thread Michael Ho
+1

On Tue, Oct 17, 2017 at 7:25 PM, Thomas Tauber-Marshall <
tmarsh...@cloudera.com> wrote:

> +1
>
> On Tue, Oct 17, 2017 at 9:12 PM Bharath Vissapragada <
> bhara...@cloudera.com>
> wrote:
>
> > +1
> >
> > On Tue, Oct 17, 2017 at 7:10 PM, Mostafa Mokhtar 
> > wrote:
> >
> > > +1
> > >
> > > Thanks
> > > Mostafa
> > >
> > > > On Oct 17, 2017, at 7:09 PM, Brock Noland  wrote:
> > > >
> > > > +1
> > > >
> > > >> On Tue, Oct 17, 2017 at 9:07 PM, Lars Volker 
> wrote:
> > > >> +1
> > > >>
> > > >>> On Oct 17, 2017 19:07, "Jim Apple"  wrote:
> > > >>>
> > > >>> Following our discussion
> > > >>> https://lists.apache.org/thread.html/
> 2f5db4788aff9b0557354b9106c032
> > > >>> 8a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
> > > >>> , I propose that we graduate to a TLP. According to
> > > >>> https://incubator.apache.org/guides/graduation.html#
> > > >>> community_graduation_vote
> > > >>> this is not required, and https://impala.apache.org/bylaws.html
> does
> > > not
> > > >>> say whose votes are "binding" in a graduation vote, so all
> community
> > > >>> members are welcome to vote.
> > > >>>
> > > >>> This will remain open 72 hours. I will be notifying
> general@incubator
> > > it
> > > >>> is
> > > >>> occurring.
> > > >>>
> > > >>> This is my +1.
> > > >>>
> > >
> >
>



-- 
Thanks,
Michael


Re: jenkins.impala.io maintenance

2017-08-07 Thread Michael Ho
Jenkins plugins have been updated. Aborted GVD jobs were restarted.

On Mon, Aug 7, 2017 at 5:25 PM, Michael Ho <k...@cloudera.com> wrote:

> jenkins.impala.io need updates for some plugins to address a new security
> advisory.
>
> It will be put into maintenance mode at 5:35pm PST so no new jobs can be
> submitted. The upgrade will happen after all pending jobs complete. Once
> the upgrade completes, another email will be sent out.
>
> Please speak up if you have any objection to the above.
>
> --
> Thanks,
> Michael
>



-- 
Thanks,
Michael


Re: incubator-impala github repo is gone ?

2017-04-28 Thread Michael Ho
https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git is still
around.

On Fri, Apr 28, 2017 at 11:16 AM, Michael Ho <k...@cloudera.com> wrote:

> https://github.com/apache/incubator-impala seems to be empty now.
>
> Just filed https://issues.apache.org/jira/browse/INFRA-14038
>
> --
> Thanks,
> Michael
>



-- 
Thanks,
Michael


incubator-impala github repo is gone ?

2017-04-28 Thread Michael Ho
https://github.com/apache/incubator-impala seems to be empty now.

Just filed https://issues.apache.org/jira/browse/INFRA-14038

-- 
Thanks,
Michael


Re: GVO request

2017-03-13 Thread Michael Ho
Done.

On Mon, Mar 13, 2017 at 2:35 PM, Zachary Amsden 
wrote:

> Looking to start a verification run for
> https://gerrit.cloudera.org/#/c/6275/
>
> Thanks in advance,
>
>  - Zach
>



-- 
Thanks,
Michael


Re: gerrit email notifications

2017-02-08 Thread Michael Ho
I didn't get email notification for https://gerrit.cloudera.org/#/c/5950/
either.

On Wed, Feb 8, 2017 at 10:40 PM, Bharath Vissapragada  wrote:

> https://gerrit.cloudera.org/#/c/5792/
>
> For my last two patch sets, I didn't see an email.
>
> On Wed, Feb 8, 2017 at 10:38 PM, Alex Behm  wrote:
>
> > Do you have a recent example?
> >
> > On Wed, Feb 8, 2017 at 10:33 PM, Bharath Vissapragada <
> > bhara...@cloudera.com
> > > wrote:
> >
> > > Is anyone else not receiving email notifications for new patch sets or
> is
> > > it just me?
> > >
> >
>



-- 
Thanks,
Michael


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..


Patch Set 2:

Patch #2 reverts to using the boost library condition variable for now. Given 
the wide spread usage of it in the existing code base, it seems a bit too 
distracting to replace all of them in this patch.

-- 
To view, visit http://gerrit.cloudera.org:8080/4350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Chen Huang <paulhuan...@utexas.edu>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..

IMPALA-4026: Implement double-buffering for BlockingQueue.

With recent changes to improve the parquet scanner's efficency,
row batches are produced more quickly, leading to higher contention
in the blocking queue shared between scanner threads and the scan
node. The contention happens between different producers (i.e.
the scanner threads) and also to a lesser extent, between the
scanner threads and the scan node.

This change addresses the contention between the scanner threads
and the scan node by splitting the queue into a 'read_list_' and
a 'write_list_'. The consumers will consume from 'read_list_' until
it's exhausted while the producers will enqueue into 'write_list_'
until it's full. When 'read_list_' is exhausted, the consumer will
atomically swap the 'read_list_' with 'write_list_'. This reduces
the contention/overhead in two ways: (1) 'read_list_' and 'write_list_'
are protected by two different locks so consumer only contends for the
write lock when 'read_list_' is empty. (2) the consumer only signals
the producer after an entire 'read_list_' is consumed instead of
signalling it per entry in the 'read_list_'.

With this change, the regression in primitive_filter_bigint_non_selective
went from 1.6s to 0.8s, improving by 50%.

Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
---
M be/src/util/blocking-queue.h
1 file changed, 89 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/4350/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Chen Huang <paulhuan...@utexas.edu>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..


Patch Set 1:

(3 comments)

It appears to me that primitive_conjunct_odering_5 and 
primitive_conjunct_ordering_1 have quite a bit of variance. I compared the 
result at the following two baseline runs:

http://sandbox.jenkins.cloudera.com/job/impala-private-perf-16node/234/
at commit 1a83f8bbe871df0527923e6f131a295270e54d33

http://sandbox.jenkins.cloudera.com/job/impala-private-perf-16node/233/
at commit 4fd25114d2ca23205396af1c16dcde3d3bec69fb

The result for conjunt_ordering_5 ranges from 27.16 to 44.45 while the 
reference baseline is 38.93. Similarly, conjunct_ordering_1 ranges from 10.95 
to 11.08 while the reference baseline is  9.84. My baseline should be based on 
run 233:

+-++---++-++---++-+---+
| Workload| Query  
| File Format   | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base 
StdDev(%) | Num Clients | Iters |
+-++---++-++---++-+---+
| TARGETED-PERF(_300) | primitive_filter_string_selective  
| parquet / none / none | 1.09   | 0.99|   +10.31%  |   2.59%   |   
2.49%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_2  
| parquet / none / none | 73.94  | 70.22   |   +5.29%   |   3.50%   |   
0.29%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_broadcast_join_2 
| parquet / none / none | 6.01   | 5.73|   +5.04%   |   0.07%   |   
0.44%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_broadcast_join_1 
| parquet / none / none | 1.70   | 1.62|   +4.83%   |   1.26%   |   
3.04%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_exchange_broadcast   
| parquet / none / none | 84.30  | 81.13   |   +3.90%   |   0.67%   |   
0.35%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_string_non_selective  
| parquet / none / none | 1.04   | 1.01|   +2.31%   |   2.44%   |   
0.21%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_groupby_bigint_lowndv
| parquet / none / none | 3.62   | 3.57|   +1.44%   |   2.10%   |   
0.71%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_groupby_bigint_pk
| parquet / none / none | 89.36  | 88.84   |   +0.59%   |   2.64%   |   
1.25%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_broadcast_join_3 
| parquet / none / none | 58.96  | 58.61   |   +0.59%   |   0.43%   |   
0.08%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_orderby_bigint   
| parquet / none / none | 30.72  | 30.64   |   +0.26%   |   0.00%   |   
0.00%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_decimal_selective 
| parquet / none / none | 0.92   | 0.92|   +0.14%   |   0.06%   |   
0.53%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_groupby_bigint_highndv   
| parquet / none / none | 23.84  | 23.82   |   +0.08%   |   0.67%   |   
0.31%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_empty_build_join_1   
| parquet / none / none | 13.31  | 13.32   |   -0.10%   |   1.04%   |   
0.54%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_string_like   
| parquet / none / none | 5.77   | 5.84|   -1.28%   |   3.07%   |   
2.55%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_4  
| parquet / none / none | 1.17   | 1.18|   -1.33%   |   1.30%   |   
0.01%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_shuffle_join_one_to_many_string_with_groupby 
| parquet / none / none | 228.99 | 232.12  |   -1.35%   |   0.29%   |   
1.01%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_3  
| parquet / none / none | 1.53   | 1.55|   -1.87%   |   0.37%   |   
4.97%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_orderby_all  
| parquet / none / none | 108.38 | 110.56  |   -1.97%   |   0.73%   |   
1.21%| 1   | 3 |
| TARGETED-PERF(_300

[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4350/1//COMMIT_MSG
Commit Message:

Line 29: BlockingQueue to using POSIX pthread primitives instead of boost
> I meant to say boost::condition_variable::wait() in the previous comment, n
Btw, this overhead seems to be related to the thread interruption support in 
boost library 
(https://www.justsoftwaresolutions.co.uk/threading/thread-interruption-in-boost-thread-library.html).
 Given the way we shut down the blocking queue, I am not sure if we need this. 
May be we can compile it out of the boost library.


-- 
To view, visit http://gerrit.cloudera.org:8080/4350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Chen Huang <paulhuan...@utexas.edu>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4350/1//COMMIT_MSG
Commit Message:

Line 29: BlockingQueue to using POSIX pthread primitives instead of boost
> In one of the backtrace from VTune, there was about 25% of the time of pthr
I meant to say boost::condition_variable::wait() in the previous comment, not 
pthread_cond_wait().


-- 
To view, visit http://gerrit.cloudera.org:8080/4350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4350/1//COMMIT_MSG
Commit Message:

Line 29: BlockingQueue to using POSIX pthread primitives instead of boost
> This seems strange. I looked at ./boost_1_57_0/boost/thread/pthread/mutex.h
In one of the backtrace from VTune, there was about 25% of the time of 
pthread_cond_wait() spent in the destructor of interruption_checker().

Similarly, there is an extra mutex_lock() in notify_one() and notify_all(). 
It's unclear what these extra checks in boost library are buying us.


Line 45: | TARGETED-PERF(_300) | primitive_conjunct_ordering_1  
| parquet / none / none | 10.72  | 9.84|   +8.95%   |   2.57%   
|   0.53%| 1   | 3 |
> Any idea why these regressed? Might be good to understand if there's someth
Not sure if this is related to this patch. The nightly run from last night also 
shows similar (and in fact larger) level of regression:

http://sandbox.jenkins.cloudera.com/job/impala-workload-runner-16node-cdh5/1763/


-- 
To view, visit http://gerrit.cloudera.org:8080/4350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1616: Improve the Memory Limit Exceeded error report

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-1616: Improve the Memory Limit Exceeded error report
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4335/3/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

PS3, Line 250:  Fragment
fragment (no cap). Also space after ":" as Tim suggested.


-- 
To view, visit http://gerrit.cloudera.org:8080/4335
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb4e0c359d889938b4c351771ba539850bdb95ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4008: don't bake in hash table and hash join pointers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS2, Line 317: uint8_t*
> The return value has to be const if the input argument is const though. I t
I see. Makes sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/4326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-09 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4350

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..

IMPALA-4026: Implement double-buffering for BlockingQueue.

With recent changes to improve the parquet scanner's
efficency, row batches are produced more quickly, leading
to higher contention in the blocking queue shared between
scanner threads and the scan node. The contention happens
between different producers (i.e. the scanner threads) and
also to a lesser extent, between scanner threads and scan
node.

This change addresses the contention between scanner threads
and scan node by splitting the queue into a 'read_list_' and
a 'write_list_'. The consumers will consume from 'read_list_'
until it's exhausted while the producers will enqueue into
'write_list_' until it's full. When 'read_list_' is exhausted,
the consumer will atomically swap the 'read_list_' with
'write_list_'. This reduces the contention/overhead in two ways:
(1) 'read_list_' and 'write_list_' are protected by two different
locks so consumer only contends for the write lock when
'read_list_' is empty. (2) the consumer only signals the producer
after an entire 'read_list_' is consumed instead of signalling
it per entry in the 'read_list_'. This change also converts
BlockingQueue to using POSIX pthread primitives instead of boost
library which introduces some unncessary overhead (as observed
from VTune profiles).

With this change, the regression in primitive_filter_bigint_non_selective
went from 1.6s to 0.8s, improving by 50%.

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TARGETED-PERF(_300) | parquet / none / none | 34.74   | -4.56% | 9.75 
  | -4.50% |
+-+---+-++++

+-++---++-++---++-+---+
| Workload| Query  
| File Format   | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base 
StdDev(%) | Num Clients | Iters |
+-++---++-++---++-+---+
| TARGETED-PERF(_300) | primitive_conjunct_ordering_1  
| parquet / none / none | 10.72  | 9.84|   +8.95%   |   2.57%   |   
0.53%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_string_selective  
| parquet / none / none | 1.09   | 1.01|   +7.42%   |   2.59%   |   
4.91%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_broadcast_join_2 
| parquet / none / none | 6.01   | 5.71|   +5.38%   |   0.07%   |   
0.96%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_string_non_selective  
| parquet / none / none | 1.04   | 0.99|   +4.89%   |   2.44%   |   
2.46%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_conjunct_ordering_5  
| parquet / none / none | 40.60  | 38.93   |   +4.29%   |   3.50%   |   
1.02%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_decimal_selective 
| parquet / none / none | 0.92   | 0.88|   +3.70%   |   0.06%   |   
2.69%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_broadcast_join_1 
| parquet / none / none | 1.70   | 1.66|   +2.74%   |   1.26%   |   
1.43%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_empty_build_join_1   
| parquet / none / none | 13.31  | 13.07   |   +1.79%   |   1.04%   |   
0.98%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_groupby_bigint_lowndv
| parquet / none / none | 3.62   | 3.59|   +0.73%   |   2.10%   |   
0.04%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_filter_string_like   
| parquet / none / none | 5.77   | 5.74|   +0.45%   |   3.07%   |   
1.76%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_broadcast_join_3 
| parquet / none / none | 58.96  | 58.73   |   +0.39%   |   0.43%   |   
0.27%| 1   | 3 |
| TARGETED-PERF(_300) | primitive_groupby_bigint_highndv   
| parquet / none / none | 23.84  | 23.78

[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers

2016-09-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4008: don't bake in hash table and hash join pointers
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 185: uint32_t HashTableCtx::HashVariableLenRow(
> The whitespace decisions were made by clang-format. It does seem a little d
Can we configure clang-format to not format the code this way ? I find such 
inconsistency very annoying but that's just me.


PS2, Line 317: uint8_t*
> Some of the callsites need it to return a non-const pointer, e.g. EvalRow()
Hmm...I think I meant the first argument, not the return value.


-- 
To view, visit http://gerrit.cloudera.org:8080/4326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4008: don't bake in hash table and hash join pointers

2016-09-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4008: don't bake in hash table and hash join pointers
..


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 185: uint32_t HashTableCtx::HashVariableLenRow(
nit: this new line seems to make the formatting a bit inconsistent with say 
line 163. Same for other places in this file.


PS2, Line 317: uint8_t*
const uint8_t*


PS2, Line 322: ExprValueNullPtr(
This function seems to have only one caller (ExprValueNull()) left. I find the 
asymmetry between ExprValuePtr() and ExprValueNullPtr() misleading. How about 
inlining the logic directly into the only caller ?


Line 967: return Status(
Wondering why the extra new line here ?


http://gerrit.cloudera.org:8080/#/c/4326/2/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 158: return 
expr_values_cache_.ExprValuePtr(expr_values_cache_.cur_expr_values(), expr_idx);
nit: long line


PS2, Line 406: i
missing space ?


PS2, Line 407:  
extra space


Line 419:   bool IR_NO_INLINE EvalProbeRow(
nit: this extra new line seems a bit inconsistent with the rest of the files 
(e.g. line 432).


PS2, Line 425: rows
a row ?


Line 429:   /// 'expr_values_null'
Extra new line again. Why not continue the next line here ?


PS2, Line 436:  
nit: extra space


PS2, Line 446: IR_ALWAYS_INLINE 
ALWAYS_INLINE ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie353666dbb5c958f0094d169306fe930ec3014c5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3815: clean up cross-compiled comparator

2016-09-06 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3815: clean up cross-compiled comparator
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4307
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I058917da2c13ba41d6ff7fefbb761606344312ab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3637,IMPALA-3636: refactor codegen constant replacement

2016-09-02 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3637,IMPALA-3636: refactor codegen constant replacement
..


Patch Set 2:

(16 comments)

The change looks good. Mostly comments are about clarification in functions' 
comments.

http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/codegen/constant-replacement.h
File be/src/codegen/constant-replacement.h:

PS2, Line 47:  we often want to ignore non-constant arguments. I.e. NO_ARG 
Can this be shortened to:

Note that NO_ARG doesn't imply ...


PS2, Line 75: replace
be replaced


PS2, Line 88: 10
a LLVM ConstantInt 10 ?


Line 93: /// will then be called with name = "some_function" and arg = 24. If 
the argument is not
It may help to point out that ReplaceOneArg() is usually used if  replacement 
constant value is derived from the input constant arg.


PS2, Line 104: constant value 
LLVM constant


PS2, Line 106: ReplaceNoArgs
Should ReplaceNoArgs() and ReplaceOneArgs() be protected ?


PS2, Line 112:  = NO_ARG
!= NO_ARG ?


PS2, Line 113: value
integral value


PS2, Line 114: constant value
LLVM constant.


PS2, Line 116:  Only integral constant args for now.
Currently only support integral constant args.


Line 150: std::vector result;
Do we expect this to be usually called once ? It appears that we could have 
cached this in a set when AddReplacement() is called instead of computing it 
every time ?


http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS2, Line 798: fn_map.find(callee->getName())
fn_map.find(callee->getName()) may be evaluated once in this statement:

if (callee != NULL) {
  unordered_map<string, ReplaceableFunction>::const_iterator fn_map_entry = 
fn_map.find(callee->getName());
  if (fn_map_entry == fn_map.end()) continue;
   ...
   
}


PS2, Line 1014: !fn.isDeclaration()
!fn.isDeclaration() && !fn.isIntrinsic() ?


http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS2, Line 644: int return_precision = \
 : context->impl()->GetReturnPrecision(); \
one line ?


http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/exprs/expr-constant-replacement.cc
File be/src/exprs/expr-constant-replacement.cc:

Line 41:   const vector& arg_types) :
nit: indentation


http://gerrit.cloudera.org:8080/#/c/3843/2/be/src/runtime/types.h
File be/src/runtime/types.h:

PS2, Line 238: inline int IR_ALWAYS_INLINE
ALWAYS_INLINE int


-- 
To view, visit http://gerrit.cloudera.org:8080/3843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba029ed8589698eb15dbfb0a20dd2a7ea752635
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4051: Avoid unnecessary copy of RowDescriptor into RowBatch

2016-08-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4051: Avoid unnecessary copy of RowDescriptor into 
RowBatch
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4181/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS2, Line 384: const RowDescriptor& row_desc_;
Is this always safe ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27c05090ee8b9dd9d650a5587a3ae42839aa0008
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4174/4/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 513:   /// the query option 'NUM_SCANNER_THREADS' is not set.
> explain that it is NUM_SCANNER_THREADS if that option is set, and explain w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..


Patch Set 5: Code-Review+2

Carry +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4174

to look at the new patch set (#5).

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..

IMPALA-2831: Bound the number of scanner threads per scan node.

Our current code base allows a scan node to spin up as many as
3x the number of logical cpu cores of scanner threads. However,
the scanner threads are cpu bound so there is diminishing return
for starting more scanner threads than the number of logical cores.
In fact, it may be detrimental due to context switching overhead.

This change bounds the number of scanner threads spun up by a scan
node to the number of logical cpu cores unless the query option
'num_scanner_threads' is set. The total number of available thread
tokens is unchanged. With this change, the peak memory usage of the
following query on a single node Impala cluster running on a machine
with 8 logical cores reduces from 287MB to 101MB.

select count(*) from tpch100_parquet.lineitem where l_orderkey > 20

The reduction comes mostly from the fewer outstanding IO buffers.
The IO for scan ranges will be scheduled by the scanner threads
which pick them up. There will be at least an IO buffer of 8 to 16MB
associated with each scan range. So, more threads we start up,
more memory will be consumed by the IO buffers, leading to the
higher peak memory usages.

Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
2 files changed, 23 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4174/1//COMMIT_MSG
Commit Message:

Line 20: with 8 logical cores reduces from 287MB to 101MB.
> could you add a little more explanation about why the mem usage goes down (
Done


http://gerrit.cloudera.org:8080/#/c/4174/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 906:   if (!ranges.empty()) 
ThreadTokenAvailableCb(runtime_state_->resource_pool());
> and isn't this already handled by:
As discussed offline, 'progress_' doesn't take into account of file format so 
it will be non-zero. This ends up causing a bunch of scanner threads to be 
started. In the worst case, those threads may just quit when they realize they 
cannot get a scan range. I find this behavior a bit confusing.


-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#4).

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..

IMPALA-2831: Bound the number of scanner threads per scan node.

Our current code base allows a scan node to spin up as many as
3x the number of logical cpu cores of scanner threads. However,
the scanner threads are cpu bound so there is diminishing return
for starting more scanner threads than the number of logical cores.
In fact, it may be detrimental due to context switching overhead.

This change bounds the number of scanner threads spun up by a scan
node to the number of logical cpu cores unless the query option
'num_scanner_threads' is set. The total number of available thread
tokens is unchanged. With this change, the peak memory usage of the
following query on a single node Impala cluster running on a machine
with 8 logical cores reduces from 287MB to 101MB.

select count(*) from tpch100_parquet.lineitem where l_orderkey > 20

The reduction comes mostly from the fewer outstanding IO buffers.
The IO for scan ranges will be scheduled by the scanner threads
which pick them up. There will be at least an IO buffer of 8 to 16MB
associated with each scan range. So, more threads we start up,
more memory will be consumed by the IO buffers, leading to the
higher peak memory usages.

Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
2 files changed, 21 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..

IMPALA-2831: Bound the number of scanner threads per scan node.

Our current code base allows a scan node to spin up as many as
3x the number of logical cpu cores of scanner threads. However,
the scanner threads are cpu bound so there is diminishing return
for starting more scanner threads than the number of logical cores.
In fact, it may be detrimental due to context switching overhead.

This change bounds the number of scanner threads spun up by a scan
node to the number of logical cpu cores unless the query option
'num_scanner_threads' is set. The total number of available thread
tokens is unchanged. With this change, the peak memory usage of the
following query on a single node Impala cluster running on a machine
with 8 logical cores reduces from 287MB to 101MB.

select count(*) from tpch100_parquet.lineitem where l_orderkey > 20

The reduction comes mostly from the fewer outstanding IO buffers.
The IO for scan ranges will be scheduled by the scanner threads
which pick them up. There will be at least an IO buffer of 8 to 16MB
associated with each scan range. So, more threads we start up,
more memory will be consumed by the IO buffers, leading to the
higher peak memory usages.

Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
2 files changed, 23 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4174/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

PS1, Line 738: The scanner thread will yield after completing a scan range so 
the main thread
 : // also gets to run.
> I don't understand what this is trying to say, given that you also have the
What I was trying to convey was that the number of scanner threads being the 
same as the number of logical cores may starve the main thread. While it's true 
that OS' preemption will ensure some chances for the main thread to run, this 
comment is to point out that the scanner thread also active yields the cpu so 
other threads get to run too. Let me rephrase it (or remove it).


Line 740: 
runtime_state_->resource_pool()->set_max_quota(CpuInfo::num_cores() + 1);
> hmm doesn't this change the number of threads available for the join side h
Yes, good point. So, the way we implemented 'NUM_SCANNER_THREADS' query option 
is broken all along. Let's not touch the max_quota at all for this query option 
and check it in another way. Please wait for the next update of this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..

IMPALA-2831: Bound the number of scanner threads per scan node.

Our current code base allows a scan node to spin up as many as
3x the number of logical cpu cores of scanner threads. However,
the scanner threads are cpu bound so there is diminishing return
for starting more scanner threads than the number of logical cores.
In fact, it may be detrimental due to context switching overhead.

This change bounds the number of scanner threads spun up by a scan
node to the number of logical cpu cores unless the query option
'num_scanner_threads' is set. The total number of available thread
tokens is unchanged. With this change, the peak memory usage of the
following query on a single node impala cluster running on a machine
with 8 logical cores reduces from 287MB to 101MB.

select count(*) from tpch100_parquet.lineitem where l_orderkey > 20

Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
---
M be/src/exec/hdfs-scan-node.cc
1 file changed, 10 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4174/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 262: RETURN_IF_ERROR(HdfsParquetScanner::IssueInitialRanges(this,
> Is the order important? Would be helpful to have a comment here if there is
Yes, mostly for slight preference for parquet files. This mostly makes up for 
the difference in behavior due to the change in line 906. Previously, we would 
have spun up those scanner threads (unintentionally) when calling 
IssueInitalRanges() for Text file format. We also start issuing the initial IO 
ranges slightly earlier now for parquet files.


Line 740: 
runtime_state_->resource_pool()->set_max_quota(CpuInfo::num_cores() + 1);
> Nm, you are already doing that. It seems weird that the scanner threads quo
Yes, I agree that we probably should have the + 1 for the query option too.


-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2831: Bound the number of scanner threads per scan node.

2016-08-30 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4174

Change subject: IMPALA-2831: Bound the number of scanner threads per scan node.
..

IMPALA-2831: Bound the number of scanner threads per scan node.

Our current code base allows a scan node to spin up as many as
3x the number of logical cpu cores of scanner threads. However,
the scanner threads are cpu bound so there is diminishing return
for starting more scanner threads than the number of logical cores.
In fact, it may be detrimental due to context switching overhead.

This change bounds the number of scanner threads spun up by a scan
node to the number of logical cpu cores unless the query option
'num_scanner_threads' is set. The total number of available thread
tokens is unchanged. With this change, the peak memory usage of the
following query on a single node impala cluster running on a machine
with 8 logical cores reduces from 287MB to 101MB.

select count(*) from tpch100_parquet.lineitem where l_orderkey > 20

Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
---
M be/src/exec/hdfs-scan-node.cc
1 file changed, 9 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4174/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I191988ad18d6b4caf892fc967258823edcf9681f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts

2016-08-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4006: dangerous rm -rf statements in scripts
..


Patch Set 6: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4078/6/bin/run-all-tests.sh
File bin/run-all-tests.sh:

PS6, Line 37: "${CLUSTER_DIR}/admin
Misplaced quote (similar to the one in clean.sh) ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Yonghyun Hwang <yongh...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts

2016-08-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4006: dangerous rm -rf statements in scripts
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4078/3/bin/impala-config.sh
File bin/impala-config.sh:

PS3, Line 45: $IMPALA_HOME/toolchain
> I intentionally didn't touch assignments, because quoting is not strictly n
I believe it'd be great to update them all the way. Thanks again for the clean 
up.


-- 
To view, visit http://gerrit.cloudera.org:8080/4078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts

2016-08-23 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4006: dangerous rm -rf statements in scripts
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4078/1/bin/clean.sh
File bin/clean.sh:

PS1, Line 49: $IMPALA_ALL_LOGS_DIRS
> Yes and no. On one hand, this is supposed to contain multiple log, so havin
May be I misunderstood your comment but for all practical purposes, we don't 
usually use multiple different log directories.


-- 
To view, visit http://gerrit.cloudera.org:8080/4078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts

2016-08-23 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4006: dangerous rm -rf statements in scripts
..


Patch Set 3:

(5 comments)

It appears that you are already fixing some of the places which are considered 
"non-dangerous". Can you please just bite the bullet and fix up the rest 
pointed out below ?

http://gerrit.cloudera.org:8080/#/c/4078/3/bin/impala-config.sh
File bin/impala-config.sh:

PS3, Line 45: $IMPALA_HOME/toolchain
missing quote ?


http://gerrit.cloudera.org:8080/#/c/4078/3/bin/run-all-tests.sh
File bin/run-all-tests.sh:

PS3, Line 111:  ${IMPALA_HOME}/testdata/bin/split-hbase.sh
missing quote ?


PS3, Line 137: ${IMPALA_HOME}/bin/run-workload.py
missing quote ?


PS3, Line 173: ${IMPALA_HOME}/bin/start-impala-cluster.py
missing quote ?


PS3, Line 177: ${IMPALA_HOME}/bin/mvn-quiet.sh
missing quote ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] CDH-43354: Bump CDH components' versions.

2016-08-22 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: CDH-43354: Bump CDH components' versions.
..


Patch Set 1:

A core run passed: 
http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/4002/

-- 
To view, visit http://gerrit.cloudera.org:8080/4084
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I217f2467628d0213afee1b8a531e48d035237212
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] CDH-43354: Bump CDH components' versions.

2016-08-22 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4084

Change subject: CDH-43354: Bump CDH components' versions.
..

CDH-43354: Bump CDH components' versions.

Change-Id: I217f2467628d0213afee1b8a531e48d035237212
---
M bin/impala-config.sh
1 file changed, 6 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/4084/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4084
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I217f2467628d0213afee1b8a531e48d035237212
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-4006 impala-config.sh contains dangerous rm -rf statements

2016-08-22 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4006 impala-config.sh contains dangerous rm -rf 
statements
..


Patch Set 1:

(12 comments)

Thanks for cleaning things up. Not sure if there is a easier way to fix this 
other than adding quote everywhere ?

http://gerrit.cloudera.org:8080/#/c/4078/1/bin/clean.sh
File bin/clean.sh:

PS1, Line 49: $IMPALA_ALL_LOGS_DIRS
Won't this line suffer from the same problem ?


PS1, Line 53: pushd $IMPALA_HOME/be
Same problem ?


PS1, Line 59: pushd $IMPALA_HOME/shell
Same problem ?


PS1, Line 71: rm -f $IMPALA_HOME/llvm-ir/impala*.ll
: rm -f $IMPALA_HOME/be/generated-sources/impala-ir/*
Are these lines also unsafe ?


http://gerrit.cloudera.org:8080/#/c/4078/1/bin/run-all-tests.sh
File bin/run-all-tests.sh:

PS1, Line 101: LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}"
Is this variable still used ?


PS1, Line 137: ${IMPALA_HOME}
Same problem ?


PS1, Line 173: {IMPALA_EE_TEST_LOGS_DIR}
Same problem ?


PS1, Line 173: ${IMPALA_HOME}/
Same problem ?


PS1, Line 196: ${IMPALA_HOME}
Same.


http://gerrit.cloudera.org:8080/#/c/4078/1/buildall.sh
File buildall.sh:

Line 248: "$IMPALA_HOME/bin/clean.sh"
Feel free to ignore but it should be sufficient to just have the quote around 
${IMPALA_HOME}. Same for other places in the file.


PS1, Line 321: $IMPALA_LZO
Same problem ?


PS1, Line 323: $IMPALA_LZO
Same problem ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4078
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <z...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 7: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4065/7/be/src/util/uid-util.h
File be/src/util/uid-util.h:

PS7, Line 61: (1L << 32) - 1
IMHO, defining this as a constant like the previous patch seems to be better as 
it gives a semantic meaning to this value (e.g. instance_idx_mask).


PS7, Line 66: -1
nit: missing space.


-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4004: Don't access nested types in test failpoints.py

2016-08-21 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4074

Change subject: IMPALA-4004: Don't access nested types in test_failpoints.py
..

IMPALA-4004: Don't access nested types in test_failpoints.py

As part of fixing IMPALA-3692, the query in test_failpoints.py was
updated to have a predicate on a string column in a parquet table.
The update to the query was based on the failing query in the bug
which happens to access nested columns. Apparently, this doesn't
quite work with the legacy join and agg.

This change fixes the test to also work with legacy join and agg.
With the debug actions added in the fix of IMPALA-3692, it's not
necessary to access nested types to reproduce the problem as long
as there is a predicate on a string column.

Change-Id: Idc5e67b9748a13fcd76ea5fe140e2e6b18e809b7
---
M tests/failure/test_failpoints.py
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4074/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4074
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idc5e67b9748a13fcd76ea5fe140e2e6b18e809b7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

PS6, Line 505: int
Will there be a chance of overflow here ? Should we use int64_t instead ?

Once intance_idx > (1 << 31) - 1, it will be negative so instance_idx > 
MAX_FRAGMENT_INSTANCE_IDX will always be false. We may end up wrapping and not 
catch the problem.


PS6, Line 509:   
nit: wrong indentation


http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/util/uid-util-test.cc
File be/src/util/uid-util-test.cc:

Line 42:   }
It may be worth having some negative test cases in which the instance_id 
exceeds the 32-bit width ?


http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/util/uid-util.h
File be/src/util/uid-util.h:

PS6, Line 70: std::numeric_limits::max()
Do we support up to (1L << 31) -2 number of instances ? Is there a reason why 
we cannot use all 32-bit for instance id ?


PS6, Line 75: int
Does this have the same overflow problem here ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet 
scanner
..


Patch Set 3: Code-Review+2

Rebase. Carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/4064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 5: Code-Review+2

Carry +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3991/4/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 442: "Debug Action: MEM_LIMIT_EXCEEDED");
> this is fine with me, but any reason we don't use mem_tracker() (i.e. exec-
Done


http://gerrit.cloudera.org:8080/#/c/3991/4/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

PS4, Line 576: if requested by scanner
> I think we should delete this part because it sounds like the function itse
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-19 Thread Michael Ho (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3991

to look at the new patch set (#5).

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..

IMPALA-3962: Clean up scratch tuple batch on scan failures

The parquet scanner doesn't clean up 'scratch_batch_' properly
which causes it to process a partially filled scratch_batch_
if any of the column reader fails. This change cleans up the
scratch batch if any of the parquet column readers fails.
The clean up involves freeing the mem_pool of scratch_batch_
and setting number of tuples in scratch_batch_ to 0.

This change also extends debug action to emulate the behavior
of exceeding the query's memory limit.

Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
---
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/failure/test_failpoints.py
10 files changed, 76 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet 
scanner
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4064/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS1, Line 304: StartNewRowBatch
> I think we should disambiguate so that it's clear which variant is being us
Done


Line 341: RETURN_IF_ERROR(CommitRows(row_batch, num_row_to_commit));
> Doesn't CommitRows() call HdfsScanner::StartNewRowBatch() anyway?
I believe it's calling HdfsParquetScanner::CommitRows() which doesn't call 
StartNewRowBatch().


-- 
To view, visit http://gerrit.cloudera.org:8080/4064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples buffer in parquet scanner

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-3662: Don't double allocate tuples buffer in parquet 
scanner
..

IMPALA-3662: Don't double allocate tuples buffer in parquet scanner

HdfsScanner::StartNewRowBatch() is called once per row batch
by the parquet scanner to allocate a new row batch and tuple
buffer. Similarly, a scratch batch is created for each row
batch in HdfsParquetScanner::AssembleRows() which also contains
the tuple buffer. In reality, only the tuple buffer in the
scratch batch is used. So, the tuple buffer allocated by
HdfsScanner::StartNewRowBatch() is unused memory for the
parquet scanner.

This change fixes the problem above by implementing
HdfsParquetScanner::StartNewRowBatch() which creates
a new row batch without allocating the tuple buffer.
With this patch, the memory consumption when
materializing very wide tuples is reduced by half.

Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.h
3 files changed, 14 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/4064/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4064

Change subject: IMPALA-3662: Don't double allocate tuples' buffer in parquet 
scanner
..

IMPALA-3662: Don't double allocate tuples' buffer in parquet scanner

HdfsScanner::StartNewRowBatch() is called once per row batch
by the parquet scanner to allocate a new row batch and tuple
buffer. Similarly, a scratch batch is created for each row
batch in HdfsParquetScanner::AssembleRows() which also contains
the tuple buffer. In reality, only the tuple buffer in the
scratch batch is used. So, the tuple buffer allocated by
HdfsScanner::StartNewRowBatch() is unused memory for the
parquet scanner.

This change fixes the problem above by implementing
HdfsParquetScanner::StartNewRowBatch() which creates
a new row batch without allocating the tuple buffer.
With this patch, the memory consumption when
materializing very wide tuples is reduced by half.

Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.h
3 files changed, 14 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/4064/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4064
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I826061a2be10fd0528ca4dd1e97146e3cb983370
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#4).

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..

IMPALA-3962: Clean up scratch tuple batch on scan failures

The parquet scanner doesn't clean up 'scratch_batch_' properly
which causes it to process a partially filled scratch_batch_
if any of the column reader fails. This change cleans up the
scratch batch if any of the parquet column readers fails.
The clean up involves freeing the mem_pool of scratch_batch_
and setting number of tuples in scratch_batch_ to 0.

This change also extends debug action to emulate the behavior
of exceeding the query's memory limit.

Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
---
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/failure/test_failpoints.py
10 files changed, 77 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3991/3/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS3, Line 441: Status::MemLimitExceeded();
> shoudl this use mem_tracker()->SetMemLimitExceeded() since Tim is convertin
Done


http://gerrit.cloudera.org:8080/#/c/3991/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS3, Line 350: column_readers_[0]->RowGroupAtEnd()
> couldn't we have hit column 0's end of rowgroup, but not column 1's, which 
Good point. Reverted to the check in the old code.


http://gerrit.cloudera.org:8080/#/c/3991/3/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS3, Line 381: current
> currently
Done


PS3, Line 383: nodes
> node's
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3090: always log memory limit errors

2016-08-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3090: always log memory limit errors
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5ec5572b0e26898da352b7e6b11eb01c6edb2e5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3991/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 343:   context_->ClearStreams();
> We can also ask Alex about it, I believe he wrote this code.
I believe it's always safe to do the two lines above even if advance_row_group_ 
== true.


Line 373:   if (context_->cancelled()) return Status::CANCELLED;
> In case of parse error in AssembleRows(), we may not have called CommitRows
Removed. If the scanner doesn't see the cancellation here, it's bound to see it 
at some point in CommitRows() if keeps producing row batches.


Line 512: scratch_batch_->num_tuples));
> Need the same error handling in this path.
Done


Line 513: *skip_row_group = true;
> Yes, missed that after rebase.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3090: always log memory limit errors

2016-08-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3090: always log memory limit errors
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4049/1/be/src/runtime/mem-tracker.h
File be/src/runtime/mem-tracker.h:

PS1, Line 350:  int64_t failed_allocation)
Thanks for adding the comment. It may be worth setting the default value of 
'failed_allocation' to 0 so some callsites of this function don't have to pass 
0.


-- 
To view, visit http://gerrit.cloudera.org:8080/4049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5ec5572b0e26898da352b7e6b11eb01c6edb2e5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

2016-08-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
..


Patch Set 4:

(16 comments)

Still going through the changes and digesting them. Some comments for now.

http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

PS4, Line 249: SwitchToIoBuffers(_buffer)
Not your change but I always find the inconsistency between returning bool vs 
Status a bit confusing. For instance, this function return Status while 
AddRow() passes the Status as argument.


PS4, Line 325: total_build_rows
'total_build_rows'


http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

PS4, Line 57: methods
interface ?


PS4, Line 152: Partition {
I suppose this is the build-side counterpart of ProbePartition, right ? Can you 
please add a brief class comment about it ?


PS4, Line 163: in memory size
in-memory data structures


PS4, Line 163: size
byte size


PS4, Line 175: he
the


PS4, Line 180: is_closed()
Why cannot this be the same as ProbePartition::IsClosed() and use { return 
build_rows_ == NULL; } ?


PS4, Line 290: then
the


PS4, Line 291:  needed to destruct the Status
Is this still true ? I believe we have added noexcept to the Status class 
already.


Line 294:   /// This status should used directly only by ProcessBuildBatch().
be used


http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

PS4, Line 523: join_op_ == TJoinOp::RIGHT_ANTI_JOIN ||
nit: long line


http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS4, Line 449:  is_closed()
nit: IsClosed()


PS4, Line 459: transferred.
... and the partition is considered closed.


http://gerrit.cloudera.org:8080/#/c/3873/4/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

PS4, Line 230: got_buffer
'got_buffer'


Line 231:   /// false if the block could not be pinned and no error was 
encountered.
And it's unknown if error was encountered. In which case, an error status will 
be returned.


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner

2016-08-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3629: Codegen TransferScratchTuples() in 
hdfs-parquet-scanner
..


Patch Set 9: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..

IMPALA-3962: Clean up scratch tuple batch on scan failures

The parquet scanner doesn't clean up scratch_batch_ properly
which causes it to process a partially filled scratch_batch_
if any of the column reader fails. This change cleans up the
scratch batch if any of the parquet column readers fails.
The clean up involves freeing the mem_pool of scratch_batch_
and setting number of tuples in scratch_batch_ to 0.

This change also extends debug action to emulate the behavior
of exceeding the query's memory limit.

Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
---
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/failure/test_failpoints.py
10 files changed, 77 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..

IMPALA-3962: Clean up scratch tuple batch on scan failures

The parquet scanner doesn't clean up scratch_batch_ properly
which causes it to process a partially filled scratch_batch_
if any of the column reader fails. This change cleans up the
scratch batch if any of the parquet column readers fails.
The clean up involves freeing the mem_pool of scratch_batch_
and setting number of tuples in scratch_batch_ to 0.

Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
---
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/PlanNodes.thrift
M tests/failure/test_failpoints.py
10 files changed, 75 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 1:

To answer the question of why the stress test hit this: the query was cancelled 
(I noticed that while debugging it last week but couldn't quite complete the 
whole picture). The cancellation was noticed eventually by 
HdfsParquetScanner::AssembleCollection() which eventually bubbles 
'continue_execution == false' up the stack to the loop in AssembleRows(), 
causing this bug. So, it may indeed be a good idea to add some debug actions 
there for debug builds.

-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3991/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 343:   context_->ClearStreams();
> why is this under !advance_row_group_?  If we did this here, then it seems 
Yes. I am not 100% sure it's side-effect free (e.g. are these calls idempotent 
?). Let me think more about this.


Line 373:   if (context_->cancelled()) return Status::CANCELLED;
> why is this needed? won't the next GetNextInternal() notice this (inside Co
In case of parse error in AssembleRows(), we may not have called CommitRows().  
However, it may be worth some more rethinking of how the code should be 
structured so we can avoid this redundant check here.


Line 513: *skip_row_group = true;
> doesn't this path have the same bug?
Yes, missed that after rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..


Patch Set 1:

We probably need to create a malformed table with bad data in some columns.

-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested 
collection.
..


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3201: buffer pool header only

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3201: buffer pool header only
..


Patch Set 1:

Please feel free to address them in the next patch though.

-- 
To view, visit http://gerrit.cloudera.org:8080/3992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id771dea2eb4c1aa13c30d59e8b184a7d1bca8d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3201: buffer pool header only

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3201: buffer pool header only
..


Patch Set 1:

Sorry to jump in late at this point but there are also some comments at 
https://gerrit.cloudera.org/#/c/2569

-- 
To view, visit http://gerrit.cloudera.org:8080/3992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id771dea2eb4c1aa13c30d59e8b184a7d1bca8d34
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3962: Clean up scratch tuple batch on scan failures

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3991

Change subject: IMPALA-3962: Clean up scratch tuple batch on scan failures
..

IMPALA-3962: Clean up scratch tuple batch on scan failures

The parquet scanner doesn't clean up scratch_batch_ properly
which causes it to process a partially filled scratch_batch_
if any of the column reader fails. This change cleans up the
scratch batch if any of the parquet column readers fails.
The clean up involves freeing the mem_pool of scratch_batch_
and setting number of tuples in scratch_batch_ to 0.

Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
---
M be/src/exec/hdfs-parquet-scanner.cc
1 file changed, 6 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3991/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If1e27a1517d09ccaabdae1492b7e1fbf661ae3e5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

2016-08-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested 
collection.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3940/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 574:   
dst_batch->tuple_data_pool()->AcquireData(scratch_batch_->mem_pool(), false);
> FreeAll() also works for both cases (I ran a private build to confirm), so 
FreeAll() sounds good.


-- 
To view, visit http://gerrit.cloudera.org:8080/3940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs

2016-08-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-1619, IMPALA-3018: Address various small memory 
allocation related bugs
..


Patch Set 2:

Added a new test in free-pool-test.cc

-- 
To view, visit http://gerrit.cloudera.org:8080/3807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs

2016-08-12 Thread Michael Ho (Code Review)
Hello Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3807

to look at the new patch set (#2).

Change subject: IMPALA-1619, IMPALA-3018: Address various small memory 
allocation related bugs
..

IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs

This patch addresses a potential overflow in calculation FreePool::Rellocate()
and its handling of zero-length allocations. This patch also adds code to
gracefully handle malloc() failures when initializing/resizing hash tables.

Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d
---
M be/src/exec/hash-table.cc
M be/src/runtime/free-pool-test.cc
M be/src/runtime/free-pool.h
3 files changed, 16 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/3807/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner

2016-08-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3629: Codegen TransferScratchTuples() in 
hdfs-parquet-scanner
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS7, Line 162:   if (scan_node_->runtime_state()->codegen_enabled()) {
Is this check redundant as GetCodegenFn() should return NULL if codegen is 
disabled, right ?


PS7, Line 585: int tuple_byte_size, bool has_filters
Did you mean to use these to replace some constants ? They don't seem to be 
used.


http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

PS7, Line 558:   *write_complete_tuple_fn = codegen->FinalizeFunction(fn);
 :   return Status::OK();
Doesn't this trigger the DCHECK in some code which check the returned function 
is not NULL if status.ok() ? In theory, FinalizeFunction() can return NULL if 
there is any error.


http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

PS7, Line 63: 
nit: wrong indentation


http://gerrit.cloudera.org:8080/#/c/3774/7/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS7, Line 699: DCHECK(*write_aligned_tuples_fn != NULL);
Please see comments in CodegenWriteAlignedTuples() about this DCHECK().


-- 
To view, visit http://gerrit.cloudera.org:8080/3774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys

2016-08-12 Thread Michael Ho (Code Review)
Michael Ho has submitted this change and it was merged.

Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration 
keys
..


IMPALA-3829: OpenSession() logs errors on valid configuration keys

Refactored OpenSession() to process the supplied configuration
map in one loop. Call SetQueryOption() on normal configuration
keys only.

Other changes:
- Compare config keys to "impala.doas.user" in case-insensitive
manner.
- New E2E test to check that setting query options still works
after the change.

Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025
Reviewed-on: http://gerrit.cloudera.org:8080/3808
Tested-by: Internal Jenkins
Reviewed-by: Michael Ho <k...@cloudera.com>
---
M be/src/service/impala-hs2-server.cc
M tests/hs2/test_hs2.py
2 files changed, 39 insertions(+), 33 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3808
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

2016-08-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested 
collection.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3940/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 574:   
dst_batch->tuple_data_pool()->AcquireData(scratch_batch_->mem_pool(), false);
Sorry for the late review. If we are materializing a collection with empty 
tuples, do we actually need to transfer the memory pool of the scratch batch or 
can we free it at this point ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Christopher Channing <cchann...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys

2016-08-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration 
keys
..


Patch Set 3: Code-Review+2

Carry Henry's +2 forward. Attila cannot yet +2 at this point.

-- 
To view, visit http://gerrit.cloudera.org:8080/3808
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3946: fix MemPool integrity issues with empty chunks

2016-08-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3946: fix MemPool integrity issues with empty chunks
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3838/2/be/src/runtime/mem-pool.cc
File be/src/runtime/mem-pool.cc:

PS2, Line 265: if (current_chunk_empty) DCHECK_EQ(chunks_[i].allocated_bytes, 
0);
Please add a comment stating that !current_chunk_empty doesn't imply the 
current chunk is not empty. It just means the caller isn't sure it's empty. May 
be it's better to rename it to 'check_current_chunk_empty' if it's not too 
verbose.


-- 
To view, visit http://gerrit.cloudera.org:8080/3838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03ad12e5b2b63cbb55e5c52562416d73a4bd9346
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation

2016-08-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation
..


Patch Set 4: Code-Review+2

Carry +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/3793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3946: fix MemPool integrity issues with empty chunks

2016-08-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3946: fix MemPool integrity issues with empty chunks
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3838/1/be/src/runtime/mem-pool-test.cc
File be/src/runtime/mem-pool-test.cc:

PS1, Line 435: Empty chunk
May be I misunderstood it but I think you meant the chunk from src isn't 
transferred. Whether the chunk is transferred should have nothing to do whether 
the chunk is empty or not, right ?


http://gerrit.cloudera.org:8080/#/c/3838/1/be/src/runtime/mem-pool.cc
File be/src/runtime/mem-pool.cc:

PS1, Line 105: int first_free_idx = 0;
IMHO, It may be slightly clearer to make the assignment of this variable more 
explicit:

if (UNLIKELY(current_chunk_idx_ == -1)) {
first_free_idx = 0;
} else {
DCHECK_GE(current_chunk_idx, 0);
first_free_idx = current_chunk_idx_ + (chunks_[current_chunk_idx_]  > 0);
// Always start at current_chunk_idx + 1 as we know the current chunk 
cannot fit.
for (int idx = current_chunk_idx_ + 1; idx < chunks_.size(); ++idx) {
 
}
}

It would be great to also document that the reason there can be a list of free 
chunks after the current chunk is a result of calling Clear() in the past. That 
always confuses me when I read this function without reading other functions at 
the same time.


PS1, Line 215: false
Is there a reason why this cannot be !keep_current || src->GetFreeOffset() > 0 ?


PS1, Line 260:  if (current_chunk_empty)
Can we keep the stricter check we had before if we modify line 215 ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03ad12e5b2b63cbb55e5c52562416d73a4bd9346
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly

2016-08-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3791/3/be/src/util/string-parser.h
File be/src/util/string-parser.h:

PS3, Line 360:   int i = 3;
 :   if (i + 5 <= len && strncasecmp(s + i, "inity", 5) == 0) {
 : i += 5;
 :   }
Feel free to ignore as the compiler will have figured it out anyway. These 
lines may be simplified as:

int i =3;
if (len >= 8 && strncasecmp(s, "infinity", 8) == 0) i = 8;


-- 
To view, visit http://gerrit.cloudera.org:8080/3791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3637,IMPALA-3636: refactor codegen constant replacement

2016-08-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3637,IMPALA-3636: refactor codegen constant replacement
..


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/constant-replacement.cc
File be/src/codegen/constant-replacement.cc:

PS1, Line 37:  Value
Do you think Constant* will be more appropriate ? Value* may be a bit generic.


http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/constant-replacement.h
File be/src/codegen/constant-replacement.h:

Line 1: // Copyright 2016 Cloudera Inc.
This header needs updating.


http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS1, Line 829: ///
nit: "//". It should be fine to declare this in the header file too. Not much 
harm in exposing it but up to you.


PS1, Line 898:  ///
nit: "//"


PS1, Line 900:   fn_pass_manager->add(new ConstantReplacementPass(this));
Did you measure how costly this extra optimization pass is ?


http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS1, Line 376: llvm::Value* GetBoolConstant
nice.


http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/exec/hash-table-constant-replacer.cc
File be/src/exec/hash-table-constant-replacer.cc:

Line 1: // Copyright 2016 Cloudera Inc.
This header needs updating.


http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

PS1, Line 271: context->impl()->GetReturnPrecision()
An observation when I went through the patch: if we can somehow generate 
FunctionContext as an LLVM constant, a lot of these call sites replacement 
won't be needed. That said, I am not advocating we should switch to that 
approach (at least not now). It's just an alternative.


http://gerrit.cloudera.org:8080/#/c/3843/1/be/src/exprs/expr-constant-replacement.cc
File be/src/exprs/expr-constant-replacement.cc:

Line 1: // Copyright 2016 Cloudera Inc.
This header needs updating.


PS1, Line 41: ArgType
It seems unnecessary to have a separate class for this. May be we can forgo the 
need of a separate class called ExprConstantReplacement and put everything in 
this file under the class ExprConstantReplacer ?


Line 82:   const vector& arg_types, LlvmCodeGen* 
codegen,
nit: wrong indentation.


Line 84:   ArgTypeConstantReplacer replacer(return_type, arg_types);
Referring to the previous comment about combining this with the LLVM constant 
pass:

On the first glance, the code seems to still have two different patterns for 
doing constant replacement. One is this function and the other one is the LLVM 
constant pass. However, upon closer look, they are actually different. The 
replacement values used in this function depend on 'return_type' which may 
differ from functions to functions. 

On the other hand, the LLVM constant pass you added is for CpuInfo() whose 
replacement values will be the same for all functions given the same input.

In some sense, this function is a local replacement pass while the LLVM  pass 
is closer to a global replacement pass. It may be helpful to document the 
difference here as having these two patterns at the same time may be a bit 
confusing to decide which one to use when. Not sure if it's possible to DCHECK 
that this type of constant replacer is not used in the LLVM constant 
replacement pass as it doesn't seem to make sense ? I guess same thing can be 
said about hash_table_constant_replacer.


-- 
To view, visit http://gerrit.cloudera.org:8080/3843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba029ed8589698eb15dbfb0a20dd2a7ea752635
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys

2016-07-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration 
keys
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3808
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3829: OpenSession() logs errors on valid configuration keys

2016-07-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3829: OpenSession() logs errors on valid configuration 
keys
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3808/1/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

PS1, Line 81: open_session_req.configuration = {"impala.doas.user": ""}
Is it worth adding a case sensitive test for impala.doas.user ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3808
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa9b823abc39ba9809a35a6f0844fa3436f1e025
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly

2016-07-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3791/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS2, Line 2796:   TestValue("CAST('  inf  ' AS FLOAT)", TYPE_FLOAT,
  :   numeric_limits::infinity());
  :   TestValue("CAST('  inf  ' AS DOUBLE)", TYPE_DOUBLE,
  :   numeric_limits::infinity());
> This doesn't seem to match what you wrote in the comment. Isn't leading whi
Nevermind. Looks like the leading whitespace is removed before calling 
StringToFloatInternal().


-- 
To view, visit http://gerrit.cloudera.org:8080/3791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner

2016-07-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3629: Codegen TransferScratchTuples() in 
hdfs-parquet-scanner
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3774/3/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

PS3, Line 44:   if (tuple_size == 0) {
Given that if (tuple_size == 0) { ...} is handling an entire scratch patch in 
one shot, it seems that we won't be benefiting a lot from it by eliminating the 
if statement. The gain we get may be offset by the codegen time to 
optimize/compile this function at runtime.

I would suggest cross-compiling the while loop below only.


PS3, Line 61: while (scratch_tuple != scratch_tuple_end) {
Can we factor the while loop (or the loop body only) out as a separate function 
and cross-compile it only ?


http://gerrit.cloudera.org:8080/#/c/3774/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS3, Line 388:   if (!node->runtime_state()->GetCodegen().ok()) {
 : return Status("Failed to get codegen");
 :   }
RETURN_IF_ERROR(node->runtime_state()->GetCodegen());


PS3, Line 408: llvm::ConstantInt::get(
 :   llvm::Type::getInt32Ty(codegen->context()), 
tuple_byte_size)
codegen->GetIntConstant(sizeof(tuple_byte_size), tuple_byte_size) ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs

2016-07-28 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3807

Change subject: IMPALA-1619, IMPALA-3018: Address various small memory 
allocation related bugs
..

IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs

This patch addresses a potential overflow in calculation FreePool::Rellocate()
and its handling of zero-length allocations. This patch also adds code to
gracefully handle malloc() failures when initializing/resizing hash tables.

Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d
---
M be/src/exec/hash-table.cc
M be/src/runtime/free-pool.h
2 files changed, 11 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/3807/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6eb9a4472a65cf68edb0323b13d745277ead2e1d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation

2016-07-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3793/2//COMMIT_MSG
Commit Message:

Line 16: cross-compilation as they won't benefit from it.
> Also the Uuid() function.
Done


http://gerrit.cloudera.org:8080/#/c/3793/2/be/src/exprs/utility-functions.cc
File be/src/exprs/utility-functions.cc:

Line 39: StringVal UtilityFunctions::GenUuid(FunctionContext* ctx) {
> Maybe it's worth commenting why this isn't cross-compiled. I don't think it
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation

2016-07-28 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3793

to look at the new patch set (#3).

Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation
..

IMPALA-1112: Remove some unncessary code from cross-compilation

This change stops including some boost library header files
which pulls in other unnecessary boost library header files.
This reduces the amount of cross-compiled code which needs
to be materialized during codegen.

This change also removes some UDF's Prepare() and Close()
functions and UDF functions fromUtc(), toUtc() and uuid()
from cross-compilation as they won't benefit from it.

With this change, the bitcode module reduces from 2.12 MB to 1.86MB.

Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
---
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr.cc
M be/src/exprs/timestamp-functions-ir.cc
A be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/exprs/timezone_db.cc
A be/src/exprs/timezone_db.h
M be/src/exprs/utility-functions-ir.cc
A be/src/exprs/utility-functions.cc
M be/src/exprs/utility-functions.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-http-handler.cc
M be/src/statestore/statestore.cc
M be/src/util/debug-util.cc
18 files changed, 331 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/3793/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation

2016-07-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation
..

IMPALA-1112: Remove some unncessary code from cross-compilation

This change stops including some boost library header files
which pulls in other unnecessary boost library header files.
This reduces the amount of cross-compiled code which needs
to be materialized during codegen.

This change also removes some UDF's Prepare() and Close()
functions and two UDF functions fromUtc() and toUtc() from
cross-compilation as they won't benefit from it.

With this change, the bitcode module reduces from 2.12 MB to 1.86MB.

Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
---
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr.cc
M be/src/exprs/timestamp-functions-ir.cc
A be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/exprs/timezone_db.cc
A be/src/exprs/timezone_db.h
M be/src/exprs/utility-functions-ir.cc
A be/src/exprs/utility-functions.cc
M be/src/exprs/utility-functions.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-http-handler.cc
M be/src/statestore/statestore.cc
M be/src/util/debug-util.cc
18 files changed, 327 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/3793/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-27 Thread Michael Ho (Code Review)
Michael Ho has abandoned this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Abandoned

Review is moved over to https://gerrit.cloudera.org/#/c/3792/

-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-ASF-CR] IMPALA-1112: Remove some unncessary code from cross-compilation

2016-07-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3793

Change subject: IMPALA-1112: Remove some unncessary code from cross-compilation
..

IMPALA-1112: Remove some unncessary code from cross-compilation

This change stops including some boost library header files
which pulls in other unnecessary boost library header files.
This reduces the amount of cross-compiled code which needs
to be materialized during codegen.

This change also removes some UDF's Prepare() and Close()
functions and two UDF functions fromUtc() and toUtc() from
cross-compilation as they won't benefit from it.

With this change, the bitcode module reduces from 2.12 MB to 1.86MB.

Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
---
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr.cc
M be/src/exprs/timestamp-functions-ir.cc
A be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/exprs/timezone_db.cc
A be/src/exprs/timezone_db.h
M be/src/exprs/utility-functions-ir.cc
A be/src/exprs/utility-functions.cc
M be/src/exprs/utility-functions.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/impala-http-handler.cc
M be/src/statestore/statestore.cc
M be/src/util/debug-util.cc
18 files changed, 318 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/3793/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I543809c69da0b4085a0e299b91cd550b274c46af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3740/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS4, Line 553: referencd
> referenced
Done


PS4, Line 554: so LLVM
 :   /// cannot resolve references to them if they are not 
materialized.
> this now seems redundant with the last sentence. delete.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Patch Set 1: Code-Review+2

Review is at https://gerrit.cloudera.org/#/c/3740/. Carry +2 over.

-- 
To view, visit http://gerrit.cloudera.org:8080/3792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I147eb332adc3f272b7a7a4fb4415842432c05f77
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3792

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..

IMPALA-3906: Materialize implicitly referenced IR functions

With lazy materialization of IR functions in the LLVM module,
there is an assumption that functions not referenced by IR
functions used in the query don't have to be materialized
and can have their linkage types changed to being externally
defined. These unmaterialized functions may be referenced by
global variables in the LLVM module and LLVM resolves these
references to their definitions in the native Impalad binary.
These global variables are mostly arrays containing references
to other global constants or boost and Impala functions included
in the cross-compiled code.

When compiling Impalad with optimization, gcc may actually
inline some of the functions which the global variables in
the LLVM modules reference and LLVM may have linking error
if these referenced IR functions are not materialized as it
can no longer find their definitions in the Impalad binary.
This patch fixes the problem by parsing all the global variables
in the LLVM module during Impalad's initialization and recording
all the referenced functions which aren't defined in the Impalad
binary and make sure they are always materialized.

A second problem which this patch fixes is that linking in
external LLVM module (e.g. UDF IR created by a user) may
implicitly materialize some functions in the external module.
Normally, we would expect functions to be materialized
through the GetFunction() interface and LinkModule() seems
to be an exception. This patch updates LinkModule() to also
parse the functions from the external module just like we do
for unmaterialized functions in GetFunction().

Change-Id: I147eb332adc3f272b7a7a4fb4415842432c05f77
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/util/symbols-util-test.cc
3 files changed, 152 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/3792/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I147eb332adc3f272b7a7a4fb4415842432c05f77
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-26 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3740

to look at the new patch set (#4).

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..

IMPALA-3906: Materialize implicitly referenced IR functions

With lazy materialization of IR functions in the LLVM module,
there is an assumption that functions not referenced by IR
functions used in the query don't have to be materialized
and can have their linkage types changed to being externally
defined. These unmaterialized functions may be referenced by
global variables in the LLVM module and LLVM resolves these
references to their definitions in the native Impalad binary.
These global variables are mostly arrays containing references
to other global constants or boost and Impala functions included
in the cross-compiled code.

When compiling Impalad with optimization, gcc may actually
inline some of the functions which the global variables in
the LLVM modules reference and LLVM may have linking error
if these referenced IR functions are not materialized as it
can no longer find their definitions in the Impalad binary.
This patch fixes the problem by parsing all the global variables
in the LLVM module during Impalad's initialization and recording
all the referenced functions which aren't defined in the Impalad
binary and make sure they are always materialized.

A second problem which this patch fixes is that linking in
external LLVM module (e.g. UDF IR created by a user) may
implicitly materialize some functions in the external module.
Normally, we would expect functions to be materialized
through the GetFunction() interface and LinkModule() seems
to be an exception. This patch updates LinkModule() to also
parse the functions from the external module just like we do
for unmaterialized functions in GetFunction().

Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/util/symbols-util-test.cc
3 files changed, 153 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/40/3740/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/3740/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS2, Line 144: Also ignore other global
 : // variables as InitializeLlvm() will go through all of them.
> Not sure what this is saying.  What does "other" mean here?
Rephrased the comments. It means other global variables in the same module. The 
reason they should be avoided is that we will parse all global variable 
one-by-one any way so there is no need to parse other global variables 
referenced by the current global variable.


PS2, Line 320: fn.isMaterializable()
> at this point, what functions in new_module are !isMaterializable()? when m
This is to avoid built-in functions and declarations. LLVM functions sometimes 
have bad names.


Line 337:   }
> can there be global variables in the module that need to be inspected for f
Good point. Code is updated to also parse the global variables in the source 
module.


http://gerrit.cloudera.org:8080/#/c/3740/2/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS2, Line 546: .
> ... that aren't defined in the Impalad native code (they may have been inli
Done


PS2, Line 546: implicitly
> what's implicit? aren't they explicitly referenced by the global variables?
It's implicit wrt to GetFunction() but yes, it's explicitly referenced by the 
global variables.


PS2, Line 548: so LLVM cannot resolve references to them if they are not 
materialized.
 :   /// These functions are always materialized each time the 
module is loaded.
> These functions are always materialized each time the module is loaded to e
Done


PS2, Line 550: gv_ref_functions_
> since these are a subset of global var referenced functions that we need in
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3740/2/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

Line 436:   static bool IsDefinedInImpalad(const std::string fn);
> Missing & to pass by reference
Oops. Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-25 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3740

to look at the new patch set (#3).

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..

IMPALA-3906: Materialize implicitly referenced IR functions

With lazy materialization of IR functions in the LLVM module,
there is an assumption that functions not referenced by IR
functions used in the query don't have to be materialized
and can have their linkage types changed to being externally
defined. These unmaterialized functions may be referenced by
global variables in the LLVM module and LLVM resolves these
references to their definitions in the native Impalad binary.
These global variables are mostly arrays containing references
to other global constants or boost and Impala functions included
in the cross-compiled code.

When compiling Impalad with optimization, gcc may actually
inline some of the functions which the global variables in
the LLVM modules reference and LLVM may have linking error
if these referenced IR functions are not materialized as it
can no longer find their definitions in the Impalad binary.
This patch fixes the problem by parsing all the global variables
in the LLVM module during Impalad's initialization and recording
all the referenced functions which aren't defined in the Impalad
binary and make sure they are always materialized.

A second problem which this patch fixes is that linking in
external LLVM module (e.g. UDF IR created by a user) may
implicitly materialize some functions in the external module.
Normally, we would expect functions to be materialized
through the GetFunction() interface and LinkModule() seems
to be an exception. This patch updates LinkModule() to also
parse the functions from the external module just like we do
for unmaterialized functions in GetFunction().

Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/util/symbols-util-test.cc
3 files changed, 127 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/40/3740/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/3740/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 106: bool LlvmCodeGen::IsDefinedInImpalad(const Function* fn) {
> Maybe the argument should just be the function name?
Done


Line 109:   Status status = LibCache::instance()->GetSoFunctionPtr("", fn_name, 
_ptr, NULL, true);
> Long line.
Done


PS1, Line 122: dyn_cast
> I think you want cast<> instead of dyn_cast<> here and below since you don'
Done


Line 138:   } else if (isa(val) || 
isa(val)) {
> Maybe DCHECK that this is true? I think we want to know if another type of 
I added a DCHECK in a new else clause after this one. Please see if it makes 
sense to you.


Line 147: Function* fn = fn_list[i];
> I don't think we need the index, so we can simplify to:
Done


PS1, Line 326: "picked"
> "merged"
Done


Line 328:   for (int i = 0; i < fn_names.size(); ++i) {
> for (const string& fn_name: fn_names) {
Done


Line 372:   codegen->MaterializeFunction();
> Do you have an idea of how many additional functions we're materialising? I
It's adding 12 functions. There is slightly more overhead based on rough 
estimate from the profile.They will be removed in a patch for IMPALA-1112.

_ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE23set_iso_extended_formatEv
_ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE14set_iso_formatEv
_ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED0Ev
_ZN5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED2Ev
_ZN5boost6detail15sp_counted_baseD0Ev
_ZNK5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE9do_put_tmES7_RSt8ios_basecRK2tmSs
_ZNK5boost9date_time10date_facetINS_9gregorian4dateEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE14do_put_specialES7_RSt8ios_basecNS0_14special_valuesE
_ZN5boost6detail15sp_counted_baseD2Ev
_ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE14set_iso_formatEv
_ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE23set_iso_extended_formatEv
_ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED0Ev
_ZN5boost9date_time10time_facetINS_10posix_time5ptimeEcSt19ostreambuf_iteratorIcSt11char_traitsIcEEED2Ev


http://gerrit.cloudera.org:8080/#/c/3740/1/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS1, Line 550: set
> unordered_set? I believe std::set is a balanced binary tree that isn't very
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3740/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS1, Line 326: "picked"
"merged"


-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3906: Materialize implicitly referenced IR functions

2016-07-25 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3740

Change subject: IMPALA-3906: Materialize implicitly referenced IR functions
..

IMPALA-3906: Materialize implicitly referenced IR functions

With lazy materialization of IR functions in the LLVM module,
there is an assumption that functions not referenced by IR
functions used in the query don't have to be materialized
and can have their linkage types changed to being externally
defined. These unmaterialized functions may be referenced by
global variables in the LLVM module and LLVM resolves these
references to their definitions in the native Impalad binary.
These global variables are mostly arrays containing references
to other global constants or boost and Impala functions included
in the cross-compiled code.

When compiling Impalad with optimization, gcc may actually
inline some of the functions which the global variables in
the LLVM modules reference and LLVM may have linking error
if these referenced IR functions are not materialized as it
can no longer find their definitions in the Impalad binary.
This patch fixes the problem by parsing all the global variables
in the LLVM module during Impalad's initialization and recording
all the referenced functions which aren't defined in the Impalad
binary and make sure they are always materialized.

A second problem which this patch fixes is that linking in
external LLVM module (e.g. UDF IR created by a user) may
implicitly materialize some functions in the external module.
Normally, we would expect functions to be materialized
through the GetFunction() interface and LinkModule() seems
to be an exception. This patch updates LinkModule() to also
parse the functions from the external module just like we do
for unmaterialized functions in GetFunction().

Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/util/symbols-util-test.cc
3 files changed, 121 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/40/3740/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3653f55a1aa063b21cb3a5040f502b4c0ecf82e8
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>


Re: [VOTE] Apache Impala Bylaws

2016-07-23 Thread Michael Ho
+1 (binding)

On Fri, Jul 22, 2016 at 12:46 PM, Jim Apple  wrote:

> This is a vote on the following proposal for bylaws:
>
> https://gerrit.cloudera.org/#/c/3669/2
>
> The vote is to be done by "Lazy Consensus". Active PMC members,
> according to http://incubator.apache.org/projects/impala.html, may
> vote. The vote will be open 72 hours and will pass if there are "3
> binding +1 votes and more binding +1 votes than -1 votes."
>
> +
>
> I am not on the PPMC, so my vote is non-binding. Here it is anyway, as
> according to our draft bylaws, "Non binding votes are still useful for
> those with binding votes to understand the perception of an action in
> the wider Impala community."
>
> (Non-binding) +1.
>
> My reasoning is that these bylaws are probably not utterly bonkers,
> since they are mostly what Hadoop uses, and they are easy to change if
> anyone finds something problematic. Additionally, since many of us in
> the Impala community are new to The Apache Way, having a document that
> spells things out (like how voting works) will, I hope, serve as a
> helpful foundation.
>
> +++
>
> Here is a plain-text copy of the patch for mailing-list archival purposes:
>
> +++
>
> Apache Impala (incubating) Project Bylaws
>
> Introduction
>
> This document defines the bylaws under which the Apache Impala
> (incubating) project operates. It defines the roles and
> responsibilities of the project, who may vote, how voting works, how
> conflicts are resolved, etc.
>
> Impala is a project of the Apache Software Foundation. The foundation
> holds the trademark on the name "Impala" and copyright on Apache code
> including the code in the Impala codebase. The foundation FAQ explains
> the operation and background of the foundation.
>
> Impala is typical of Apache projects in that it operates under a set
> of principles, known collectively as the "Apache Way". If you are new
> to Apache development, please refer to the Incubator project for more
> information on how Apache projects operate.
>
> Roles and Responsibilities
>
> Apache projects define a set of roles with associated rights and
> responsibilities. These roles govern what tasks an individual may
> perform within the project. The roles are defined in the following
> sections
>
> Users
> The most important participants in the project are people who use our
> software.
>
> Users contribute to the Apache projects by providing feedback to
> developers in the form of bug reports and feature suggestions. As
> well, users participate in the Apache community by helping other users
> on mailing lists and user support forums.
>
> Contributors
> All of the volunteers who are contributing time, code, documentation,
> or resources to the Impala Project. A contributor that makes
> sustained, welcome contributions to the project may be invited to
> become a Committer, though the exact timing of such invitations
> depends on many factors.
>
> Committers
> The project's Committers are responsible for the project's technical
> management. Committers have write access to the project's version
> control repositories. Committers may cast binding votes on any
> technical discussion.
>
> Committer access is by invitation only and must be approved by
> consensus approval of the active PMC members. A Committer is
> considered emeritus by their own declaration or by not contributing in
> any form to the project for over six months. An emeritus committer may
> request reinstatement of commit access from the PMC. Such
> reinstatement is subject to consensus approval of active PMC members.
>
> Significant, pervasive features may be developed in a speculative
> branch of the repository. The PMC may grant commit rights on the
> branch to its consistent contributors for the duration of the
> initiative. Branch committers are responsible for shepherding their
> feature into an active release and do not cast binding votes or vetoes
> in the project.
>
> All Apache committers are required to have a signed Contributor
> License Agreement (CLA) on file with the Apache Software Foundation.
> There is a Committer FAQ which provides more details on the
> requirements for Committers
>
> A committer who makes a sustained contribution to the project may be
> invited to become a member of the PMC. The form of contribution is not
> limited to code. It can also include code review, helping out users on
> the mailing lists, documentation, testing, etc.
>
> Release Manager
> A Release Manager (RM) is a committer who volunteers to produce a
> Release Candidate according to HowToRelease. The RM shall publish a
> Release Plan on the dev@ list stating the branch from which they
> intend to make a Release Candidate, at least one week before they do
> so. The RM is responsible for building consensus around the content of
> the Release Candidate, in order to achieve a successful Product
> Release vote.
>
> Project Management Committee
> The Project Management Committee (PMC) is responsible 

[Impala-CR](cdh5-trunk) IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()

2016-07-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
..


Patch Set 5: Code-Review+2

http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/3754/
Verified the new patch is fine with the private ASAN run above. Carry +2 
forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/3630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()

2016-07-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3630/4//COMMIT_MSG
Commit Message:

PS4, Line 23: ClearlStreams
> typo
Fixed.


http://gerrit.cloudera.org:8080/#/c/3630/4/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS4, Line 194: decompression_type_
> it looks like this is only set in ProcessSplit() if we found a tuple starti
Actually, that's a good point. The default value of decompression_type_ is 
THdfsCompression::NONE so in the case in which no tuple is found in the scan 
range, we may mistakenly account it towards the wrong compression type. Given 
that we will keep the stream object alive till the end now, we can go ahead and 
restore the old code here.


-- 
To view, visit http://gerrit.cloudera.org:8080/3630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()

2016-07-20 Thread Michael Ho (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3630

to look at the new patch set (#5).

Change subject: IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
..

IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()

A recent commit changed the ownership of the Stream object to
its owning ScannerContext. After that change, a Stream object
will be destroyed in ScannerContext::ReleaseCompletedResources()
if the parameter 'done' is true. That usually happens in the
Close() function of a scanner. However, for the text scanner,
the Stream object can be destroyed after handling compressed data
before Close() is called. In that case, the cached handle to the
Stream object is invalid when it's referenced in Close() to
access the compression codec of the Stream object.

This change fixes the above problem by not deleting the stream
objects in ScannerContext::ReleaseCompletedResources(). Instead
a new function ScannerContext::ClearStreams() is added for that
purpose and it's invoked in HdfsScanner::Close() to release all
the stream objects. This avoids other use-after-free problems
in the code.

Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
6 files changed, 29 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/30/3630/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia88f6285563ff669ae215af22ed2d45e5398adae
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>


  1   2   3   4   5   >