[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

2016-09-15 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/bin/load-tpc-kudu.py
File testdata/bin/load-tpc-kudu.py:

PS3, Line 51: tbls_to_clean = tpch_tables if workload.lower() == 'tpch' 
else tpcds_tables
> The change to enable drop db cascade for Kudu is in review. If you're stron
Done


http://gerrit.cloudera.org:8080/#/c/4327/3/testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
File testdata/workloads/tpcds/queries/tpcds-kudu-q19.test:

Line 39: 
> The TPC-DS workload is not currently enabled for Kudu in our regular (non-s
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 1.0-RC1 and add support for new OSes

2016-09-15 Thread Matthew Jacobs (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4420

to look at the new patch set (#2).

Change subject: Bump Kudu version to 1.0-RC1 and add support for new OSes
..

Bump Kudu version to 1.0-RC1 and add support for new OSes

Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4420/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

2016-09-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
..


Patch Set 14:

(19 comments)

Flushing out some more comments, and will move onto the later patch set.

One thing that's not clear to me yet is what the right separation between node 
and sink is -- they are currently very tightly coupled which means they don't 
really benefit from being separated into separate classes.  Is the goal of this 
change solely to be able to expose the DataSink interface, or is it to provide 
some separation as well? Do we see them becoming less coupled over time?

http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

PS14, Line 45: build row partitions
is this the same as the "hash partitions" referred to in the next paragraph. 
let's be consistent in terminology.


Line 86: 
is it expected that the renaming public functions would only be used by PHJN?  
If so, would be good to state that.


PS14, Line 90: Acquire ownership
Doesn't the ownership pass from this class to the exec node?  In which case 
this method wouldn't acquire ownership.  So let's come up with better 
terminology and naming.


PS14, Line 95: Called after probing of the partitions
 :   /// is done. The partitions are not closed or destroyed, since 
they may be spilled or
 :   /// may contain unmatched build rows for certain join modes 
(e.g. right outer join).
The fact that we need some much explanation kinda makes this seem like it might 
not be the right abstraction.  Could we instead just clear the vector when 
creating the next set of partitions (it doesn't look like we delete partitions 
until either Close() or Reset() anyway), so why do we need to proactively 
clearly the hash_partitions_ vector before starting he next iteration?


Line 106:   }
nit: missing blank line


PS14, Line 109: When this call returns
Is this stuff by this method or the caller?  i.e. if this method closes 
input_partition, then say that.  If it doesn't then either say "after" or don't 
say it at all (why is it relevant to this method documentation?).


PS14, Line 112: so that the probe phase has enough buffers preallocated
  :   /// execute successfully.
slightly garbled. seems like a word is missing or something.


PS14, Line 117: /// Iterates over all the partitions in 'hash_partitions_' and 
returns the largest
  :   /// number of build rows.
rather than say how and referring to private member, and leaving it to the 
reader to assume that "largest number of build rows" is talking about a 
per-partition thing, let's say something like:

Returns the maximum number of build rows of a hash partition 

(or whatever terminology you choose in the class comment).


Line 123:   bool HashTableStoresNulls() const;
why is this here rather than the join node? especially given that the join node 
owns the HashTableCtx.  on a related note, I'm not really sure how the 
HashTable context is going to work out once this is driven like a real sync.  
Will we still have a parent_ link to the exec node living in a different 
fragment?


Line 128:   inline const std::vector& is_not_distinct_from() const {
and then should this be here? is it to be closer to being able to break the 
parent_ link at some point and make the builder able to build the hash table 
without any info from parent_?


PS14, Line 182: partition side of this partition
not sure what this is saying. maybe just "for this build partition"?


Line 188: /// false.
maybe clarify that an error is not returned in that case.  or clarify what 
cases are errors. etc.


PS14, Line 195: unpin_all_build
why "build"?  shouldn't it be unpin_all_blocks?  or even better would be to 
name it according to what the algorithm cares about. which I think is whether 
or not we might have more build rows to append, right?


PS14, Line 234: build or probe
does this care about the probe side partitioning?


Line 237:   /// 'null_aware_probe_partition_' and 'null_probe_rows_'.
Also explain what this is computing, rather than just what the computation is.


PS14, Line 244: all hash partitions for partitioning level
this is kinda misleading. maybe qualify with ... for the current input, or the 
"current set of hash partitions for partition level...". i.e. this isn't all 
the hash partitions that are created for a particular level.


PS14, Line 252: in
into


PS14, Line 256: in memory
what does this mean? is it trying to say it fits in the current (pinned) block?


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS14, Line 410:  we then iterate over all the probe rows
> Clarified that it's the partition's. Most of the text was just copied verba
Thanks. From the perspective of the reviewer, the comment is 

[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps

2016-09-15 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4111: backend death tests should not produce minidumps
..


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 1.0-RC1 and add support for new OSes

2016-09-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: Bump Kudu version to 1.0-RC1 and add support for new OSes
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4420/2/bin/impala-config.sh
File bin/impala-config.sh:

PS2, Line 136: centos5 | debian6 | enterprise*5 | redhat*5 | suse* )
 : KUDU_IS_SUPPORTED=false;;
Wouldn't it make more sense to list the supported OSes instead? I mean, do we 
really support debian5? :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-15 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 6: Code-Review+1

(3 comments)

Carry +1

http://gerrit.cloudera.org:8080/#/c/4371/5/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS5, Line 1020:   if (new_value < min_) { min_ = new_value; }
  :   if (new_value > max_) { max_ = new_value; }
  : }
> 1 line
Done


PS5, Line 1023: 
  : void RuntimeProfile::SummaryStatsCounter::SetStats(const 
TSummaryStatsCounter& counter) {
  :   l
> 1 line
Done


http://gerrit.cloudera.org:8080/#/c/4371/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 338: # and average values as the same since we have only one sample 
(i.e. only one range)
> can you add a TODO for handling the case when there is more than one range 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Make gen build version.py resilient to a failing git rev-parse

2016-09-15 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: Make gen_build_version.py resilient to a failing git rev-parse
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4411/4/bin/gen_build_version.py
File bin/gen_build_version.py:

PS4, Line 36: with open(os.devnull, 'w') as devnull:
Does this Python script need to support ancient Python 2.4? If yes, with won't 
be supported and this will fail.


http://gerrit.cloudera.org:8080/#/c/4411/4/bin/save-version.sh
File bin/save-version.sh:

PS4, Line 25: GIT_HASH=$(git rev-parse HEAD) 2> /dev/null
"2> /dev/null" should go inside the parens with the command.


PS4, Line 28:   GIT_HASH="Could not obtain git hash"
Has this path been tested? Does Impala work when the derived git hash doesn't 
match the typical pattern of a git hash? It might be safer to echo a warning 
here and make GIT_HASH empty. We know an empty GIT_HASH already won't cause 
Impala to crash or exit. If the point here is "best effort", this seems safer.


-- 
To view, visit http://gerrit.cloudera.org:8080/4411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7af33502bbb70185dc15ffca6219436a616f25b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 1.0-RC1 and add support for new OSes

2016-09-15 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 1.0-RC1 and add support for new OSes
..


Patch Set 3: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/205/

-- 
To view, visit http://gerrit.cloudera.org:8080/4420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

2016-09-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3739: Enable stress tests on Kudu
..


Patch Set 4:

(3 comments)

Thanks for calling these out mikeb.

http://gerrit.cloudera.org:8080/#/c/4327/4/testdata/datasets/tpcds/tpcds_kudu_template.sql
File testdata/datasets/tpcds/tpcds_kudu_template.sql:

PS4, Line 263: 'kudu.key_columns'='sr_returned_date_sk,sr_ticket_number, 
sr_item_sk',
> In TPC-DS, the primary keys are sr_item_sk, sr_ticket_number . Is this diff
Done


PS4, Line 575: 'kudu.key_columns'='hd_demo_sk,hd_income_band_sk',
> In TPC-DS only hd_demo_sk is a primary key.
Done


PS4, Line 643: 'kudu.key_columns'='p_promo_sk, p_item_sk',
> In TPC-DS, only p_promo_sk is a primary key.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3739: Enable stress tests on Kudu

2016-09-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#5).

Change subject: IMPALA-3739: Enable stress tests on Kudu
..

IMPALA-3739: Enable stress tests on Kudu

This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The follwing changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
---
A testdata/bin/load-tpc-kudu.py
A testdata/datasets/tpcds/tpcds_kudu_template.sql
A testdata/datasets/tpch/tpch_kudu_template.sql
A testdata/workloads/tpcds/queries/tpcds-kudu-q19.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q27.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q3.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q34.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q42.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q43.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q46.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q47.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q52.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q53.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q55.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q59.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q6.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q61.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q63.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q65.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q68.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q7.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q73.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q79.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q8.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q88.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q89.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q96.test
A testdata/workloads/tpcds/queries/tpcds-kudu-q98.test
M tests/comparison/db_connection.py
M tests/stress/concurrent_select.py
30 files changed, 2,470 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4327/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] Make gen build version.py resilient to a failing git rev-parse

2016-09-15 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: Make gen_build_version.py resilient to a failing git rev-parse
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7af33502bbb70185dc15ffca6219436a616f25b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] Make gen build version.py resilient to a failing git rev-parse

2016-09-15 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: Make gen_build_version.py resilient to a failing git rev-parse
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7af33502bbb70185dc15ffca6219436a616f25b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4428/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

Line 319:   capacity_ = tuple_ptrs_size_ / (num_tuples_per_row_ * 
sizeof(Tuple*));
> We already know the initial capacity using this calculation - we should com
What about just setting capacity_ = initial_capacity_ here?


-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#2).

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..

IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

If MarkCapacity() is called on a row batch, it is difficult to call
AcquireState() on that batch because tuple_ptrs_size_ is not accessible
to initialise the destination batch - this is usually calculated from
capacity(), but that value is wrong for these purposes after
MarkCapacity().

Add RowBatch::InitialCapacity() to return the initial capacity value of
the batch.

Add row-batch-test to add initial coverage of AcquireState() API.

Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
---
M be/src/runtime/CMakeLists.txt
A be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 77 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4428/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

2016-09-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
..


Patch Set 17:

(10 comments)

These are the comments from re-reviewing the addressed comments.  Will finish 
off the rest of builder.h and then the .cc files next.

http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

PS14, Line 90: 
> It passes from the class to the caller. I guess I don't find this ambiguous
I guess one term used elsewhere is "release", right? (e.g. C++ unique_ptr).  I 
think it's better than "acquire" given the direction of resource transfer, so 
that's one option.  The reworded comment helps too.


PS14, Line 95:  streams are empty but prepared for
 :   /// writing with a write buffer allocated.
 :   std::vector 
AcquireProbeStreams();
> The hash join node uses this to determine whether it's already cleaned up t
Okay for now as an intermediate step, but let's definitely clean it up as work 
continues.


PS14, Line 195:  is destroyed i
> Overall these bool arguments are really confusing, and the correctness of t
Yeah, agree this bool was very confusing. Will take a look at the new stuff.


PS14, Line 234: 
> Yes, since we share the reservation and hand off the probe buffers.
Thanks, the new comment clarifies.  From reading it I couldn't tell if there 
was a "* 2" missing because of the comment.


http://gerrit.cloudera.org:8080/#/c/3873/15/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 372:   { RETURN_IF_ERROR(build_partition->BuildHashTable()); }
extraneous braces


Line 395: }
same


http://gerrit.cloudera.org:8080/#/c/3873/15/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS15, Line 81: build
probe


Line 103: ///
Thanks, I think this will make the code much more approachable.


PS15, Line 405:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/3873/15/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 391:   }
alternatively: initialize insert_pos to begin and then DCHECK_NE(insert_pos, 
children_.begin());  Okay to ignore.


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4116: Remove 'cdh' from version string

2016-09-15 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4116: Remove 'cdh' from version string
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7754538a23e73dcdebc6e3df509f357cbe03198c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4116: Remove 'cdh' from version string

2016-09-15 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4116: Remove 'cdh' from version string
..


IMPALA-4116: Remove 'cdh' from version string

Change-Id: I7754538a23e73dcdebc6e3df509f357cbe03198c
Reviewed-on: http://gerrit.cloudera.org:8080/4421
Reviewed-by: Jim Apple 
Tested-by: Internal Jenkins
---
M bin/save-version.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7754538a23e73dcdebc6e3df509f357cbe03198c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4428/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS2, Line 282: MarkCapacity
> MarkAtCapacity I think
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Henry Robinson (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4428

to look at the new patch set (#4).

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..

IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

If MarkCapacity() is called on a row batch, it is difficult to call
AcquireState() on that batch because tuple_ptrs_size_ is not accessible
to initialise the destination batch - this is usually calculated from
capacity(), but that value is wrong for these purposes after
MarkCapacity().

Add RowBatch::InitialCapacity() to return the initial capacity value of
the batch.

Add row-batch-test to add initial coverage of AcquireState() API.

Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
---
M be/src/runtime/CMakeLists.txt
A be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 77 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4428/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

2016-09-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
..


Patch Set 14:

> (2 comments)
 > 
 > Before I responded to individual comments I wanted to point out
 > that the coupling is now one way - the builder never refers to
 > PhjNode aside from metadata provided via the constructor (you can
 > verify there are no references to partitioned-hash-join-node.h or
 > PartitionedHashJoin node in the builder code).

Yup, that's good.  I thought there was a back reference on my first pass 
through, but I was thinking of the reference from partition back to the 
builder/sink.  Since it is the sink, it seems a little weird (once the builder 
is used by the join node in the other fragment), but okay for this first step.  
Splitting out the sink and builder is something we should revisit later.

 > 
 > At this point it should only require minor changes to this hash
 > join code to create the builder in a separate fragment *before* the
 > join node, provided you have a 1:1 mapping from builder to node.
 > 1:n broadcast joins with no spilling would require some more
 > changes, but is a lot closer.
 > 

Yup, agree.

 > I chose this stopping point because there was a clear physical
 > separation between the classes, even if the spilling algorithm has
 > some logical coupling and assumes a 1:1 mapping from node to
 > builder.
 > 
 > Reducing the logical coupling would require significant changes to
 > the spilling algorithm that is driven from PartitionedHashJoin::GetNext(),
 > which I think should be deferred until there is some consensus
 > about what spilling for 1:n broadcast joins with multithreading
 > might look like.

I'm okay with this point more or less as an incremental first step, as well. I 
just wanted to understand better where we are going longer term and what the 
immediate requirements are.

As mentioned, looking at whether the sink should be just a sink, and then the 
"builder" is something that's driven by both the sink and also the join node.  
But I'm fine with waiting on that since this is a good step in teasing things 
apart.

-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps

2016-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4111: backend death tests should not produce minidumps
..


Patch Set 6: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4136: testKudu planner test hangs if Kudu is not supported

2016-09-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4136: testKudu planner test hangs if Kudu is not 
supported
..


Patch Set 1: Code-Review+2

Thanks, Matt!

-- 
To view, visit http://gerrit.cloudera.org:8080/4426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf713107cdf49fa73efae5a731f26bbce96d8364
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-09-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..

IMPALA-3725 Support Kudu UPSERT in Impala

This patch introduces a new query statement, UPSERT, for Kudu
tables which operates like an INSERT except that if there's a
primary key collision instead of returning an error an update
is performed.

This patch adds an analysis class UpsertStmt, which represents
UPSERT statements, and InsertStmtBase, which is a superclass of
UpsertStmt and InsertStmt and contains logic common to both.

New syntax:
[with_clause] UPSERT INTO [TABLE] table_name [(column list)]
{
  select_stmt
 | VALUES (value [, value...]) [, (value [, (value...)]) ...]
}

where column list must contain all of the key columns in table_name,
if specified, and table_name must be a Kudu table.

Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
A fe/src/main/java/com/cloudera/impala/analysis/InsertStmtBase.java
M fe/src/main/java/com/cloudera/impala/analysis/StmtRewriter.java
A fe/src/main/java/com/cloudera/impala/analysis/UpsertStmt.java
M fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
M fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M fe/src/main/java/com/cloudera/impala/planner/TableSink.java
M fe/src/main/java/com/cloudera/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
21 files changed, 990 insertions(+), 479 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/4047/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4428/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS2, Line 282: MarkCapacity
MarkAtCapacity I think


-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4053: Address follow up comments for IMPALA-3610

2016-09-15 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4053: Address follow up comments for IMPALA-3610
..


Patch Set 3:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4306/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 374
> what's the rationale for leaving this here?
The query_mem_tracker_ is accessed by the QueryExecState even after TearDown() 
is called to release it in QueryExecState::ClearResultsCache()


Line 264: is_local = tFilterTarget.is_local_target;
> more interesting: how is the discarded or disabled state recorded in this s
There is no specific state to denote that a filter has been discarded. It 
happens as a consequence of other things, i.e. query completion, agg completion 
or being disabled.
I've updated how disabled state is recorded in this struct.


Line 277: /// it is disabled.
> FilterTarget isn't referenced in the .h file, also move it here
Done


Line 290:   vector* targets() { return _; }
> don't comment on the intention of the caller. describe the externally visib
Done


Line 295:   const TRuntimeFilterDesc& desc() const { return desc_; }
> describe externally visible behavior and role of parameters. just looking a
In the new patch the only externally available behavior is the mem tracker 
being updated and the rpc_params being set.


Line 304:   /// Releases tracked memory and frees 'bloom_filter'. A discarded 
filter consumes no
> This does not merely look like an index. Can you comment on why we're using
As opposed to? These are indices to fragment_instance_states_.


Line 733:   partition_filter.push_back(target.is_bound_by_partition_columns 
? "true" : "false");
> this line would need changing if we move pending_count into disabled()
See comment below.


PS2, Line 2069: < "Filter received before routing table complete";
> These two mostly occur together. Maybe move the pending_count check into di
They both mean different things, so moving the pending_count() check into 
disabled() would hamper readability (even though they currently could 
functionally be merged).


Line 2075:   unordered_set target_fragment_instance_idxs;
> remove blank line
Done


Line 2077: lock_guard l(filter_lock_);
> and this. this section of code deals with applying updates, so it's a good 
Done


PS2, Line 2078: t = filter_routing_tab
> Is it correct to check for !state->disabled() here? Can it happen that betw
Yes, it can get disabled in ApplyUpdate()


Line 2082: }
> const FilterTarget&
Done


Line 2090: // filters that can't affect the aggregated global filter.
> i find the control flow harder to follow than before, and the code has more
We didn't need to TryConsume() memory for a broadcast filter update as the 
input filter can just be passed on as the output filter. That's the reason for 
the extra special casing. However, even if I do change it back to how it was 
originally, we need to add special cases elsewhere to take care of the 
following case:

We don't want to disable the filter if the first update for a broadcast filter 
hits OOM in ApplyUpdate(), as subsequent updates may arrive at times of lower 
memory pressure.

This requires changing how we track pending_counts for broadcast filters, 
adding a couple of special cases in ApplyUpdate() and we don't get the benefit 
of avoiding using extra memory.

I've made the change describing the paragraph above, so you can see how it 
looks and it'll be easier to decide on what to do about this special casing.


PS2, Line 2092:   * if the filter could not be allocated
> How about "Filter has been processed and distributed completely, and can be
The distribution happens after this, so I've left that part out. Changed the 
rest of it.


PS2, Line 2147: // If it's a broadcast filter, future updates may arrive at 
a time of lower memory
  :   // pressure, so do not disable.
  :
> Could this go into Discard()?
It could but it would make the readability a little harder. This is in a way a 
matching call to L2111.


-- 
To view, visit http://gerrit.cloudera.org:8080/4306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7499558cc9d3622a9f3b68904aa5fe92a113f579
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4008: Don't bake ExprContext pointers into PAGG/AGG IR code

2016-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4008: Don't bake ExprContext pointers into PAGG/AGG IR 
code
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 84: // Use NULL if the aggregate evaluator is not codegen'ed or if 
there is no
It might be simpler just to always set to NULL if input_expr_ctxs is empty. We 
lose some cross-validation but I feel like the simple code would be nice.


http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 82:   std::vector agg_expr_ctxs_;
Mention that the ExprContexts are owned by aggregate_evaluators_.


http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 158: ExprContext* agg_expr_ctx = NULL;
Again, I think it would be simpler to just check if it's empty. Or add a method 
to AggFnEvaluator like HasInputExprContexts()


http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 202:   /// ExprContexts of 'aggregate_evaluators_'. Used in the codegen'ed 
version of
Also document ownership here.


Line 205:   std::vector agg_expr_ctxs_;
What do you think about making this vector>? 

Each pair is just a struct with two pointers so should be simple to access from 
the IR.

This would better document the correspondence between the two vectors and 
result in one fewer argument being threaded through everywhere.

I'm mostly ok with the current approach but the extra argument is a bit 
annoying.


http://gerrit.cloudera.org:8080/#/c/4390/2/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 192:   std::vector input_expr_ctxs_;
Can you add your comment here about when this is empty/non-empty?


-- 
To view, visit http://gerrit.cloudera.org:8080/4390
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I42039eed803a39fa716b9ed647510b6440974ae5
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4428

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..

IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

If MarkCapacity() is called on a row batch, it is difficult to call
AcquireState() on that batch because tuple_ptrs_size_ is not accessible
to initialise the destination batch - this is usually calculated from
capacity(), but that value is wrong for these purposes after
MarkCapacity().

Add RowBatch::initial_capacity() to track the initial capacity value of
the batch.

Add row-batch-test to add initial coverage of AcquireState() API.

Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
---
M be/src/runtime/CMakeLists.txt
A be/src/runtime/row-batch-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 78 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4428/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] Bump Kudu version to 1.0-RC1 and add support for new OSes

2016-09-15 Thread Matthew Jacobs (Code Review)
Hello Internal Jenkins, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4420

to look at the new patch set (#4).

Change subject: Bump Kudu version to 1.0-RC1 and add support for new OSes
..

Bump Kudu version to 1.0-RC1 and add support for new OSes

Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
---
M bin/impala-config.sh
M testdata/cluster/node_templates/cdh5/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/cdh5/etc/kudu/tserver.conf.tmpl
3 files changed, 9 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4420/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-1670: Support multiple partitions in ALTER TABLE ADD PARTITION

2016-09-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670: Support multiple partitions in ALTER TABLE ADD 
PARTITION
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4144/8/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS8, Line 1766: for (Partition partition: hmsPartitions) {
  : // Create and add the HdfsPartition. Return the table 
object with an updated
  : // catalog version.
  : addHdfsPartition(tbl, partition);
  :   }
> Correct, this is exactly what happens. I've added an E2E test (test_add_ove
No, I don't think is the correct behavior. Actually I filed a JIRA 
(IMPALA-4141) about this case because it wasn't even working correctly for the 
single partition case. The right thing to do is to also add the partition in 
the catalog cache. If the user tries to add a partition he should get one of 
the two possible outcomes. Either an error is thrown and the partition is not 
added or no error and the partition is added.  No error and no partition added 
is weird from a usability point of view.


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Make gen build version.py resilient to a failing git rev-parse

2016-09-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Make gen_build_version.py resilient to a failing git rev-parse
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7af33502bbb70185dc15ffca6219436a616f25b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] Make gen build version.py resilient to a failing git rev-parse

2016-09-15 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: Make gen_build_version.py resilient to a failing git rev-parse
..


Patch Set 6: Code-Review+2

Thanks Alex.

Carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/4411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7af33502bbb70185dc15ffca6219436a616f25b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4074: Configuration items duplicate in template of YARN

2016-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4074: Configuration items duplicate in template of YARN
..


Patch Set 2: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81d6d019d4982cb35932b1d45c376b215ec5bcc6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yuanhao Luo 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Yuanhao Luo 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4074: Configuration items duplicate in template of YARN

2016-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4074: Configuration items duplicate in template of YARN
..


Patch Set 1:

I'm going to run tests and merge this.

-- 
To view, visit http://gerrit.cloudera.org:8080/4311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81d6d019d4982cb35932b1d45c376b215ec5bcc6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yuanhao Luo 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Yuanhao Luo 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 1.0-RC1 and add support for new OSes

2016-09-15 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bump Kudu version to 1.0-RC1 and add support for new OSes
..


Patch Set 4: Code-Review+2

Carrying the +2-- was missing changes to the config files for running kudu. 

They started requiring a flag to enable debug/unsafe options which we use.

-- 
To view, visit http://gerrit.cloudera.org:8080/4420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbe554d6782212f91db07757f429c5571a7a44da
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4026: Implement double-buffering for BlockingQueue.

2016-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4026: Implement double-buffering for BlockingQueue.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4350/3/be/src/util/blocking-queue.h
File be/src/util/blocking-queue.h:

Line 99:   bool BlockingPut(const T& val) {
> Yes, I have thought about it before. I have tested various locations for no
We discussed this a little offline and I feel like I understand why option (3) 
works in practice now - when N producers are outpacing consumers, this switches 
to a mode where the steady state is a nearly empty queue with k producers 
sitting blocked and N - k producers working. Essentially there are some 
additional elements in the queue because the blocked producers also have some 
row batches ready to add to the queue immediately.

I'm ok moving forward with it so long as we document the expected behaviour. 
Perhaps it's adequate to document the expect behaviour with a faster consumer, 
matched producer/consumer and fast producers.

Have you benchmarked this with concurrent queries or workloads with many 
concurrent scans? One concern with the empty queue is that if the queue is 
empty and a producer isn't scheduled immediately, the consumer may end up 
waiting on the condition variable for every batch and potentially getting 
swapped it. We could perhaps work around that if the producer added its item to 
the queue *before* blocking, so that the the consumer can get the item without 
requiring a context switch.


-- 
To view, visit http://gerrit.cloudera.org:8080/4350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9f4cf351455efefb0f3bb791cf9bc82d1421d54
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Chen Huang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4138: Fix AcquireState() for batches that have MarkCapacity() called

2016-09-15 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4138: Fix AcquireState() for batches that have 
MarkCapacity() called
..


Patch Set 1:

(1 comment)

Agree that this maybe isn't a fix so much as an interface improvement, but I 
think it helps RowBatch be self-contained if I can AcquireState() from a batch 
using only that batch's interface.

http://gerrit.cloudera.org:8080/#/c/4428/1/be/src/runtime/row-batch-test.cc
File be/src/runtime/row-batch-test.cc:

Line 60: ASSERT_DEBUG_DEATH(bad_dest.AcquireState(), "");
Will change to IMPALA_ASSERT_DEBUG_DEATH when related patch lands.


-- 
To view, visit http://gerrit.cloudera.org:8080/4428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ceca53c406b05cd04b7d95a4f9f2eec7bc127f5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes