[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Reviewed-on: http://gerrit.cloudera.org:8080/3070
Reviewed-by: Michael Ho 
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 230 insertions(+), 134 deletions(-)

Approvals:
  Michael Ho: Looks good to me, but someone else must approve
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 16
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 15: Code-Review+2

The build failures were manual cancellations...

Carry Dan's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 15
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 15: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 15
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3070/14/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 331: // There is grouping, so we will do partitioned aggregation.
> Still here in the latest patch. Please remove it.
Agh, sorry.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 14
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3070

to look at the new patch set (#15).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 230 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/15
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 15
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 14: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3070/14/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 331: // There is grouping, so we will do partitioned aggregation.
Still here in the latest patch. Please remove it.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 14
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3070/13/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 278: if (is_streaming_preagg_) codegen_status = 
CodegenProcessBatchStreaming();
: else codegen_status = CodegenProcessBatch();
> Huh ? These lines lookd weird. Why don't you just do codegen_status = is_st
Done


Line 331: // There is grouping, so we will do partitioned aggregation.
> not needed.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-17 Thread Tim Armstrong (Code Review)
Hello Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3070

to look at the new patch set (#14).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 231 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/14
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 14
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3070/13/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 278: if (is_streaming_preagg_) codegen_status = 
CodegenProcessBatchStreaming();
: else codegen_status = CodegenProcessBatch();
Huh ? These lines lookd weird. Why don't you just do codegen_status = 
is_streaming_preagg_ ? Codege... : Codegen...


Line 331: // There is grouping, so we will do partitioned aggregation.
not needed.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 11:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 46
I made a mistake here caught by running the full test suite: we need to call 
this from both callsites of ProcessBatch(). I don't think this change was 
especially important, so I reverted  it.


Line 48: }
> DCHECK(expr_vals_cache->AtEnd()) ?
Done


Line 200: }
> DCHECK(expr_vals_cache->AtEnd()) ?
Done


http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 278: LlvmCodeGen* codegen;
: RETURN_IF_ERROR(state->GetCodegen(&codegen));
> Not needed in this function anymore ?
Good point.


Line 282: codegen_enabled = codegen_status.ok();
> This line and line 285 can be probably be merged.
Done


Line 343: // There is grouping, so we will do partitioned aggregation.
> This line can be moved to line 339 instead as this should not be specific t
Done


Line 1412:   hash_partitions_.clear();
> May be we should reset hash_tbls_[] here too ?
Done. I think the consistency DCHECKS are sufficient to catch any bugs with 
hash_tbls_ being out of sync with hash_partitions_, but this seems cleaner.


Line 1830: ConstantInt::get(Type::getInt32Ty(codegen->context()), 
prefetch_mode));
> not this change, but this is probably a common enough pattern that it's wor
Probably a good idea.


http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 232:  
> nit: extra space. Same below.
Done


Line 400: hash_tbls_
> nit: 'hash_tbls_'
Done


Line 499: stores the results
> Can you please mention the null bool and hash values are also cached ?
Explaining that here seems redundant, given that those are always cached in the 
ExprValueCache as part of evaluation. The rewording might make it a bit clearer 
that it's using the ExprValuesCache.


Line 500: 'ht_ctx'.
> the expression values cache in 'ht_ctx'.
Done


Line 501:  
> nit: extra space.
Done


Line 581: HashTable* hash_tbl
> Can you please add a comment about this new parameter ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 13:

I'm going to start a GVM, but will wait for Michael's +1 before merging.

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3070

to look at the new patch set (#13).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 231 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/13
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3070

to look at the new patch set (#12).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

TODO: once we finish the PHJ::ProcessProbeBatch() patch I
will do a final pass and make sure that this is consistent.

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 226 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/12
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 11:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 278: LlvmCodeGen* codegen;
: RETURN_IF_ERROR(state->GetCodegen(&codegen));
Not needed in this function anymore ?


Line 282: codegen_enabled = codegen_status.ok();
This line and line 285 can be probably be merged.


Line 343: // There is grouping, so we will do partitioned aggregation.
This line can be moved to line 339 instead as this should not be specific to 
the codegen case.


Line 1412:   hash_partitions_.clear();
May be we should reset hash_tbls_[] here too ?


http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 232:  
nit: extra space. Same below.


Line 400: hash_tbls_
nit: 'hash_tbls_'


Line 499: stores the results
Can you please mention the null bool and hash values are also cached ?


Line 500: 'ht_ctx'.
the expression values cache in 'ht_ctx'.


Line 501:  
nit: extra space.


Line 581: HashTable* hash_tbl
Can you please add a comment about this new parameter ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 11: Code-Review+2

(3 comments)

Please get Michael's review also.

http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 48: }
DCHECK(expr_vals_cache->AtEnd()) ?


Line 200: }
DCHECK(expr_vals_cache->AtEnd()) ?


http://gerrit.cloudera.org:8080/#/c/3070/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1830: ConstantInt::get(Type::getInt32Ty(codegen->context()), 
prefetch_mode));
not this change, but this is probably a common enough pattern that it's worth 
factoring into a CodeGen method.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#11).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

TODO: once we finish the PHJ::ProcessProbeBatch() patch I
will do a final pass and make sure that this is consistent.

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 226 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/11
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 11:

Rebased onto Michael's latest patch

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 73: HashTable* hash_tbl = hash_tbls_[partition_idx];
> Maybe add the DCHECK here that verifies hash_tbls_ is in sync. You could ev
Done. Added GetHashTable.


Line 92:  if (expr_vals_cache->IsRowNull()) return Status::OK();
> Is it intentional to put this if-stmt after fetching the hash value in the 
Yes, same reason as in the other place - to speed up the non-null branch.


Line 182: !expr_vals_cache->IsRowNull()) {
: if (!TryAddToHashTable(ht_
> Can you please combine these two if-stmts ?
Done


Line 186: RETURN_IF_ERROR(process_batch_status_)
> So, if we succeed in adding to the hash table, process_batch_status_->ok() 
Done


Line 214: Partition* __restrict__ partition,
: HashTable* __restrict__ hash_tbl,
> Do we need to pass these two parameters ? Can they be local to this functio
We could alternatively pass in the partition index, which might save an 
argument, but it seems cleaner to me if the partitioning is handled by the 
caller.


http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 282:   Function* codegen_process_batch_streaming_fn;
:   codegen_status = 
CodegenProcessBatchStreaming(&codegen_process_batch_streaming_fn);
:   if (codegen_status.ok()) {
: codegen->AddFunctionToJit(codegen_process_batch_streaming_fn,
: reinterpret_cast(&process_batch_streaming_fn_));
: codegen_enabled = true;
> Why cannot these be placed in CodegenProcessBatchStreaming() similar to how
Done. Makes sense, I'd never looked at restructuring it.


Line 293: void **codegened_fn_ptr = grouping_expr_ctxs_.empty() ?
: reinterpret_cast(&process_batch_no_grouping_fn_) :
: reinterpret_cast(&process_batch_fn_);
: codegen->AddFunctionToJit(codegen_process_batch_fn, 
codegened_fn_ptr);
> Why cannot these be placed in CodegenProcessBatch() similar to how we handl
Done


http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 494: 'ht_ctx'->expr_value_buffer_array_'
> This probably needs to be updated after rebase.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

TODO: once we finish the PHJ::ProcessProbeBatch() patch I
will do a final pass and make sure that this is consistent.

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 202 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 73: HashTable* hash_tbl = hash_tbls_[partition_idx];
Maybe add the DCHECK here that verifies hash_tbls_ is in sync. You could even 
define a GetHashTable(int partition_idx) that does the DCHECK and returns the 
cached entry, so that all places get it.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 92:  if (expr_vals_cache->IsRowNull()) return Status::OK();
Is it intentional to put this if-stmt after fetching the hash value in the line 
above ?


Line 182: !expr_vals_cache->IsRowNull()) {
: if (!TryAddToHashTable(ht_
Can you please combine these two if-stmts ?


Line 186: RETURN_IF_ERROR(process_batch_status_)
So, if we succeed in adding to the hash table, process_batch_status_->ok() 
should be TRUE, right ? If so, can we have a DCHECK for it (may be at line 200 
below) ?!


Line 214: Partition* __restrict__ partition,
: HashTable* __restrict__ hash_tbl,
Do we need to pass these two parameters ? Can they be local to this function ?


http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 282:   Function* codegen_process_batch_streaming_fn;
:   codegen_status = 
CodegenProcessBatchStreaming(&codegen_process_batch_streaming_fn);
:   if (codegen_status.ok()) {
: codegen->AddFunctionToJit(codegen_process_batch_streaming_fn,
: reinterpret_cast(&process_batch_streaming_fn_));
: codegen_enabled = true;
Why cannot these be placed in CodegenProcessBatchStreaming() similar to how we 
handle codegen for PHJ ?


Line 293: void **codegened_fn_ptr = grouping_expr_ctxs_.empty() ?
: reinterpret_cast(&process_batch_no_grouping_fn_) :
: reinterpret_cast(&process_batch_fn_);
: codegen->AddFunctionToJit(codegen_process_batch_fn, 
codegened_fn_ptr);
Why cannot these be placed in CodegenProcessBatch() similar to how we handle 
codegen for PHJ ?


http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 494: 'ht_ctx'->expr_value_buffer_array_'
This probably needs to be updated after rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 8:

The new version claws back a significant chunk of the regression (mainly via 
the hash_tbls_ cache):
Report Generated on 2016-05-16
Run Description: "Base: 89ee8d85c6e4e1a8d5522c34df5a0ebfeb6491f2 vs 
Ref: 9691d2b53ef330e3962b453681e4916b4d92af95"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:  impalad version 2.6.0-cdh5-INTERNAL 
RELEASE ()
Baseline Impala Version: impalad version 2.6.0-cdh5-INTERNAL 
RELEASE ()


++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) 
| GeoMean(s) | Delta(GeoMean) |

++---+-++++
| TARGETED-PERF(_20) | parquet / none / none | 13.27   | -14.63%
| 7.59   | -9.16% |

++---+-++++


++---+---++-++---++-+---+
| Workload   | Query | File 
Format   | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base 
StdDev(%) | Num Clients | Iters |

++---+---++-++---++-+---+
| TARGETED-PERF(_20) | primitive_groupby_decimal_lowndv.test | 
parquet / none / none | 2.11   | 2.02|   +4.71%   |   1.36%   |   2.35% 
   | 1   | 10|
| TARGETED-PERF(_20) | primitive_groupby_bigint_lowndv   | 
parquet / none / none | 2.12   | 2.07|   +2.32%   |   1.77%   |   4.02% 
   | 1   | 10|
| TARGETED-PERF(_20) | primitive_groupby_decimal_highndv | 
parquet / none / none | 14.90  | 16.50   |   -9.65%   |   6.41%   |   0.70% 
   | 1   | 10|
| TARGETED-PERF(_20) | primitive_groupby_bigint_pk   | 
parquet / none / none | 36.96  | 43.58   |   -15.17%  |   7.43%   |   0.53% 
   | 1   | 10|
| TARGETED-PERF(_20) | primitive_groupby_bigint_highndv  | 
parquet / none / none | 10.27  | 13.58   | I -24.38%  |   1.80%   |   1.72% 
   | 1   | 10|

++---+---++-++---++-+---+

(I) Improvement: TARGETED-PERF(_20) 
primitive_groupby_bigint_highndv [parquet / none / none] (13.58s -> 10.27s 
[-24.38%])

+--++--+--++---+--+--+++-+---+
| Operator | % of Query | Avg  | Base Avg | Delta(Avg) | 
StdDev(%) | Max  | Base Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |

+--++--+--++---+--+--+++-+---+
| 03:AGGREGATE | 31.17% | 3.14s| 3.61s| -13.02%|   
2.25%   | 3.31s| 3.73s| -11.43%| 1  | 0   | 2.95M |
| 02:EXCHANGE  | 2.39%  | 241.38ms | 254.61ms | -5.20% |   
2.05%   | 251.16ms | 265.85ms | -5.53% | 1  | 30.00M  | 29.49M|
| 01:AGGREGATE | 64.14% | 6.47s| 9.64s| -32.89%|   
2.58%   | 6.68s| 9.83s| -32.07%| 1  | 30.00M  | 29.49M|
| 00:SCAN HDFS | 2.30%  | 231.76ms | 229.50ms | +0.99% |   
3.87%   | 256.42ms | 247.23ms | +3.72% | 1  | 119.99M | 119.99M   |

+--++--+--++---+--+--+++-+---+

Significant perf change detected

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3070/8/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 27:   // 'prefetch_mode' and 'ht_ctx' are unused.
I'll delete this in the next patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#8).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

TODO: once we finish the PHJ::ProcessProbeBatch() patch I
will do a final pass and make sure that this is consistent.

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 203 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/3070/5/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 50: std::min(batch->num_rows()
> doesn't the FOREACH_ROW_LIMIT take care of this? besides, it doesn't really
Sure, but we need to keep track of the start of the next batch somehow. This 
seemed to express the intent more directly than relying on FOREACH_ROW_LIMIT 
and using an additional counter.

The final group gets handled ok even if it's not a multiple by the 
FOREACH_ROW_LIMIT macro.

I realise now that taking the min() with num_rows is redundant so I'll remove 
it..


Line 50: batch_size
> cache_size
Done


Line 64: Batch
> Since we're not doing the whole batch, maybe rename this to something like 
Done


Line 68: batch_size
> cache_size
Done


Line 79: // Hoist lookups out of non-null branch to speed up non-null case.
> why does this speed it up?
I didn't dig down deep enough to fully understand - it probably gives the 
compiler/cpu more freedom to schedule instructions.


Line 82: HashTable* hash_tbl = 
hash_partitions_[partition_idx]->hash_tbl.get();
We added a fair bit of extra pointer chasing here:

PartitionedAggregationNode -> hash_partitions_ -> internal vector<> array -> 
Partition -> HashTable -> HashTable::buckets_

This seems to hurt on low-ndv aggs. So to mitigate the regression I added a 
cache for hash_tbls_ that short-circuits that chain of pointers.


Line 182:   const int batch_size = std::min(num_rows, 
expr_vals_cache->capacity());
> same comments
Done


http://gerrit.cloudera.org:8080/#/c/3070/5/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 490: free space available
> capacity of (i.e. the cache always starts empty).
Done


Line 558:   /// that 'prefetch_mode' will be substituted with constants 
during codegen time.
> maybe move after needs_serialize to keep in order
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#8).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

TODO: once we finish the PHJ::ProcessProbeBatch() patch I
will do a final pass and make sure that this is consistent.

Co-author: Michael Ho 

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 203 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

TODO: once we finish the PHJ::ProcessProbeBatch() patch I
will do a final pass and make sure that this is consistent.

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 203 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack.

TODO: once we settle on idioms in the PHJ::ProcessProbeBatch() patch I
will do another pass and make sure that this is consistent.

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 155 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack.

TODO: once we settle on idioms in the PHJ::ProcessProbeBatch() patch I
will do another pass and make sure that this is consistent.

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
5 files changed, 155 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 5:

I'm just trying out some ideas to try and fix the regression.

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 5:

(8 comments)

Please have Mostafa review the perf results (could do that here in the code 
review to keep the discussion in one place).

http://gerrit.cloudera.org:8080/#/c/3070/5/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 50: std::min(batch->num_rows()
doesn't the FOREACH_ROW_LIMIT take care of this? besides, it doesn't really 
make sense if cache capacity is not a multiple of batch num_rows (we need to 
limit the iteration on the last group anyway).


Line 50: batch_size
cache_size


Line 64: Batch
Since we're not doing the whole batch, maybe rename this to something like 
EvalAndHashPrefetchGroup()? I think that's what the join patch is callnig it.


Line 68: batch_size
cache_size


Line 79: // Hoist lookups out of non-null branch to speed up non-null case.
why does this speed it up?


Line 182:   const int batch_size = std::min(num_rows, 
expr_vals_cache->capacity());
same comments


http://gerrit.cloudera.org:8080/#/c/3070/5/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 490: free space available
capacity of (i.e. the cache always starts empty).


Line 558:   /// that 'prefetch_mode' will be substituted with constants 
during codegen time.
maybe move after needs_serialize to keep in order


-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 5:

I also observed that if I turn off prefetching, most of the regression on 
low-ndv aggs seems to go away.

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 5:

Rebased onto the latest phj patch.

I did some benchmarking on a slightly earlier version. Overall trend is that 
high-ndv aggs are much faster, and low-ndv aggs are slightly slower. On 
end-to-end tests this seems to give a small net win:

Run Description: "Base: 7ad3faa4e3fa5b55b84ae3b2888caa3e4bdf8238 vs 
Ref: ded7fb79caf0466a22ab64cad18998f73cb38f3d"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:  impalad version 2.6.0-cdh5-INTERNAL RELEASE ()
Baseline Impala Version: impalad version 2.6.0-cdh5-INTERNAL RELEASE ()


++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |

++---+-++++
| TARGETED-PERF(_20) | parquet / none / none | 12.59   | -16.86%| 
7.48   | -7.39% |

++---+-++++


++---+---++-++---++-+---+
| Workload   | Query | File 
Format   | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base 
StdDev(%) | Num Clients | Iters |

++---+---++-++---++-+---+
| TARGETED-PERF(_20) | primitive_groupby_decimal_lowndv.test | parquet 
/ none / none | 2.27   | 1.96|   +15.61%  |   1.36%   |   1.95%
| 1   | 20|
| TARGETED-PERF(_20) | primitive_groupby_bigint_lowndv   | parquet 
/ none / none | 2.27   | 1.99|   +13.81%  |   2.17%   |   2.87%
| 1   | 20|
| TARGETED-PERF(_20) | primitive_groupby_bigint_pk   | parquet 
/ none / none | 35.71  | 42.77   |   -16.51%  |   6.66%   |   0.96%
| 1   | 20|
| TARGETED-PERF(_20) | primitive_groupby_bigint_highndv  | parquet 
/ none / none | 10.17  | 12.27   |   -17.11%  |   1.70%   |   0.65%
| 1   | 20|
| TARGETED-PERF(_20) | primitive_groupby_decimal_highndv | parquet 
/ none / none | 12.55  | 16.74   | I -25.04%  |   2.48%   |   1.57%
| 1   | 20|

++---+---++-++---++-+---+

(I) Improvement: TARGETED-PERF(_20) primitive_groupby_decimal_highndv 
[parquet / none / none] (16.74s -> 12.55s [-25.04%])

+--++--+--++---+--+--+++-+---+
| Operator | % of Query | Avg  | Base Avg | Delta(Avg) | 
StdDev(%) | Max  | Base Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |

+--++--+--++---+--+--+++-+---+
| 03:AGGREGATE | 2.62%  | 330.04ms | 321.08ms | +2.79% |   
6.01%   | 379.54ms | 343.90ms | +10.36%| 1  | 0   | 174.13K   |
| 01:AGGREGATE | 94.94% | 11.96s   | 16.19s   | -26.14%|   
2.61%   | 12.66s   | 17.13s   | -26.10%| 1  | 1.78M   | 1.74M |
| 00:SCAN HDFS | 2.32%  | 291.90ms | 270.24ms | +8.01% |   
1.31%   | 299.88ms | 313.65ms | -4.39% | 1  | 119.99M | 119.99M   |

+--++--+--++---+--+--+++-+---+

Report Generated on 2016-05-16
Run Description: "Base: 7ad3faa4e3fa5b55b84ae3b2888caa3e4bdf8238 vs 
Ref: ded7fb79caf0466a22ab64cad18998f73cb38f3d"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:  impalad version 2.6.0-cdh5-INTERNAL RELEASE ()
Baseline Impala Version: impalad version 2.6.0-cdh5-INTERNAL RELEASE ()


+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |

+---+---+-++++
| TPCH(_20) | parquet / none / none | 9.45| -1.54% | 6.38   
| -1.31% |

+--

[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack.

TODO: once we settle on idioms in the PHJ::ProcessProbeBatch() patch I
will do another pass and make sure that this is consistent.

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
6 files changed, 157 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack.

TODO: once we settle on idioms in the PHJ::ProcessProbeBatch() patch I
will do another pass and make sure that this is consistent.

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
6 files changed, 157 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..


Patch Set 4:

Some of the idioms I don't think match the direction the hash join patch was 
going, but I'll fix those when that patch is updated and I rebase on it.

-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-3286: prefetching for PartitionedAggregationNode

2016-05-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
..

IMPALA-3286: prefetching for PartitionedAggregationNode

This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack.

TODO: once we settle on idioms in the PHJ::ProcessProbeBatch() patch I
will do another pass and make sure that this is consistent.

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.h
6 files changed, 162 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/70/3070/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho