[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-29 Thread Alex Behm (Code Review)
Hello Dan Hecht, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6527

to look at the new patch set (#9).

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..

IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.

Summary of changes:
- Slightly change code flow for initial scan range that
  only parses the file header. The new code sets
  'only_parsing_header_' in Open() and then honors
  that flag in GetNextInternal(). Before, all the logic
  was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
  RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scan-node.h
M be/src/util/blocking-queue.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
26 files changed, 709 insertions(+), 824 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

PS8, Line 65: ProcessSplit() will issue the files' scan ranges
:   // and those ranges will need scanner threads, so no files are 
marked completed yet.
> hmm, is that stale now? i guess technically not since this now happens in G
Good catch. I think this is misleading. Changed to GetNextInternal()


http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS8, Line 133: ProcessSplit
> what's the deal with making this non-pure? oh, I guess (most) scanners now 
Happy to address this, but let's discuss approaches first. 

Options:
* add a new virtual function that is a no-op for all scanners except parquet 
where we do a runtime filter check
* move the runtime filter stats and related functions like 
HdfsParquetScanner::CheckFiltersEffectiveness() into HdfsScanner and just do 
the runtime filter check for all scanners even though they are useless for 
non-parquet
* other ideas?


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 109:  *
> Taras, I think Dan is looking for a comment along the lines of what we have
Yup, that's the comment I was looking for. I'm fine with its current location 
without duplicating it, but maybe cross-reference it from where 
optimize_parquet_count_star_ is defined in the header, to make it easier to 
understand why the backend code does what it does:

// See HdfsScanNode.applyParquetCountStarOptimization()


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 8: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

PS8, Line 65: ProcessSplit() will issue the files' scan ranges
:   // and those ranges will need scanner threads, so no files are 
marked completed yet.
hmm, is that stale now? i guess technically not since this now happens in 
GetNextInternal() which is called by ProcessSplit()?


http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS8, Line 133: ProcessSplit
what's the deal with making this non-pure? oh, I guess (most) scanners now 
share the same implementation? but then why keep it virtual? I guess parquet is 
still different? but does parquet actually have to be different? it seems like 
it could conform to the same pattern with a bit more refactoring.

Anyway, I guess it's okay as-is for now if it's not straightforward to make 
parquet follow the same pattern.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] add EPOCH to list of units supported.

2017-06-29 Thread Greg Rahn (Code Review)
Greg Rahn has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7342

Change subject: [DOCS] add EPOCH to list of units supported.
..

[DOCS] add EPOCH to list of units supported.

Per fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java#L1593

Change-Id: I1e62f295efaa9084a6da682fd04bfb067ec5e4e8
---
M docs/topics/impala_datetime_functions.xml
1 file changed, 4 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/7342/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7342
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1e62f295efaa9084a6da682fd04bfb067ec5e4e8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Greg Rahn 


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

Line 886: prefix + 
"8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_",
> You'd have to replace the uses of the agg slot with the zeroifnull() expres
Taras, I think the rewrite solution becomes more viable if we follow the 
approach where the AggInfo is passed directly into the HdfsScanNode. The scan 
can create an smap with two entries:

count(*) -> sum(num_rows_slot)
slotref -> zeroifnull(slotref)

where the slotref of the second entry is the agg slot from the first-level 
aggregation corresponding to count(*).

Once we return fro init() from the scan node, we apply the agg optimization 
smap to all the AggInfos (local and merge).


http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 109:  *
> This comment doesn't explain the overall optimization. What I'm looking for
Taras, I think Dan is looking for a comment along the lines of what we have in 
applyParquetCountStartOptimization(). Maybe we can add a condensed version of 
that somewhere.

Dan, where do you expect this comment? Here? in 
SingleNodePlanner.createSelectPlan()? Somewhere else?


PS5, Line 140:  This
 :   // scan does additional analysis in init() to determine 
whether it is correct to apply
 :   // the optimization.
> Okay. If it doesn't work out, I just think the comments needs to be clarifi
I think this approach will work out. We can use Analyzer.tableRefMap_ to 
determine how many table refs are in a query block.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5588: Reduce the frequency of fault injection
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/813/

-- 
To view, visit http://gerrit.cloudera.org:8080/7310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection

2017-06-29 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5588: Reduce the frequency of fault injection
..


Patch Set 2: Code-Review+2

Rebase. Carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/7310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/base-sequence-scanner.h
File be/src/exec/base-sequence-scanner.h:

PS7, Line 40:  
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

PS7, Line 134: final_batch
> this looked leaked... the function comment claims it's added to the queue.
Thanks for pointing out these issues. Did some cleanup around the Close() 
functions here as discussed.


http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 183:   // Make sure that we still own the RowBatch if 
BlockingPutWithTimeout() timed out.
> okay, read through the previous comments.
Rvalue references are tricky like that. Agree the interface is questionable due 
to subtlety.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-29 Thread Alex Behm (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6527

to look at the new patch set (#8).

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..

IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.

Summary of changes:
- Slightly change code flow for initial scan range that
  only parses the file header. The new code sets
  'only_parsing_header_' in Open() and then honors
  that flag in GetNextInternal(). Before, all the logic
  was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
  RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scan-node.h
M be/src/util/blocking-queue.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
26 files changed, 708 insertions(+), 823 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4674: Part 1: remove old aggs and joins

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 1: remove old aggs and joins
..


Patch Set 4:

(1 comment)

Looks good when the time is right. 

Maybe print something to the warning log if either flag is set to true?

http://gerrit.cloudera.org:8080/#/c/7102/4/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

Line 158: *status = SendBuildInputToSink(state, build_sink);
one line?


-- 
To view, visit http://gerrit.cloudera.org:8080/7102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5560: always store CHAR(N) inline in tuple
..


Patch Set 2:

(2 comments)

nice

http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS2, Line 72: 128 bytes
update


http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS2, Line 571: StringValue sv
what's the point of this StringValue now? This seems confusing and round about. 
(It's also weird that dst type is StringValue*, but that's a pre-existing 
weirdness). Seems more straightforward to just do:

char* dst_char = reinterpret_cast(dst);
and use dst_char wherever sv.ptr is used.


-- 
To view, visit http://gerrit.cloudera.org:8080/7303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5588: Reduce the frequency of fault injection
..


Patch Set 1: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/812/

-- 
To view, visit http://gerrit.cloudera.org:8080/7310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4687: Get Impala working against HBase 2.0
..


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7277/3/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

PS3, Line 516:   // Newer versions of HBase use a heartbeat, so they do not 
timeout.
 :   // Rather than throwing the ScannerTimeoutException, HBase 
will reset
 :   // the scanner and retry. For HBase 2.0, the 
ScannerTimeoutException is
 :   // removed. In these cases, HandleResultScannerTimeout 
will return the exception
 :   // error message in the status.
this comment seems gratuitous to me (already explained in better locations -- 
here, the caller doesn't really care about this detail and whether timeout can 
be true or not for the given implementation). i.e. you've already hidden the 
difference inside HandleResultScannerTimeout(), which has common semantics for 
1.0 and 2.0.


http://gerrit.cloudera.org:8080/#/c/7277/3/be/src/exec/hbase-table-scanner.h
File be/src/exec/hbase-table-scanner.h:

PS3, Line 59: supports HBase < 0.95.2
I guess the implication is that 0.95.2 uses the same API as 1.0?


-- 
To view, visit http://gerrit.cloudera.org:8080/7277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 183:   // Make sure that we still own the RowBatch if 
BlockingPutWithTimeout() timed out.
> how does that work? the move() on line 180 doesn't move in the case of time
okay, read through the previous comments.

I think we should reconsider the rvalue-ref interface to the queue, but not in 
this patch. this is awfully subtle.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 11: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 122:   }
> Wait() deals with the latest status of WaitInternal, but as far as I could 
I was thinking UpdateQueryStatus() returns the query_status_ but that's not the 
case. Okay, let's leave it alone for now. We really need to clean up the CRS / 
coordinator and related class splits as they currently don't make much sense, 
but that's obviously out of scope for this change.


http://gerrit.cloudera.org:8080/#/c/7155/11/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 255:   !request_state->query_status().ok());
any reason not to make this stronger and have the implication go in both 
directions? 

DCHECK_EQ(query_state() == EXCEPTION, !query_status().ok())

isn't that what we want to guarantee (and what we do guarantee now)?


Line 293: !request_state->query_status().ok());
same


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5588: Reduce the frequency of fault injection
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/812/

-- 
To view, visit http://gerrit.cloudera.org:8080/7310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#10).

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..

IMPALA-4862: make resource profile consistent with backend behaviour

This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.

The basic assumptions for the new resource estimates are:
* resources are acquired during or after the first call to Open()
  and released in Close().
* Blocking nodes call Open() on their child before acquiring
  their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
  from Open().
* The peak resource consumption of the query is the sum of the
  independent fragments (except for the parallel join build plans
  where we can assume there will be synchronisation). This is
  conservative but we don't synchronise fragment Open() and Close()
  across exchanges so can't make stronger assumptions in general.

Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.

Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
  pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
  more complex plan trees inside a single fragment, which is useful
  for testing the peak resource requirement logic.

Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tabl

[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..


Patch Set 9:

Cleaned up the patch (naming etc) after discussion with Dan and Alex

-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..

IMPALA-4862: make resource profile consistent with backend behaviour

This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.

The basic assumptions for the new resource estimates are:
* resources are acquired in Open() and released in Close().
* Blocking nodes call Open() on their child before acquiring
  their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
  from Open()
* The peak resource consumption of the query is the sum of the
  independent fragments (except for the parallel join build plans
  where we can assume there will be synchronisation). This is
  conservative but we don't synchronise fragment Open() and Close()
  across exchanges so can't make stronger assumptions in general.

Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.

Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
  pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
  more complex plan trees inside a single fragment, which is useful
  for testing the peak resource requirement logic.

Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/f

[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..


Patch Set 6: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/810/

-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Bump Kudu version to bbed78c
..


Bump Kudu version to bbed78c

Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
Reviewed-on: http://gerrit.cloudera.org:8080/7334
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to bbed78c
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

Line 886: prefix + 
"8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_",
> I gave this a try, but ran into a problem. I tried substituting a count(*) 
You'd have to replace the uses of the agg slot with the zeroifnull() 
expression. If that's too complicated, then I'm fine with leaving it. I still 
think, conceptually, it's cleaner to be able to do rewrites using existing 
expressions rather than build new ones, but if it doesn't work out given our 
current optimizer, then so be it.


http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 109:  *
> This is the only one (I think it's pretty high level).
This comment doesn't explain the overall optimization. What I'm looking for is 
something that explains the optimization: what the rewrite is, and what the 
"special" scan is.


PS5, Line 140:  This
 :   // scan does additional analysis in init() to determine 
whether it is correct to apply
 :   // the optimization.
> An alternative approach that Alex and I are thinking about is passing in th
Okay. If it doesn't work out, I just think the comments needs to be clarified. 
Future readers won't be looking at this code review to understand the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS5, Line 140:  This
 :   // scan does additional analysis in init() to determine 
whether it is correct to apply
 :   // the optimization.
> The optimization is applied if 2 things are true: the query block meets cer
An alternative approach that Alex and I are thinking about is passing in the 
agg info to the scan node so that all the checks would be done in one place.

One potential problem with this is that we need to access the SelectStmt too, 
in order to check that selectStmt.getTableRefs().size() == 1

I'm trying to figure out a way around this without having to pass the entire 
select statement to the scan node.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-5036: Parquet count star optimization
..

IMPALA-5036: Parquet count star optimization

Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count() is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.

Testing:
- Added functional and planner tests

Change-Id: I536b85c014821296aed68a0c68faadae96005e62
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_aggregation.py
M tests/query_test/test_parquet_stats.py
26 files changed, 881 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

Line 886: prefix + 
"8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_",
> rather than defining a new builtin, why don't we wrap the sum() with a coal
I gave this a try, but ran into a problem. I tried substituting a count(*) with 
a zeroifnull(sum(parquet_num_rows)). The aggregation node can only evaluate agg 
functions, and zeroifnull is not.

A potential way out is to modify the merge function so that the input to it is 
wrapped in a zeroifnull(), but I think that's complicated. We would have make 
modifications in 2 places if we go that route: replace count() with sum() and 
wrap the input to the merge function. Simply adding a builtin seems cleaner to 
me.


http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS5, Line 105:  The caller passes in a flag to the constructor that indicates 
if the count(*)
 :  * optimization can be applied to the query block of this scan. 
This scan node decides
 :  * whether to apply the optimization or not
> This doesn't make a lot of sense to me. On one hand, it sayst he caller dec
Reworded slightly. (The main idea is that they are both involved in the 
decision).


Line 109:  *
> do we actually explain in some comment how this optimization works at a hig
This is the only one (I think it's pretty high level).


PS5, Line 140:  This
 :   // scan does additional analysis in init() to determine 
whether it is correct to apply
 :   // the optimization.
> this seems to contradict the comment above that says the caller passes a fl
The optimization is applied if 2 things are true: the query block meets certain 
requirements and the scan node meets certain requirements. This flag indicates 
if the query block meets the requirements.


http://gerrit.cloudera.org:8080/#/c/6812/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test:

Line 86: # Verify that 0 is returned when we are selecting from an empty table 
and the optimization
> I think it's sufficient to say:
Done


http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

Line 280: vector.get_value('exec_option')['batch_size'] = 1
> The core tests we run and the dimensions we use seem somewhat broken/scary 
Created IMPALA-5603.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6812

to look at the new patch set (#6).

Change subject: IMPALA-5036: Parquet count star optimization
..

IMPALA-5036: Parquet count star optimization

Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count() is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.

Testing:
- Added functional and planner tests

Change-Id: I536b85c014821296aed68a0c68faadae96005e62
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_aggregation.py
M tests/query_test/test_parquet_stats.py
26 files changed, 881 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-29 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6812

to look at the new patch set (#6).

Change subject: IMPALA-5036: Parquet count star optimization
..

IMPALA-5036: Parquet count star optimization

Instead of materializing empty rows when computing count star, we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count() is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.

Testing:
- Added functional and planner tests

Change-Id: I536b85c014821296aed68a0c68faadae96005e62
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
M tests/query_test/test_aggregation.py
M tests/query_test/test_parquet_stats.py
26 files changed, 881 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 122:   }
> It'd be clearer if we just made CRS::Wait() return the status (it can set t
Wait() deals with the latest status of WaitInternal, but as far as I could tell 
that may not be the same as request_state->query_status(). If WaitInternal 
returns a non-OK status but query_status_ is already non-ok, the latter is not 
overwritten. Taking the lock again here seems easier to reason about to me.

Additionally, making Wait() return a status breaks the symmetry with 
WaitAsync().

I don't feel strongly about it though, let me know if you'd like me to change 
it.


Line 128:   TUniqueIdToQueryHandle(request_state->query_id(), &query_handle);
> I think Wait() takes care of this already, no?
Yes, removed.


Line 252:   // guaranteed to see the error query_status.
> I think this is subtle enough to warrant a comment like:
Done


Line 288:   {
> I think we should comment this too.
Done


Line 290: // guaranteed to see the error query_status.
> I think we should have the query_state == EXCEPTION dcheck here too.
Done


http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1011:   ArchiveQuery(*request_state);
> why do we need to take the map lock here?
It may be a left over from a previous try to add more locks until the error 
goes away. I removed it and will exercise the tests in a loop again.


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#11).

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..

IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

There was a race between ClientRequestState::UpdateQueryStatus() and the
beeswax get_state()/get_log() RPCs leading to the rare situation that a
query would abort with an error, but the error message would be empty.

The fix is to take the ClientRequestState lock in the beeswax RPCs
before obtaining the status.

To test this I ran test_corrupt_files in a loop for a day. Without this
fix, it would usually fail within a few hours.

I changed the test to allow running it in parallel like so:

@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):

Then I ran it in a loop like so:

i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \
tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \
--exploration_strategy=exhaustive -n8; done

Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
---
M be/src/service/impala-beeswax-server.cc
1 file changed, 27 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/11
-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#10).

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..

IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

There was a race between ClientRequestState::UpdateQueryStatus() and the
beeswax get_state()/get_log() RPCs leading to the rare situation that a
query would abort with an error, but the error message would be empty.

The fix is to take the ClientRequestState lock in the beeswax RPCs
before obtaining the status.

To test this I ran test_corrupt_files in a loop for a day. Without this
fix, it would usually fail within a few hours.

I changed the test to allow running it in parallel like so:

@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):

Then I ran it in a loop like so:

i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \
tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \
--exploration_strategy=exhaustive -n8; done

Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
2 files changed, 28 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-29 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

Line 253:   return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, 
rhsCard);
> 1. Changed the code to only pass those FK/PK conjuncts because that makes t
My concern was that if we have compelling evidence that a FK/PK relationship is 
not present, we shouldn't pass those eqJoin conjuncts to the FK/PK cardinality 
estimation - e.g.,

SELECT orders.customer_id, shipments.ship_id from orders, shipments, WHERE 
orders.order_date = shipments.ship_date

where the highly selective non-PK ship_date should not reduce the cardinality 
of the join - ideally, we multiply the cardinality by (|shipments| / 
NDV(shipments.ship_date))


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-29 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#7).

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..

IMPALA-5240: Allow config of number of disk I/O threads per disk type

Currently Impala defaults to 8 threads per flash disk and 1 thread per
rotational disk. This can be overridden with --num_threads_per_disk,
but that sets the thread count independent of disk type.

This would allow control of the number of disk I/O threads spawned
independently for solid state disks using
"--num_io_threads_per_solid_state_disk" and for rotational disks using
"--num_io_threads_per_rotational_disk" as startup parameters.

Testing:
Added backend tests that verify if "num_threads_per_disk",
"num_io_threads_per_solid_state_disk" and
"num_io_threads_per_rotational_disk" are used to spawn threads
appropriately

TODO (Request comments on this additional change):
Additionally made some changes to fix a bug, where impala would crash
if num_disks are set more than the number of logical disks on system and
num_threads_per_disk is not set. This bug also lets the user create more
disk queues than the num of logical disks which would eventually never
be used. As a part of this effort, existing test cases in
disk-io-mgr-test were identified that would fail after this bug fix.
This issue will be tracked in IMPALA-5574, but until then,
DiskInfo::is_rotational() had to be changed to make these tests run as
before. Moreover, after this fix, if num_disks is set to more than
the number of logical disks then it will default to max available disks
and log a warning.

Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
---
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/disk-info.h
M be/src/util/thread.cc
M be/src/util/thread.h
M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
8 files changed, 108 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-29 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7232/6/be/src/runtime/disk-io-mgr-test.cc
File be/src/runtime/disk-io-mgr-test.cc:

PS6, Line 1083: &client_buffer[0]
no idea where this came from, probably when i checked out this file from master 
but strangely even master doesn't have this.
reverting back in next patch ASAP


PS6, Line 1121: &client_buffer[0]
same as above


-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5588: Reduce the frequency of fault injection
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7282/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

PS2, Line 443: self._num_queries_timedout.value - 
self._num_queries_cancelled.value
just curious, what's the relationship between "timed out" and "cancelled" 
queries?  I always thought timed out means hung and cancelled were the 
deliberately cancelled queries (to test cancellation), but apparently, and so 
don't see why the bookkeeping is related, but I must be missing something?  
(But I also see this is consistent with line 713 and so does seem "correct").


-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


IMPALA-5281: stress test: introduce stricter pass guidelines

1. Report incorrect results count in the console log table. Previously,
the stress test knew about incorrect results but only reported them to
the console log inline. In was on the onus of a caller to find this. Now
we have a summed count.

2. Fail the process if there are errors, incorrect results, or timeouts.
Previously, the stress test just counted these, but would not fail its
process. This leads to a much stricter pass criteria for the stress
test. This will allow CI to fail and alert a maintainer that something
went wrong.

Testing:

I modified the result hashes for queries in a local runtime_info.json
and observed the reporting of incorrect results, incremented incorrect
results counts, and ultimately process failure.

Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Reviewed-on: http://gerrit.cloudera.org:8080/7282
Reviewed-by: Michael Brown 
Tested-by: Impala Public Jenkins
---
M tests/stress/concurrent_select.py
1 file changed, 24 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Brown: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-29 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#6).

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..

IMPALA-5240: Allow config of number of disk I/O threads per disk type

Currently Impala defaults to 8 threads per flash disk and 1 thread per
rotational disk. This can be overridden with --num_threads_per_disk,
but that sets the thread count independent of disk type.

This would allow control of the number of disk I/O threads spawned
independently for solid state disks using
"--num_io_threads_per_solid_state_disk" and for rotational disks using
"--num_io_threads_per_rotational_disk" as startup parameters.

Testing:
Added backend tests that verify if "num_threads_per_disk",
"num_io_threads_per_solid_state_disk" and
"num_io_threads_per_rotational_disk" are used to spawn threads
appropriately

TODO (Request comments on this additional change):
Additionally made some changes to fix a bug, where impala would crash
if num_disks are set more than the number of logical disks on system and
num_threads_per_disk is not set. This bug also lets the user create more
disk queues than the num of logical disks which would eventually never
be used. As a part of this effort, existing test cases in
disk-io-mgr-test were identified that would fail after this bug fix.
This issue will be tracked in IMPALA-5574, but until then,
DiskInfo::is_rotational() had to be changed to make these tests run as
before. Moreover, after this fix, if num_disks is set to more than
the number of logical disks then it will default to max available disks
and log a warning.

Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
---
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/disk-info.h
M be/src/util/thread.cc
M be/src/util/thread.h
M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
8 files changed, 110 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-29 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7232/5/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

PS5, Line 262: CheckAndGetFlagInOrder
> how about calling this
Done


PS5, Line 376: is_rotational
> as we discussed, let's have this return false if i is outside the bounds of
Made changes as suggested and reverted corresponding changes in the backend 
tests as well


-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

2017-06-29 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6023/5/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 452: // We can store the results in a decimal8Value. But this 
change hard codes to
> It would be good to summarize the calculations and comment on the chosen ty
Done. 
Do you think it would useful to move the comments at relevant places inside the 
functions instead of adding it before the definition?


-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

2017-06-29 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#6).

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
..

IMPALA-4848: Add WIDTH_BUCKET() function

Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
---
M be/src/exprs/expr-test.cc
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/math-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/Expr.java
5 files changed, 169 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


IMPALA-5483: Automatically disable codegen for small queries

This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.

This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.

Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen and Ec/Ei are constant
factor giving the cost of codegen'd and interpreted execution and
d, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).

I ran some experiments with TPC-H Q1, varying the input table size, to
determine what the cut-over point where codegen was beneficial was.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s.  To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.

Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't handle the case where for a scan node with conjuncts, a limit
and missing stats correctly (it considered the estimate still valid.)

Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.

Added an e2e test to make sure that the optimisation is applied in
the backend.

Added planner tests for various cases where codegen should and shouldn't
be disabled.

Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.

Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Reviewed-on: http://gerrit.cloudera.org:8080/7153
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
A tests/query_test/test_codegen.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_udfs.py
21 files changed, 461 insertions(+), 45 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-29 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

PS2, Line 27: import org.apache.impala.analysis.SlotDescriptor;
Is this used?


PS2, Line 40: import org.apache.kudu.shaded.com.google.common.base.Joiner;
replace


http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS2, Line 79: -
nit: remove


PS2, Line 80: ordered based on the tuple ids
I am not sure I get what that ordering is. Do you mean grouped?


PS2, Line 202: based on the single most
 :* selective join predicate. We do not attempt to estimate the 
joint selectivity of
 :* multiple join predicates to avoid underestimation.
Much better. Thanks


PS2, Line 280: fkPkCandidate.get(0)
Can you plz explain this? I could be wrong about this but it seems that the 
decision of whether this is a pk/fk depends on which equi-join slots we process 
first. Maybe I am missing something. In either case, plz add a comment.


PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0);
Are we certain this is a valid precondition check? Why isn't a 0 rhsCard 
possible? Unless this is guaranteed to be the case...


PS2, Line 337: we adjustments
nit: grammar


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7223/8/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

Line 69:   public ResourceProfile max(ResourceProfile other) {
I also changed these to be non-static to reduce the verbosity of some of the 
code.


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to bbed78c
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/811/

-- 
To view, visit http://gerrit.cloudera.org:8080/7334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java:

Line 82:   private void recomputeResourceProfiles() {
> This seems weird to me and I believe we can potentially get rid of this if 
Done. Reworked this so that the traversal logic is all in computeResourceReqs()


Line 196: 
This didn't connect up the fragments correctly.


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/PlanVisitor.java
File fe/src/main/java/org/apache/impala/planner/PlanVisitor.java:

Line 6: public interface PlanVisitor {
> We already have a Visitor and a corresponding accept() function. Can we con
Removed this in favour of doing an explicit traversal in computeResourceReq()


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 352:   private static class ResourceAggregator implements PlanVisitor {
> Do we really need this separate visitor that only sums over things we have 
Was able to just do an explicit traversal in computeResourceReq().


Line 392: planRoots.get(0).walkPlan(aggregator);
> The new approach makes sense to me - I like it.
PlanTreeResources vs. ResourceProfile:
* Updated the ResourceProfile comment
* Renamed PlanTreeResources to PhaseResourceProfiles, since it's really just a 
wrapper around ResourceProfiles for two different phases of execution.

Visitor and recursion:
* I was able to rework this so that the whole traversal is done iteratively in 
this function using collect()/getNodesPostOrder(). There's no need for a 
visitor then.

Entry point: this is now not called by finalize(). Renamed finalize() to 
finalizeExchanges() since that is more precisely what it does.


http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

Line 23:  * The resources that will be consumed by a set of plan nodes.
> Does this really represent the resources of a single PlanNode instance or P
Updated the comment. This abstraction is agnostic about what collection of 
things it is a profile for - it really is just a container for a set of peak 
values.


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#8).

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..

IMPALA-4862: make resource profile consistent with backend behaviour

This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.

The basic assumptions for the new resource estimates are:
* resources are acquired in Open() and released in Close().
* Blocking nodes call Open() on their child before acquiring
  their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
  from Open()
* The peak resource consumption of the query is the sum of the
  independent fragments (except for the parallel join build plans
  where we can assume there will be synchronisation). This is
  conservative but we don't synchronise fragment Open() and Close()
  across exchanges so can't make stronger assumptions in general.

Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.

Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
  pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
  more complex plan trees inside a single fragment, which is useful
  for testing the peak resource requirement logic.

Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
40 files changed, 2,819 insertions(+), 410 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7223/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/810/

-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-29 Thread Bikramjeet Vig (Code Review)
Hello Impala Public Jenkins, Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7203

to look at the new patch set (#6).

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..

IMPALA-3504: UDF for current timestamp in UTC

This change adds a UDF "utc_timestamp" which returns the current
date and time in UTC. Example query:

select utc_timestamp();

+---+
| utc_timestamp()   |
+---+
| 2017-06-15 17:36:39.290773000 |
+---+

Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M common/function-registry/impala_functions.py
M common/thrift/ImpalaInternalService.thrift
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
10 files changed, 73 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7203/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bump Kudu version to bbed78c
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

2017-06-29 Thread John Sherman (Code Review)
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

Line 210
The prior comment mentioned potential thread safety issues, but I didn't see 
any thread safety issues in the current usage. If someone has experience in 
this area, it's probably worth it to make sure I didn't miss something.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

2017-06-29 Thread John Sherman (Code Review)
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
..


Patch Set 3:

Ran the latest patch through be/fe/e2e testing.
be/fe ran clean.
end to end mostly ran clean. I had one failure I couldn't explain (could be 
related to me using Java 8/Ubuntu 16 or the limited amount of memory in my 
development environment):
custom_cluster/test_hdfs_fd_caching.py::TestHdfsFdCaching::test_caching_enabled
custom_cluster/test_hdfs_fd_caching.py:125: in test_caching_enabled
self.run_fd_caching_test(vector, caching_expected, cache_capacity)
custom_cluster/test_hdfs_fd_caching.py:85: in run_fd_caching_test
assert num_handles_after == (num_handles_start + 1)
E   assert 0 == (0 + 1)

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

2017-06-29 Thread John Sherman (Code Review)
John Sherman has uploaded a new patch set (#3).

Change subject: IMPALA-5394: Handle blocked HS2 connections
..

IMPALA-5394: Handle blocked HS2 connections

- TThreadPoolServer calls getTransport() on a client from the Server
  thread (the thread that does the accepts).
  - TSaslServerTransport->getTransport() calls TSaslTransport->open()
  - TSaslServerTransport->open() tries to negotiate SASL which calls
read/write
- If read/write blocks, the TThreadPoolServer cannot accept
  connections
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 30
  milliseconds (5 minutes)
- Changed the Thrift server type for hs2 connections from ThreadPool
  to Threaded to take advantage of the AcceptQueueServer
  implementation.
- Increased the AcceptQueueServer CONNECTION_SETUP_POOL_SIZE from
  1 to 2, so new connections can continue to be accepted if for
  some reason a connection needs to use the timeout period.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
4 files changed, 49 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7334

Change subject: Bump Kudu version to bbed78c
..

Bump Kudu version to bbed78c

Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/7334/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 122:   }
It'd be clearer if we just made CRS::Wait() return the status (it can set that 
status after all) rather than re-taking the lock.


Line 128:   
request_state->UpdateNonErrorQueryState(beeswax::QueryState::FINISHED);
I think Wait() takes care of this already, no?


Line 252:   lock_guard l(*request_state->lock());
I think this is subtle enough to warrant a comment like:
// Take the lock to ensure that if the client sees a query_state == EXCEPTOIN, 
it is guaranteed to see the error query_status.


Line 288: lock_guard l(*request_state->lock());
I think we should comment this too.


Line 290: if (!request_state->query_status().ok()) {
I think we should have the query_state == EXCEPTION dcheck here too.


http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1011:   lock_guard l(client_request_state_map_lock_);
why do we need to take the map lock here?


-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/809/

-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..


Patch Set 9:

Working on this had been blocked by IMPALA-5567 which has now been fixed. I had 
to rebase the change to make it build. I also added one more lock to 
ImpalaServer::executeAndWait() since the test would still sporadically fail. 
Now I'm back to running the test in a loop, trying to see if it still repros 
with PS9.

-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#9).

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..

IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

There was a race between ClientRequestState::UpdateQueryStatus() and the
beeswax get_state()/get_log() RPCs leading to the rare situation that a
query would abort with an error, but the error message would be empty.

The fix is to take the ClientRequestState lock in the beeswax RPCs
before obtaining the status.

To test this I ran test_corrupt_files in a loop for a day. Without this
fix, it would usually fail within a few hours.

I changed the test to allow running it in parallel like so:

@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):

Then I ran it in a loop like so:

i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \
tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \
--exploration_strategy=exhaustive -n8; done

Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
2 files changed, 22 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 10:

Makes sense.

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 10:

The "disabled due to optimization hints" message that you added a while back 
will show up in the profile too.

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


Re: [Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Alexander Behm
The explain plan will include "Codegen disabled by planner"

On Thu, Jun 29, 2017 at 10:42 AM, Michael Ho (Code Review) <
ger...@cloudera.org> wrote:

> Michael Ho has posted comments on this change.
>
> Change subject: IMPALA-5483: Automatically disable codegen for small
> queries
> ..
>
>
> Patch Set 10:
>
> Sorry, haven't caught up with the review yet. Is there going to be a
> message in the profile indicating the reason for codegen to be disabled ?
> This seems rather important for supportability.
>
> --
> To view, visit http://gerrit.cloudera.org:8080/7153
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
> Gerrit-PatchSet: 10
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Tim Armstrong 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: Juan Yu 
> Gerrit-Reviewer: Michael Ho
> Gerrit-Reviewer: Michael Ho 
> Gerrit-Reviewer: Mostafa Mokhtar 
> Gerrit-Reviewer: Tim Armstrong 
> Gerrit-HasComments: No
>


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 10:

Sorry, haven't caught up with the review yet. Is there going to be a message in 
the profile indicating the reason for codegen to be disabled ? This seems 
rather important for supportability.

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

2017-06-29 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#7).

Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and 
beeswax RPCs
..

IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs

There was a race between ClientRequestState::UpdateQueryStatus() and the
beeswax get_state()/get_log() RPCs leading to the rare situation that a
query would abort with an error, but the error message would be empty.

The fix is to take the ClientRequestState lock in the beeswax RPCs
before obtaining the status.

To test this I ran test_corrupt_files in a loop for a day. Without this
fix, it would usually fail within a few hours.

I changed the test to allow running it in parallel like so:

@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):

Then I ran it in a loop like so:

i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \
tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \
--exploration_strategy=exhaustive -n8; done

Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
2 files changed, 24 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines

2017-06-29 Thread Matthew Mulder (Code Review)
Matthew Mulder has posted comments on this change.

Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines
..


Patch Set 1: Code-Review+1

I reviewed the code and performed a test run, but I'm relying on Michael's 
testing of changing the result hashes to produce incorrect results.

-- 
To view, visit http://gerrit.cloudera.org:8080/7282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/808/

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 10: Code-Review+2

Update TestStatsExtrapolation

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-29 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7153

to look at the new patch set (#10).

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..

IMPALA-5483: Automatically disable codegen for small queries

This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.

This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.

Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen and Ec/Ei are constant
factor giving the cost of codegen'd and interpreted execution and
d, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).

I ran some experiments with TPC-H Q1, varying the input table size, to
determine what the cut-over point where codegen was beneficial was.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s.  To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.

Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't handle the case where for a scan node with conjuncts, a limit
and missing stats correctly (it considered the estimate still valid.)

Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.

Added an e2e test to make sure that the optimisation is applied in
the backend.

Added planner tests for various cases where codegen should and shouldn't
be disabled.

Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.

Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
A tests/query_test/test_codegen.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_udfs.py
21 files changed, 461 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[native-toolchain-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bump Kudu version to bbed78c
..


Patch Set 1:

http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/416/

-- 
To view, visit http://gerrit.cloudera.org:8080/7314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has submitted this change and it was merged.

Change subject: Bump Kudu version to bbed78c
..


Bump Kudu version to bbed78c

Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Thomas Tauber-Marshall: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[native-toolchain-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bump Kudu version to bbed78c
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection

2017-06-29 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5588: Reduce the frequency of fault injection
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[native-toolchain-CR] Bump Kudu version to bbed78c

2017-06-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bump Kudu version to bbed78c
..


Patch Set 1:

Is this ready to go in? Impala side commit?

-- 
To view, visit http://gerrit.cloudera.org:8080/7314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..


IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

This change introduces a new rule to merge disjunct equality
predicates into an IN predicate. As with every rule being applied
bottom up, the rule merges the leaf OR predicates into an in predicate
and subsequently merges the OR predicate to the existing IN predicate
It will also merge two compatible IN predicates into a single IN
predicate.

Patch also addresses review comments to
normalize the binary predicates and testcases for the same.
binary predicates of the form constant  non constant are normalized
to non constant  constant

Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Reviewed-on: http://gerrit.cloudera.org:8080/7110
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 284 insertions(+), 49 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sandeep akinapelli 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: sandeep akinapelli 


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sandeep akinapelli 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: sandeep akinapelli 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-06-29 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-5582: Store sentry privileges in lower case
..


Patch Set 2:

Tested this manually. Will try to add some test cases.

-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-06-29 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-5582: Store sentry privileges in lower case
..

IMPALA-5582: Store sentry privileges in lower case

Privileges granted to a db whose name contains upper case
characters can disappear after a few seconds. A privilege
is inserted into the catalogObjectCache using a key that
uses the db name. The key gets converted to lower case
before inserting. The sentryProxy thread on the other hand
returns the db name in lower case. When the catalogObjectCache
gets updated and the old catalog object is removed from the
cache it ends up deleting the new object since they both use
the same key. This change stores the privileges in lower case
which does not trigger these chain on events on a sentryProxy
thread update.

Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
---
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7332/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 


[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case

2017-06-29 Thread anujphadke (Code Review)
anujphadke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7332

Change subject: IMPALA-5582: Store sentry privileges in lower case
..

IMPALA-5582: Store sentry privileges in lower case

Privileges granted to a db whose name contains upper case characters can 
disappear
after a few seconds. A privilege is inserted into the catalogObjectCache using 
a key
that uses the db name. The key gets converted to lower case before inserting.
The sentryProxy thread on the other hand returns the db name in lower case. 
When the
catalogObjectCache gets updated and the old catalog object is removed from the 
cache
it ends up deleting the new object since they both use the same key. This 
change stores
the privileges in lower case  which does not trigger these chain on events on a
sentryProxy thread update.

Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
---
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7332/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-29 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-5547: Rework FK/PK join detection.
..

IMPALA-5547: Rework FK/PK join detection.

Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs

The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows.

For each set of simple equi-join conjuncts between two tables, we
compute the joint NDV of the right-hand side columns by
multiplication, and if the joint NDV is significantly smaller than
the right-hand side row count, then we are fairly confident that the
right-hand side is not a PK. Otherwise, we assume the set of conjuncts
could represent a FK/PK relationship.

Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.

Performance testing:
1. Full TPC-DS run on 10TB:
   - Q10 improved by >100x
   - Q72 improved by >25x
   - Q17,Q26,Q29 improved by 2x
   - Q64 regressed by 10x
   - Total runtime: Improved by 2x
   - Geomean: Minor improvement
   The regression of Q64 is understood and we will try to address it
   in follow-on changes. The previous plan was better by accident and
   not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
   - No perf differences

Testing:
- The existing planner test cover the changes.
- Code/hdfs run passed.

Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
---
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
17 files changed, 1,485 insertions(+), 1,351 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/7257/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 1:

(20 comments)

I still need to add the targeted planner tests and adjust new tests after the 
rebase. In the meantime, you can continue looking at the code changes.

http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) {
 :   output.append("assumed fk/pk");
> I haven't read the entire patch yet, so I am curious what this case represe
In this case we have no equi-join conjuncts that we can reason about, so we are 
optimistic and assume fk/pk with a join selectivity of 1.

We cannot reason about an equi-join conjunct if the underlying table/columns 
have no stats or if the conjunct involves non-trivial expressions (anything 
that is not a SlotRef).


PS1, Line 178: for (int j = 0; j < slots.size(); ++j) {
 :   output.append(slots.get(j).toString());
 :   if (j + 1 != slots.size()) output.append(", ");
 : }
> Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots)
Better. Done.


http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS1, Line 37: import 
org.apache.kudu.client.shaded.com.google.common.collect.Maps;
> ?
Sigh. Fixed.


Line 91:   protected List> fkPkEqJoinConjuncts_;
> Class member is only used in one function in the base class.  Maybe documen
Added comment. Not sure it's worth protecting against accidental modifications 
in inheriting classes. I think it's fair to assume that subclasses should treat 
protected members carefully. Subclasses can already muck with pretty much 
anything here, and adding getters that wrap members with immutables seems 
cumbersome, and generates objects "unnecessarily".


Line 253:   return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, 
rhsCard);
> This computes estimates based on both PK conjuncts and non-PK conjuncts usi
1. Changed the code to only pass those FK/PK conjuncts because that makes the 
behavior and code clearer. I'm not sure I follow your described scenario. If 
the RHS is very selective, then it's correct to apply that to the join 
cardinality - if we believe that FK/PK is indeed the case. Are you concerned 
that our FK/PK assumption could be wrong and we might underestimate the join 
cardinality

2. Good idea. A flattened, ordered list makes things simpler in various places. 
Done.


PS1, Line 271: scanSlotsByTids
> nit: Each pair represents two joined tables, no? So, maybe rename this to '
Done


Line 275: for (List fkPkCandi: 
scanSlotsByTids.values()) {
> fkPkCandi - this could use a better name, it isn't clear to me what this is
I renamed it to fkPkCandidate, but I doubt that's what you had in mind. Can you 
help me come up with a better name? This thing is a list of equi-join conjuncts 
between the same two tables. These conjuncts could represent a single- or 
multi-column FK/PK join condition.


PS1, Line 281: numRows
> rhsTableCardinality?
Used rhsNumRows because we typically use 'numRows' to indicate values that come 
directly from stats, although technically table cardinality is also correct of 
course.


Line 290: 
> nit: extra line
Done


PS1, Line 294: conjuncts
> conjunct slots
Done


PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || 
joinOp_.isOuterJoin());
> This private function is only called in L253? I think you can remove this c
Done


PS1, Line 312: lhsNdv
> Can this ever be 0?
Probably. Better be defensive. Fixed here and elsewhere.


PS1, Line 313: rhsNumRows
> Same here, can this be 0?
Probably. Better be defensive. Fixed here and elsewhere.


PS1, Line 318: result = Math.min(result, joinCard);
> That part I think is very important and I am not sure it is explained anywh
This part applies to both methods (generic and FK/PK). I expanded the comment 
on getJoinCardinality() to explain why we take the min().

We should certainly try (in a subsequent patch) whether estimating the joint 
selectivity of multiple predicates with exponential backoff or similar works 
well.


PS1, Line 329: conjuncts
> conjunct slots
Done


PS1, Line 341: slots.lhs().getStats().getNumDistinctValues();
> nit: these are used in couple of places. Since you already have the EqJoinC
Done


PS1, Line 343: slots.lhs().getParent().getTable().getNumRows();
> same comment as above.
Done


PS1, Line 373: Preconditions
> I don't think these checks are useful. This private constructor is only cal
Done


PS1, Line 382: .
> nit: remove
Done


PS1, Line 399: tupleDesc
> nit: inline
Removed tupleDesc instead, otherwise the if condition below becomes multi-line