[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Hello Dan Hecht, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6527 to look at the new patch set (#9). Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. Implements HdfsScanner::GetNext() for the Avro, RC File, and Sequence File scanners. Changes ProcessSplit() to repeatedly call GetNext() to share the core scanning code between the legacy ProcessSplit() interface (ProcessSplit()) and the new GetNext() interface. Summary of changes: - Slightly change code flow for initial scan range that only parses the file header. The new code sets 'only_parsing_header_' in Open() and then honors that flag in GetNextInternal(). Before, all the logic was inside ProcessSpit(). - Replace 'finished_' with 'eos_'. - Add a RowBatch parameter to various functions. - Change Close() to free all resources when a nullptr RowBatch is passed. Testing: - Exhaustive tests passed on debug - Core tests passed on asan - TODO: Perf testing on cluster Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/base-sequence-scanner.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-rcfile-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/kudu-scan-node.cc M be/src/exec/scan-node.h M be/src/util/blocking-queue.h M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 26 files changed, 709 insertions(+), 824 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/9 -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: PS8, Line 65: ProcessSplit() will issue the files' scan ranges : // and those ranges will need scanner threads, so no files are marked completed yet. > hmm, is that stale now? i guess technically not since this now happens in G Good catch. I think this is misleading. Changed to GetNextInternal() http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS8, Line 133: ProcessSplit > what's the deal with making this non-pure? oh, I guess (most) scanners now Happy to address this, but let's discuss approaches first. Options: * add a new virtual function that is a no-op for all scanners except parquet where we do a runtime filter check * move the runtime filter stats and related functions like HdfsParquetScanner::CheckFiltersEffectiveness() into HdfsScanner and just do the runtime filter check for all scanners even though they are useless for non-parquet * other ideas? -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Dan Hecht has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 109: * > Taras, I think Dan is looking for a comment along the lines of what we have Yup, that's the comment I was looking for. I'm fine with its current location without duplicating it, but maybe cross-reference it from where optimize_parquet_count_star_ is defined in the header, to make it easier to understand why the backend code does what it does: // See HdfsScanNode.applyParquetCountStarOptimization() -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 8: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: PS8, Line 65: ProcessSplit() will issue the files' scan ranges : // and those ranges will need scanner threads, so no files are marked completed yet. hmm, is that stale now? i guess technically not since this now happens in GetNextInternal() which is called by ProcessSplit()? http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS8, Line 133: ProcessSplit what's the deal with making this non-pure? oh, I guess (most) scanners now share the same implementation? but then why keep it virtual? I guess parquet is still different? but does parquet actually have to be different? it seems like it could conform to the same pattern with a bit more refactoring. Anyway, I guess it's okay as-is for now if it's not straightforward to make parquet follow the same pattern. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] add EPOCH to list of units supported.
Greg Rahn has uploaded a new change for review. http://gerrit.cloudera.org:8080/7342 Change subject: [DOCS] add EPOCH to list of units supported. .. [DOCS] add EPOCH to list of units supported. Per fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java#L1593 Change-Id: I1e62f295efaa9084a6da682fd04bfb067ec5e4e8 --- M docs/topics/impala_datetime_functions.xml 1 file changed, 4 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/7342/1 -- To view, visit http://gerrit.cloudera.org:8080/7342 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1e62f295efaa9084a6da682fd04bfb067ec5e4e8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Greg Rahn
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Alex Behm has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: Line 886: prefix + "8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_", > You'd have to replace the uses of the agg slot with the zeroifnull() expres Taras, I think the rewrite solution becomes more viable if we follow the approach where the AggInfo is passed directly into the HdfsScanNode. The scan can create an smap with two entries: count(*) -> sum(num_rows_slot) slotref -> zeroifnull(slotref) where the slotref of the second entry is the agg slot from the first-level aggregation corresponding to count(*). Once we return fro init() from the scan node, we apply the agg optimization smap to all the AggInfos (local and merge). http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 109: * > This comment doesn't explain the overall optimization. What I'm looking for Taras, I think Dan is looking for a comment along the lines of what we have in applyParquetCountStartOptimization(). Maybe we can add a condensed version of that somewhere. Dan, where do you expect this comment? Here? in SingleNodePlanner.createSelectPlan()? Somewhere else? PS5, Line 140: This : // scan does additional analysis in init() to determine whether it is correct to apply : // the optimization. > Okay. If it doesn't work out, I just think the comments needs to be clarifi I think this approach will work out. We can use Analyzer.tableRefMap_ to determine how many table refs are in a query block. -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5588: Reduce the frequency of fault injection .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/813/ -- To view, visit http://gerrit.cloudera.org:8080/7310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection
Michael Ho has posted comments on this change. Change subject: IMPALA-5588: Reduce the frequency of fault injection .. Patch Set 2: Code-Review+2 Rebase. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/7310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/base-sequence-scanner.h File be/src/exec/base-sequence-scanner.h: PS7, Line 40: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: PS7, Line 134: final_batch > this looked leaked... the function comment claims it's added to the queue. Thanks for pointing out these issues. Did some cleanup around the Close() functions here as discussed. http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 183: // Make sure that we still own the RowBatch if BlockingPutWithTimeout() timed out. > okay, read through the previous comments. Rvalue references are tricky like that. Agree the interface is questionable due to subtlety. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6527 to look at the new patch set (#8). Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. Implements HdfsScanner::GetNext() for the Avro, RC File, and Sequence File scanners. Changes ProcessSplit() to repeatedly call GetNext() to share the core scanning code between the legacy ProcessSplit() interface (ProcessSplit()) and the new GetNext() interface. Summary of changes: - Slightly change code flow for initial scan range that only parses the file header. The new code sets 'only_parsing_header_' in Open() and then honors that flag in GetNextInternal(). Before, all the logic was inside ProcessSpit(). - Replace 'finished_' with 'eos_'. - Add a RowBatch parameter to various functions. - Change Close() to free all resources when a nullptr RowBatch is passed. Testing: - Exhaustive tests passed on debug - Core tests passed on asan - TODO: Perf testing on cluster Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/base-sequence-scanner.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-rcfile-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/kudu-scan-node.cc M be/src/exec/scan-node.h M be/src/util/blocking-queue.h M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 26 files changed, 708 insertions(+), 823 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/8 -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4674: Part 1: remove old aggs and joins
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 1: remove old aggs and joins .. Patch Set 4: (1 comment) Looks good when the time is right. Maybe print something to the warning log if either flag is set to true? http://gerrit.cloudera.org:8080/#/c/7102/4/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: Line 158: *status = SendBuildInputToSink(state, build_sink); one line? -- To view, visit http://gerrit.cloudera.org:8080/7102 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Dan Hecht has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 2: (2 comments) nice http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS2, Line 72: 128 bytes update http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS2, Line 571: StringValue sv what's the point of this StringValue now? This seems confusing and round about. (It's also weird that dst type is StringValue*, but that's a pre-existing weirdness). Seems more straightforward to just do: char* dst_char = reinterpret_cast(dst); and use dst_char wherever sv.ptr is used. -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5588: Reduce the frequency of fault injection .. Patch Set 1: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/812/ -- To view, visit http://gerrit.cloudera.org:8080/7310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0
Dan Hecht has posted comments on this change. Change subject: IMPALA-4687: Get Impala working against HBase 2.0 .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/7277/3/be/src/exec/hbase-table-scanner.cc File be/src/exec/hbase-table-scanner.cc: PS3, Line 516: // Newer versions of HBase use a heartbeat, so they do not timeout. : // Rather than throwing the ScannerTimeoutException, HBase will reset : // the scanner and retry. For HBase 2.0, the ScannerTimeoutException is : // removed. In these cases, HandleResultScannerTimeout will return the exception : // error message in the status. this comment seems gratuitous to me (already explained in better locations -- here, the caller doesn't really care about this detail and whether timeout can be true or not for the given implementation). i.e. you've already hidden the difference inside HandleResultScannerTimeout(), which has common semantics for 1.0 and 2.0. http://gerrit.cloudera.org:8080/#/c/7277/3/be/src/exec/hbase-table-scanner.h File be/src/exec/hbase-table-scanner.h: PS3, Line 59: supports HBase < 0.95.2 I guess the implication is that 0.95.2 uses the same API as 1.0? -- To view, visit http://gerrit.cloudera.org:8080/7277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 183: // Make sure that we still own the RowBatch if BlockingPutWithTimeout() timed out. > how does that work? the move() on line 180 doesn't move in the case of time okay, read through the previous comments. I think we should reconsider the rvalue-ref interface to the queue, but not in this patch. this is awfully subtle. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Dan Hecht has posted comments on this change. Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. Patch Set 11: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 122: } > Wait() deals with the latest status of WaitInternal, but as far as I could I was thinking UpdateQueryStatus() returns the query_status_ but that's not the case. Okay, let's leave it alone for now. We really need to clean up the CRS / coordinator and related class splits as they currently don't make much sense, but that's obviously out of scope for this change. http://gerrit.cloudera.org:8080/#/c/7155/11/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 255: !request_state->query_status().ok()); any reason not to make this stronger and have the implication go in both directions? DCHECK_EQ(query_state() == EXCEPTION, !query_status().ok()) isn't that what we want to guarantee (and what we do guarantee now)? Line 293: !request_state->query_status().ok()); same -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5588: Reduce the frequency of fault injection .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/812/ -- To view, visit http://gerrit.cloudera.org:8080/7310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has uploaded a new patch set (#10). Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. IMPALA-4862: make resource profile consistent with backend behaviour This moves away from the PipelinedPlanNodeSet approach of enumerating sets of concurrently-executing nodes because unions would force creating many overlapping sets of nodes. The new approach computes the peak resources during Open() and the peak resources between Open() and Close() (i.e. while calling GetNext()) bottom-up for each plan node in a fragment. The fragment resources are then combined to produce the query resources. The basic assumptions for the new resource estimates are: * resources are acquired during or after the first call to Open() and released in Close(). * Blocking nodes call Open() on their child before acquiring their own resources (this required some backend changes). * Blocking nodes call Close() on their children before returning from Open(). * The peak resource consumption of the query is the sum of the independent fragments (except for the parallel join build plans where we can assume there will be synchronisation). This is conservative but we don't synchronise fragment Open() and Close() across exchanges so can't make stronger assumptions in general. Also compute the sum of minimum reservations. This will be useful in the backend to determine exactly when all of the initial reservations have been claimed from a shared pool of initial reservations. Testing: * Updated planner tests to reflect behavioural changes. * Added extra resource requirement planner tests for unions, subplans, pipelines of blocking operators, and bushy join plans. * Added single-node plans to resource-requirements tests. These have more complex plan trees inside a single fragment, which is useful for testing the peak resource requirement logic. Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d --- M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/sort-node.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/common/TreeNode.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/planner/UnnestNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tabl
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. Patch Set 9: Cleaned up the patch (naming etc) after discussion with Dan and Alex -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. IMPALA-4862: make resource profile consistent with backend behaviour This moves away from the PipelinedPlanNodeSet approach of enumerating sets of concurrently-executing nodes because unions would force creating many overlapping sets of nodes. The new approach computes the peak resources during Open() and the peak resources between Open() and Close() (i.e. while calling GetNext()) bottom-up for each plan node in a fragment. The fragment resources are then combined to produce the query resources. The basic assumptions for the new resource estimates are: * resources are acquired in Open() and released in Close(). * Blocking nodes call Open() on their child before acquiring their own resources (this required some backend changes). * Blocking nodes call Close() on their children before returning from Open() * The peak resource consumption of the query is the sum of the independent fragments (except for the parallel join build plans where we can assume there will be synchronisation). This is conservative but we don't synchronise fragment Open() and Close() across exchanges so can't make stronger assumptions in general. Also compute the sum of minimum reservations. This will be useful in the backend to determine exactly when all of the initial reservations have been claimed from a shared pool of initial reservations. Testing: * Updated planner tests to reflect behavioural changes. * Added extra resource requirement planner tests for unions, subplans, pipelines of blocking operators, and bushy join plans. * Added single-node plans to resource-requirements tests. These have more complex plan trees inside a single fragment, which is useful for testing the peak resource requirement logic. Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d --- M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/sort-node.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/common/TreeNode.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/planner/UnnestNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/f
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3504: UDF for current timestamp in UTC .. Patch Set 6: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/810/ -- To view, visit http://gerrit.cloudera.org:8080/7203 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to bbed78c
Impala Public Jenkins has submitted this change and it was merged. Change subject: Bump Kudu version to bbed78c .. Bump Kudu version to bbed78c Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 Reviewed-on: http://gerrit.cloudera.org:8080/7334 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] Bump Kudu version to bbed78c
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to bbed78c .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Dan Hecht has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: Line 886: prefix + "8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_", > I gave this a try, but ran into a problem. I tried substituting a count(*) You'd have to replace the uses of the agg slot with the zeroifnull() expression. If that's too complicated, then I'm fine with leaving it. I still think, conceptually, it's cleaner to be able to do rewrites using existing expressions rather than build new ones, but if it doesn't work out given our current optimizer, then so be it. http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 109: * > This is the only one (I think it's pretty high level). This comment doesn't explain the overall optimization. What I'm looking for is something that explains the optimization: what the rewrite is, and what the "special" scan is. PS5, Line 140: This : // scan does additional analysis in init() to determine whether it is correct to apply : // the optimization. > An alternative approach that Alex and I are thinking about is passing in th Okay. If it doesn't work out, I just think the comments needs to be clarified. Future readers won't be looking at this code review to understand the code. -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS5, Line 140: This : // scan does additional analysis in init() to determine whether it is correct to apply : // the optimization. > The optimization is applied if 2 things are true: the query block meets cer An alternative approach that Alex and I are thinking about is passing in the agg info to the scan node so that all the checks would be done in one place. One potential problem with this is that we need to access the SelectStmt too, in order to check that selectStmt.getTableRefs().size() == 1 I'm trying to figure out a way around this without having to pass the entire select statement to the scan node. -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-5036: Parquet count star optimization .. IMPALA-5036: Parquet count star optimization Instead of materializing empty rows when computing count star, we use the data stored in the Parquet RowGroup.num_rows field. The Parquet scanner tuple is modified to have one slot into which we will write the num rows statistic. The aggregate function is changed from count to a special sum function that gets initialized to 0. We also add a rewrite rule so that count() is rewritten to count(*) in order to make sure that this optimization is applied in all cases. Testing: - Added functional and planner tests Change-Id: I536b85c014821296aed68a0c68faadae96005e62 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test A testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_aggregation.py M tests/query_test/test_parquet_stats.py 26 files changed, 881 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/6 -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: Line 886: prefix + "8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_", > rather than defining a new builtin, why don't we wrap the sum() with a coal I gave this a try, but ran into a problem. I tried substituting a count(*) with a zeroifnull(sum(parquet_num_rows)). The aggregation node can only evaluate agg functions, and zeroifnull is not. A potential way out is to modify the merge function so that the input to it is wrapped in a zeroifnull(), but I think that's complicated. We would have make modifications in 2 places if we go that route: replace count() with sum() and wrap the input to the merge function. Simply adding a builtin seems cleaner to me. http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS5, Line 105: The caller passes in a flag to the constructor that indicates if the count(*) : * optimization can be applied to the query block of this scan. This scan node decides : * whether to apply the optimization or not > This doesn't make a lot of sense to me. On one hand, it sayst he caller dec Reworded slightly. (The main idea is that they are both involved in the decision). Line 109: * > do we actually explain in some comment how this optimization works at a hig This is the only one (I think it's pretty high level). PS5, Line 140: This : // scan does additional analysis in init() to determine whether it is correct to apply : // the optimization. > this seems to contradict the comment above that says the caller passes a fl The optimization is applied if 2 things are true: the query block meets certain requirements and the scan node meets certain requirements. This flag indicates if the query block meets the requirements. http://gerrit.cloudera.org:8080/#/c/6812/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test: Line 86: # Verify that 0 is returned when we are selecting from an empty table and the optimization > I think it's sufficient to say: Done http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: Line 280: vector.get_value('exec_option')['batch_size'] = 1 > The core tests we run and the dimensions we use seem somewhat broken/scary Created IMPALA-5603. -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6812 to look at the new patch set (#6). Change subject: IMPALA-5036: Parquet count star optimization .. IMPALA-5036: Parquet count star optimization Instead of materializing empty rows when computing count star, we use the data stored in the Parquet RowGroup.num_rows field. The Parquet scanner tuple is modified to have one slot into which we will write the num rows statistic. The aggregate function is changed from count to a special sum function that gets initialized to 0. We also add a rewrite rule so that count() is rewritten to count(*) in order to make sure that this optimization is applied in all cases. Testing: - Added functional and planner tests Change-Id: I536b85c014821296aed68a0c68faadae96005e62 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test A testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_aggregation.py M tests/query_test/test_parquet_stats.py 26 files changed, 881 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/6 -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6812 to look at the new patch set (#6). Change subject: IMPALA-5036: Parquet count star optimization .. IMPALA-5036: Parquet count star optimization Instead of materializing empty rows when computing count star, we use the data stored in the Parquet RowGroup.num_rows field. The Parquet scanner tuple is modified to have one slot into which we will write the num rows statistic. The aggregate function is changed from count to a special sum function that gets initialized to 0. We also add a rewrite rule so that count() is rewritten to count(*) in order to make sure that this optimization is applied in all cases. Testing: - Added functional and planner tests Change-Id: I536b85c014821296aed68a0c68faadae96005e62 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionParams.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test A testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/QueryTest/aggregation.test A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test R testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_aggregation.py M tests/query_test/test_parquet_stats.py 26 files changed, 881 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6812/6 -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Lars Volker has posted comments on this change. Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. Patch Set 11: (6 comments) http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 122: } > It'd be clearer if we just made CRS::Wait() return the status (it can set t Wait() deals with the latest status of WaitInternal, but as far as I could tell that may not be the same as request_state->query_status(). If WaitInternal returns a non-OK status but query_status_ is already non-ok, the latter is not overwritten. Taking the lock again here seems easier to reason about to me. Additionally, making Wait() return a status breaks the symmetry with WaitAsync(). I don't feel strongly about it though, let me know if you'd like me to change it. Line 128: TUniqueIdToQueryHandle(request_state->query_id(), &query_handle); > I think Wait() takes care of this already, no? Yes, removed. Line 252: // guaranteed to see the error query_status. > I think this is subtle enough to warrant a comment like: Done Line 288: { > I think we should comment this too. Done Line 290: // guaranteed to see the error query_status. > I think we should have the query_state == EXCEPTION dcheck here too. Done http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1011: ArchiveQuery(*request_state); > why do we need to take the map lock here? It may be a left over from a previous try to add more locks until the error goes away. I removed it and will exercise the tests in a loop again. -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Lars Volker has uploaded a new patch set (#11). Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs There was a race between ClientRequestState::UpdateQueryStatus() and the beeswax get_state()/get_log() RPCs leading to the rare situation that a query would abort with an error, but the error message would be empty. The fix is to take the ClientRequestState lock in the beeswax RPCs before obtaining the status. To test this I ran test_corrupt_files in a loop for a day. Without this fix, it would usually fail within a few hours. I changed the test to allow running it in parallel like so: @pytest.mark.parametrize('multiplier', xrange(32)) def test_corrupt_files(self, vector, multiplier): Then I ran it in a loop like so: i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \ tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \ --exploration_strategy=exhaustive -n8; done Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 --- M be/src/service/impala-beeswax-server.cc 1 file changed, 27 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/11 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Lars Volker has uploaded a new patch set (#10). Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs There was a race between ClientRequestState::UpdateQueryStatus() and the beeswax get_state()/get_log() RPCs leading to the rare situation that a query would abort with an error, but the error message would be empty. The fix is to take the ClientRequestState lock in the beeswax RPCs before obtaining the status. To test this I ran test_corrupt_files in a loop for a day. Without this fix, it would usually fail within a few hours. I changed the test to allow running it in parallel like so: @pytest.mark.parametrize('multiplier', xrange(32)) def test_corrupt_files(self, vector, multiplier): Then I ran it in a loop like so: i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \ tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \ --exploration_strategy=exhaustive -n8; done Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-server.cc 2 files changed, 28 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/10 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Zach Amsden has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: Line 253: return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, rhsCard); > 1. Changed the code to only pass those FK/PK conjuncts because that makes t My concern was that if we have compelling evidence that a FK/PK relationship is not present, we shouldn't pass those eqJoin conjuncts to the FK/PK cardinality estimation - e.g., SELECT orders.customer_id, shipments.ship_id from orders, shipments, WHERE orders.order_date = shipments.ship_date where the highly selective non-PK ship_date should not reduce the cardinality of the join - ideally, we multiply the cardinality by (|shipments| / NDV(shipments.ship_date)) -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has uploaded a new patch set (#7). Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. IMPALA-5240: Allow config of number of disk I/O threads per disk type Currently Impala defaults to 8 threads per flash disk and 1 thread per rotational disk. This can be overridden with --num_threads_per_disk, but that sets the thread count independent of disk type. This would allow control of the number of disk I/O threads spawned independently for solid state disks using "--num_io_threads_per_solid_state_disk" and for rotational disks using "--num_io_threads_per_rotational_disk" as startup parameters. Testing: Added backend tests that verify if "num_threads_per_disk", "num_io_threads_per_solid_state_disk" and "num_io_threads_per_rotational_disk" are used to spawn threads appropriately TODO (Request comments on this additional change): Additionally made some changes to fix a bug, where impala would crash if num_disks are set more than the number of logical disks on system and num_threads_per_disk is not set. This bug also lets the user create more disk queues than the num of logical disks which would eventually never be used. As a part of this effort, existing test cases in disk-io-mgr-test were identified that would fail after this bug fix. This issue will be tracked in IMPALA-5574, but until then, DiskInfo::is_rotational() had to be changed to make these tests run as before. Moreover, after this fix, if num_disks is set to more than the number of logical disks then it will default to max available disks and log a warning. Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c --- M be/src/runtime/disk-io-mgr-stress.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/disk-info.h M be/src/util/thread.cc M be/src/util/thread.h M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java 8 files changed, 108 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/7 -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/7232/6/be/src/runtime/disk-io-mgr-test.cc File be/src/runtime/disk-io-mgr-test.cc: PS6, Line 1083: &client_buffer[0] no idea where this came from, probably when i checked out this file from master but strangely even master doesn't have this. reverting back in next patch ASAP PS6, Line 1121: &client_buffer[0] same as above -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection
Dan Hecht has posted comments on this change. Change subject: IMPALA-5588: Reduce the frequency of fault injection .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
Dan Hecht has posted comments on this change. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7282/2/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: PS2, Line 443: self._num_queries_timedout.value - self._num_queries_cancelled.value just curious, what's the relationship between "timed out" and "cancelled" queries? I always thought timed out means hung and cancelled were the deliberately cancelled queries (to test cancellation), but apparently, and so don't see why the bookkeeping is related, but I must be missing something? (But I also see this is consistent with line 713 and so does seem "correct"). -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. IMPALA-5281: stress test: introduce stricter pass guidelines 1. Report incorrect results count in the console log table. Previously, the stress test knew about incorrect results but only reported them to the console log inline. In was on the onus of a caller to find this. Now we have a summed count. 2. Fail the process if there are errors, incorrect results, or timeouts. Previously, the stress test just counted these, but would not fail its process. This leads to a much stricter pass criteria for the stress test. This will allow CI to fail and alert a maintainer that something went wrong. Testing: I modified the result hashes for queries in a local runtime_info.json and observed the reporting of incorrect results, incremented incorrect results counts, and ultimately process failure. Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Reviewed-on: http://gerrit.cloudera.org:8080/7282 Reviewed-by: Michael Brown Tested-by: Impala Public Jenkins --- M tests/stress/concurrent_select.py 1 file changed, 24 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Brown: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has uploaded a new patch set (#6). Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. IMPALA-5240: Allow config of number of disk I/O threads per disk type Currently Impala defaults to 8 threads per flash disk and 1 thread per rotational disk. This can be overridden with --num_threads_per_disk, but that sets the thread count independent of disk type. This would allow control of the number of disk I/O threads spawned independently for solid state disks using "--num_io_threads_per_solid_state_disk" and for rotational disks using "--num_io_threads_per_rotational_disk" as startup parameters. Testing: Added backend tests that verify if "num_threads_per_disk", "num_io_threads_per_solid_state_disk" and "num_io_threads_per_rotational_disk" are used to spawn threads appropriately TODO (Request comments on this additional change): Additionally made some changes to fix a bug, where impala would crash if num_disks are set more than the number of logical disks on system and num_threads_per_disk is not set. This bug also lets the user create more disk queues than the num of logical disks which would eventually never be used. As a part of this effort, existing test cases in disk-io-mgr-test were identified that would fail after this bug fix. This issue will be tracked in IMPALA-5574, but until then, DiskInfo::is_rotational() had to be changed to make these tests run as before. Moreover, after this fix, if num_disks is set to more than the number of logical disks then it will default to max available disks and log a warning. Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c --- M be/src/runtime/disk-io-mgr-stress.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/disk-info.h M be/src/util/thread.cc M be/src/util/thread.h M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java 8 files changed, 110 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/6 -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7232/5/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: PS5, Line 262: CheckAndGetFlagInOrder > how about calling this Done PS5, Line 376: is_rotational > as we discussed, let's have this return false if i is outside the bounds of Made changes as suggested and reverted corresponding changes in the backend tests as well -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
anujphadke has posted comments on this change. Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6023/5/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 452: // We can store the results in a decimal8Value. But this change hard codes to > It would be good to summarize the calculations and comment on the chosen ty Done. Do you think it would useful to move the comments at relevant places inside the functions instead of adding it before the definition? -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
anujphadke has uploaded a new patch set (#6). Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. IMPALA-4848: Add WIDTH_BUCKET() function Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/Expr.java 5 files changed, 169 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/6 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. IMPALA-5483: Automatically disable codegen for small queries This is similar to the single-node execution optimisation, but applies to slightly larger queries that should run in a distributed manner but won't benefit from codegen. This adds a new query option disable_codegen_rows_threshold that defaults to 50,000. If fewer than this number of rows are processed by a plan node per impalad, the cost of codegen almost certainly outweighs the benefit. Using rows processed as a threshold is justified by a simple model that assumes the cost of codegen and execution per row for the same operation are proportional. E.g. if x is the complexity of the operation, n is the number of rows processed, C is a constant factor giving the cost of codegen and Ec/Ei are constant factor giving the cost of codegen'd and interpreted execution and d, then the cost of the codegen'd operator is C * x + Ec * x * n and the cost of the interpreted operator is Ei * x * n. Rearranging means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that (at least with the simplified model) it makes sense to choose interpretation or codegen based on a constant threshold. The model also implies that it is somewhat safer to choose codegen because the additional cost of codegen is O(1) but the additional cost of interpretation is O(n). I ran some experiments with TPC-H Q1, varying the input table size, to determine what the cut-over point where codegen was beneficial was. The cutover was around 150k rows per node for both text and parquet. At 50k rows per node disabling codegen was very beneficial - around 0.12s versus 0.24s. To be somewhat conservative I set the default threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the cutover tends to be higher because there are plan nodes that process many fewer than the max rows. Fix a couple of minor issues in the frontend - the numNodes_ calculation could return 0 for Kudu, and the single node optimization didn't handle the case where for a scan node with conjuncts, a limit and missing stats correctly (it considered the estimate still valid.) Testing: Updated e2e tests that set disable_codegen to set disable_codegen_rows_threshold to 0, so that those tests run both with and without codegen still. Added an e2e test to make sure that the optimisation is applied in the backend. Added planner tests for various cases where codegen should and shouldn't be disabled. Perf: Added a targeted perf test for a join+agg over a small input, which benefits from this change. Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Reviewed-on: http://gerrit.cloudera.org:8080/7153 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test M tests/common/impala_test_suite.py M tests/common/test_dimensions.py A tests/query_test/test_codegen.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_scanners_fuzz.py M tests/query_test/test_udfs.py 21 files changed, 461 insertions(+), 45 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS2, Line 27: import org.apache.impala.analysis.SlotDescriptor; Is this used? PS2, Line 40: import org.apache.kudu.shaded.com.google.common.base.Joiner; replace http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS2, Line 79: - nit: remove PS2, Line 80: ordered based on the tuple ids I am not sure I get what that ordering is. Do you mean grouped? PS2, Line 202: based on the single most :* selective join predicate. We do not attempt to estimate the joint selectivity of :* multiple join predicates to avoid underestimation. Much better. Thanks PS2, Line 280: fkPkCandidate.get(0) Can you plz explain this? I could be wrong about this but it seems that the decision of whether this is a pk/fk depends on which equi-join slots we process first. Maybe I am missing something. In either case, plz add a comment. PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0); Are we certain this is a valid precondition check? Why isn't a 0 rhsCard possible? Unless this is guaranteed to be the case... PS2, Line 337: we adjustments nit: grammar -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7223/8/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java: Line 69: public ResourceProfile max(ResourceProfile other) { I also changed these to be non-static to reduce the verbosity of some of the code. -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to bbed78c
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to bbed78c .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/811/ -- To view, visit http://gerrit.cloudera.org:8080/7334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java: Line 82: private void recomputeResourceProfiles() { > This seems weird to me and I believe we can potentially get rid of this if Done. Reworked this so that the traversal logic is all in computeResourceReqs() Line 196: This didn't connect up the fragments correctly. http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/PlanVisitor.java File fe/src/main/java/org/apache/impala/planner/PlanVisitor.java: Line 6: public interface PlanVisitor { > We already have a Visitor and a corresponding accept() function. Can we con Removed this in favour of doing an explicit traversal in computeResourceReq() http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 352: private static class ResourceAggregator implements PlanVisitor { > Do we really need this separate visitor that only sums over things we have Was able to just do an explicit traversal in computeResourceReq(). Line 392: planRoots.get(0).walkPlan(aggregator); > The new approach makes sense to me - I like it. PlanTreeResources vs. ResourceProfile: * Updated the ResourceProfile comment * Renamed PlanTreeResources to PhaseResourceProfiles, since it's really just a wrapper around ResourceProfiles for two different phases of execution. Visitor and recursion: * I was able to rework this so that the whole traversal is done iteratively in this function using collect()/getNodesPostOrder(). There's no need for a visitor then. Entry point: this is now not called by finalize(). Renamed finalize() to finalizeExchanges() since that is more precisely what it does. http://gerrit.cloudera.org:8080/#/c/7223/7/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java: Line 23: * The resources that will be consumed by a set of plan nodes. > Does this really represent the resources of a single PlanNode instance or P Updated the comment. This abstraction is agnostic about what collection of things it is a profile for - it really is just a container for a set of peak values. -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has uploaded a new patch set (#8). Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. IMPALA-4862: make resource profile consistent with backend behaviour This moves away from the PipelinedPlanNodeSet approach of enumerating sets of concurrently-executing nodes because unions would force creating many overlapping sets of nodes. The new approach computes the peak resources during Open() and the peak resources between Open() and Close() (i.e. while calling GetNext()) bottom-up for each plan node in a fragment. The fragment resources are then combined to produce the query resources. The basic assumptions for the new resource estimates are: * resources are acquired in Open() and released in Close(). * Blocking nodes call Open() on their child before acquiring their own resources (this required some backend changes). * Blocking nodes call Close() on their children before returning from Open() * The peak resource consumption of the query is the sum of the independent fragments (except for the parallel join build plans where we can assume there will be synchronisation). This is conservative but we don't synchronise fragment Open() and Close() across exchanges so can't make stronger assumptions in general. Also compute the sum of minimum reservations. This will be useful in the backend to determine exactly when all of the initial reservations have been claimed from a shared pool of initial reservations. Testing: * Updated planner tests to reflect behavioural changes. * Added extra resource requirement planner tests for unions, subplans, pipelines of blocking operators, and bushy join plans. * Added single-node plans to resource-requirements tests. These have more complex plan trees inside a single fragment, which is useful for testing the peak resource requirement logic. Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d --- M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/hash-join-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/sort-node.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/common/TreeNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 40 files changed, 2,819 insertions(+), 410 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7223/8 -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3504: UDF for current timestamp in UTC .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/810/ -- To view, visit http://gerrit.cloudera.org:8080/7203 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3504: UDF for current timestamp in UTC .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7203 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Hello Impala Public Jenkins, Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7203 to look at the new patch set (#6). Change subject: IMPALA-3504: UDF for current timestamp in UTC .. IMPALA-3504: UDF for current timestamp in UTC This change adds a UDF "utc_timestamp" which returns the current date and time in UTC. Example query: select utc_timestamp(); +---+ | utc_timestamp() | +---+ | 2017-06-15 17:36:39.290773000 | +---+ Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/timestamp-value.h M be/src/service/impala-server.cc M common/function-registry/impala_functions.py M common/thrift/ImpalaInternalService.thrift M fe/src/test/java/org/apache/impala/testutil/TestUtils.java 10 files changed, 73 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7203/6 -- To view, visit http://gerrit.cloudera.org:8080/7203 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] Bump Kudu version to bbed78c
Matthew Jacobs has posted comments on this change. Change subject: Bump Kudu version to bbed78c .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections
John Sherman has posted comments on this change. Change subject: IMPALA-5394: Handle blocked HS2 connections .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: Line 210 The prior comment mentioned potential thread safety issues, but I didn't see any thread safety issues in the current usage. If someone has experience in this area, it's probably worth it to make sure I didn't miss something. -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Sherman Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections
John Sherman has posted comments on this change. Change subject: IMPALA-5394: Handle blocked HS2 connections .. Patch Set 3: Ran the latest patch through be/fe/e2e testing. be/fe ran clean. end to end mostly ran clean. I had one failure I couldn't explain (could be related to me using Java 8/Ubuntu 16 or the limited amount of memory in my development environment): custom_cluster/test_hdfs_fd_caching.py::TestHdfsFdCaching::test_caching_enabled custom_cluster/test_hdfs_fd_caching.py:125: in test_caching_enabled self.run_fd_caching_test(vector, caching_expected, cache_capacity) custom_cluster/test_hdfs_fd_caching.py:85: in run_fd_caching_test assert num_handles_after == (num_handles_start + 1) E assert 0 == (0 + 1) -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Sherman Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections
John Sherman has uploaded a new patch set (#3). Change subject: IMPALA-5394: Handle blocked HS2 connections .. IMPALA-5394: Handle blocked HS2 connections - TThreadPoolServer calls getTransport() on a client from the Server thread (the thread that does the accepts). - TSaslServerTransport->getTransport() calls TSaslTransport->open() - TSaslServerTransport->open() tries to negotiate SASL which calls read/write - If read/write blocks, the TThreadPoolServer cannot accept connections - Set the underlying TSocket's recvTimeout and sendTimeout before the TSaslServerTransport->open() and reset them to 0 after open() completes. - Added sasl_connect_tcp_timeout_ms flag that defaults to 30 milliseconds (5 minutes) - Changed the Thrift server type for hs2 connections from ThreadPool to Threaded to take advantage of the AcceptQueueServer implementation. - Increased the AcceptQueueServer CONNECTION_SETUP_POOL_SIZE from 1 to 2, so new connections can continue to be accepted if for some reason a connection needs to use the timeout period. Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/service/impala-server.cc M be/src/transport/TSaslServerTransport.cpp M common/thrift/metrics.json 4 files changed, 49 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/3 -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Sherman Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] Bump Kudu version to bbed78c
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/7334 Change subject: Bump Kudu version to bbed78c .. Bump Kudu version to bbed78c Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/7334/1 -- To view, visit http://gerrit.cloudera.org:8080/7334 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I595a291885756b3e9138a67e747389cb7fdf7133 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Dan Hecht has posted comments on this change. Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. Patch Set 9: (6 comments) http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 122: } It'd be clearer if we just made CRS::Wait() return the status (it can set that status after all) rather than re-taking the lock. Line 128: request_state->UpdateNonErrorQueryState(beeswax::QueryState::FINISHED); I think Wait() takes care of this already, no? Line 252: lock_guard l(*request_state->lock()); I think this is subtle enough to warrant a comment like: // Take the lock to ensure that if the client sees a query_state == EXCEPTOIN, it is guaranteed to see the error query_status. Line 288: lock_guard l(*request_state->lock()); I think we should comment this too. Line 290: if (!request_state->query_status().ok()) { I think we should have the query_state == EXCEPTION dcheck here too. http://gerrit.cloudera.org:8080/#/c/7155/9/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1011: lock_guard l(client_request_state_map_lock_); why do we need to take the map lock here? -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/809/ -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
Michael Brown has posted comments on this change. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
David Knupp has posted comments on this change. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Matthew Mulder Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Lars Volker has posted comments on this change. Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. Patch Set 9: Working on this had been blocked by IMPALA-5567 which has now been fixed. I had to rebase the change to make it build. I also added one more lock to ImpalaServer::executeAndWait() since the test would still sporadically fail. Now I'm back to running the test in a loop, trying to see if it still repros with PS9. -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Lars Volker has uploaded a new patch set (#9). Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs There was a race between ClientRequestState::UpdateQueryStatus() and the beeswax get_state()/get_log() RPCs leading to the rare situation that a query would abort with an error, but the error message would be empty. The fix is to take the ClientRequestState lock in the beeswax RPCs before obtaining the status. To test this I ran test_corrupt_files in a loop for a day. Without this fix, it would usually fail within a few hours. I changed the test to allow running it in parallel like so: @pytest.mark.parametrize('multiplier', xrange(32)) def test_corrupt_files(self, vector, multiplier): Then I ran it in a loop like so: i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \ tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \ --exploration_strategy=exhaustive -n8; done Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-server.cc 2 files changed, 22 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/9 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Michael Ho has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 10: Makes sense. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 10: The "disabled due to optimization hints" message that you added a while back will show up in the profile too. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
Re: [Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
The explain plan will include "Codegen disabled by planner" On Thu, Jun 29, 2017 at 10:42 AM, Michael Ho (Code Review) < ger...@cloudera.org> wrote: > Michael Ho has posted comments on this change. > > Change subject: IMPALA-5483: Automatically disable codegen for small > queries > .. > > > Patch Set 10: > > Sorry, haven't caught up with the review yet. Is there going to be a > message in the profile indicating the reason for codegen to be disabled ? > This seems rather important for supportability. > > -- > To view, visit http://gerrit.cloudera.org:8080/7153 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e > Gerrit-PatchSet: 10 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Tim Armstrong > Gerrit-Reviewer: Alex Behm > Gerrit-Reviewer: Impala Public Jenkins > Gerrit-Reviewer: Juan Yu > Gerrit-Reviewer: Michael Ho > Gerrit-Reviewer: Michael Ho > Gerrit-Reviewer: Mostafa Mokhtar > Gerrit-Reviewer: Tim Armstrong > Gerrit-HasComments: No >
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Michael Ho has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 10: Sorry, haven't caught up with the review yet. Is there going to be a message in the profile indicating the reason for codegen to be disabled ? This seems rather important for supportability. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
Lars Volker has uploaded a new patch set (#7). Change subject: IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs .. IMPALA-5427: Fix race between CRS::UpdateQueryStatus() and beeswax RPCs There was a race between ClientRequestState::UpdateQueryStatus() and the beeswax get_state()/get_log() RPCs leading to the rare situation that a query would abort with an error, but the error message would be empty. The fix is to take the ClientRequestState lock in the beeswax RPCs before obtaining the status. To test this I ran test_corrupt_files in a loop for a day. Without this fix, it would usually fail within a few hours. I changed the test to allow running it in parallel like so: @pytest.mark.parametrize('multiplier', xrange(32)) def test_corrupt_files(self, vector, multiplier): Then I ran it in a loop like so: i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test \ tests/query_test/test_scanners.py::TestParquet::test_corrupt_files \ --exploration_strategy=exhaustive -n8; done Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 --- M be/src/service/impala-beeswax-server.cc M be/src/service/impala-server.cc 2 files changed, 24 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7155/7 -- To view, visit http://gerrit.cloudera.org:8080/7155 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib4494fe3f933cc23841db0e7da407eec5650f2b5 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5281: stress test: introduce stricter pass guidelines
Matthew Mulder has posted comments on this change. Change subject: IMPALA-5281: stress test: introduce stricter pass guidelines .. Patch Set 1: Code-Review+1 I reviewed the code and performed a test run, but I'm relying on Michael's testing of changing the result hashes to produce incorrect results. -- To view, visit http://gerrit.cloudera.org:8080/7282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f2174a527193ae01be45b8ed56315c465883346 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Matthew Mulder Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 10: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/808/ -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 10: Code-Review+2 Update TestStatsExtrapolation -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7153 to look at the new patch set (#10). Change subject: IMPALA-5483: Automatically disable codegen for small queries .. IMPALA-5483: Automatically disable codegen for small queries This is similar to the single-node execution optimisation, but applies to slightly larger queries that should run in a distributed manner but won't benefit from codegen. This adds a new query option disable_codegen_rows_threshold that defaults to 50,000. If fewer than this number of rows are processed by a plan node per impalad, the cost of codegen almost certainly outweighs the benefit. Using rows processed as a threshold is justified by a simple model that assumes the cost of codegen and execution per row for the same operation are proportional. E.g. if x is the complexity of the operation, n is the number of rows processed, C is a constant factor giving the cost of codegen and Ec/Ei are constant factor giving the cost of codegen'd and interpreted execution and d, then the cost of the codegen'd operator is C * x + Ec * x * n and the cost of the interpreted operator is Ei * x * n. Rearranging means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that (at least with the simplified model) it makes sense to choose interpretation or codegen based on a constant threshold. The model also implies that it is somewhat safer to choose codegen because the additional cost of codegen is O(1) but the additional cost of interpretation is O(n). I ran some experiments with TPC-H Q1, varying the input table size, to determine what the cut-over point where codegen was beneficial was. The cutover was around 150k rows per node for both text and parquet. At 50k rows per node disabling codegen was very beneficial - around 0.12s versus 0.24s. To be somewhat conservative I set the default threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the cutover tends to be higher because there are plan nodes that process many fewer than the max rows. Fix a couple of minor issues in the frontend - the numNodes_ calculation could return 0 for Kudu, and the single node optimization didn't handle the case where for a scan node with conjuncts, a limit and missing stats correctly (it considered the estimate still valid.) Testing: Updated e2e tests that set disable_codegen to set disable_codegen_rows_threshold to 0, so that those tests run both with and without codegen still. Added an e2e test to make sure that the optimisation is applied in the backend. Added planner tests for various cases where codegen should and shouldn't be disabled. Perf: Added a targeted perf test for a join+agg over a small input, which benefits from this change. Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test A testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test M tests/common/impala_test_suite.py M tests/common/test_dimensions.py A tests/query_test/test_codegen.py M tests/query_test/test_decimal_queries.py M tests/query_test/test_scanners_fuzz.py M tests/query_test/test_udfs.py 21 files changed, 461 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/10 -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[native-toolchain-CR] Bump Kudu version to bbed78c
Thomas Tauber-Marshall has posted comments on this change. Change subject: Bump Kudu version to bbed78c .. Patch Set 1: http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/416/ -- To view, visit http://gerrit.cloudera.org:8080/7314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to bbed78c
Thomas Tauber-Marshall has submitted this change and it was merged. Change subject: Bump Kudu version to bbed78c .. Bump Kudu version to bbed78c Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matthew Jacobs: Looks good to me, approved Thomas Tauber-Marshall: Verified -- To view, visit http://gerrit.cloudera.org:8080/7314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[native-toolchain-CR] Bump Kudu version to bbed78c
Thomas Tauber-Marshall has posted comments on this change. Change subject: Bump Kudu version to bbed78c .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5588: Reduce the frequency of fault injection
Michael Brown has posted comments on this change. Change subject: IMPALA-5588: Reduce the frequency of fault injection .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to bbed78c
Matthew Jacobs has posted comments on this change. Change subject: Bump Kudu version to bbed78c .. Patch Set 1: Is this ready to go in? Impala side commit? -- To view, visit http://gerrit.cloudera.org:8080/7314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1f3ab53ba80c8833407b5e615e6bb92b2160d74 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate .. IMPALA-5280: Coalesce chains of OR conditions to an IN predicate This change introduces a new rule to merge disjunct equality predicates into an IN predicate. As with every rule being applied bottom up, the rule merges the leaf OR predicates into an in predicate and subsequently merges the OR predicate to the existing IN predicate It will also merge two compatible IN predicates into a single IN predicate. Patch also addresses review comments to normalize the binary predicates and testcases for the same. binary predicates of the form constant non constant are normalized to non constant constant Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2 Reviewed-on: http://gerrit.cloudera.org:8080/7110 Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java A fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 284 insertions(+), 49 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: sandeep akinapelli Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: sandeep akinapelli
[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: sandeep akinapelli Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: sandeep akinapelli Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
anujphadke has posted comments on this change. Change subject: IMPALA-5582: Store sentry privileges in lower case .. Patch Set 2: Tested this manually. Will try to add some test cases. -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-5582: Store sentry privileges in lower case .. IMPALA-5582: Store sentry privileges in lower case Privileges granted to a db whose name contains upper case characters can disappear after a few seconds. A privilege is inserted into the catalogObjectCache using a key that uses the db name. The key gets converted to lower case before inserting. The sentryProxy thread on the other hand returns the db name in lower case. When the catalogObjectCache gets updated and the old catalog object is removed from the cache it ends up deleting the new object since they both use the same key. This change stores the privileges in lower case which does not trigger these chain on events on a sentryProxy thread update. Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 --- M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7332/2 -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke
[Impala-ASF-CR] IMPALA-5582: Store sentry privileges in lower case
anujphadke has uploaded a new change for review. http://gerrit.cloudera.org:8080/7332 Change subject: IMPALA-5582: Store sentry privileges in lower case .. IMPALA-5582: Store sentry privileges in lower case Privileges granted to a db whose name contains upper case characters can disappear after a few seconds. A privilege is inserted into the catalogObjectCache using a key that uses the db name. The key gets converted to lower case before inserting. The sentryProxy thread on the other hand returns the db name in lower case. When the catalogObjectCache gets updated and the old catalog object is removed from the cache it ends up deleting the new object since they both use the same key. This change stores the privileges in lower case which does not trigger these chain on events on a sentryProxy thread update. Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 --- M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7332/1 -- To view, visit http://gerrit.cloudera.org:8080/7332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Alex Behm has uploaded a new patch set (#2). Change subject: IMPALA-5547: Rework FK/PK join detection. .. IMPALA-5547: Rework FK/PK join detection. Reworks the FK/PK join detection logic to: - more accurately recognize many-to-many joins - avoid dim/dim joins for multi-column PKs The new detection logic maintains our existing philosophy of generally assuming a FK/PK join, unless there is strong evidence to the contrary, as follows. For each set of simple equi-join conjuncts between two tables, we compute the joint NDV of the right-hand side columns by multiplication, and if the joint NDV is significantly smaller than the right-hand side row count, then we are fairly confident that the right-hand side is not a PK. Otherwise, we assume the set of conjuncts could represent a FK/PK relationship. Extends the explain plan to include the outcome of the FK/PK detection at EXPLAIN_LEVEL > STANDARD. Performance testing: 1. Full TPC-DS run on 10TB: - Q10 improved by >100x - Q72 improved by >25x - Q17,Q26,Q29 improved by 2x - Q64 regressed by 10x - Total runtime: Improved by 2x - Geomean: Minor improvement The regression of Q64 is understood and we will try to address it in follow-on changes. The previous plan was better by accident and not because of superior logic. 2. Nightly TPC-H and TPC-DS runs: - No perf differences Testing: - The existing planner test cover the changes. - Code/hdfs run passed. Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 --- M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test 17 files changed, 1,485 insertions(+), 1,351 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/7257/2 -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Alex Behm has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 1: (20 comments) I still need to add the targeted planner tests and adjust new tests after the rebase. In the meantime, you can continue looking at the code changes. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) { : output.append("assumed fk/pk"); > I haven't read the entire patch yet, so I am curious what this case represe In this case we have no equi-join conjuncts that we can reason about, so we are optimistic and assume fk/pk with a join selectivity of 1. We cannot reason about an equi-join conjunct if the underlying table/columns have no stats or if the conjunct involves non-trivial expressions (anything that is not a SlotRef). PS1, Line 178: for (int j = 0; j < slots.size(); ++j) { : output.append(slots.get(j).toString()); : if (j + 1 != slots.size()) output.append(", "); : } > Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots) Better. Done. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS1, Line 37: import org.apache.kudu.client.shaded.com.google.common.collect.Maps; > ? Sigh. Fixed. Line 91: protected List> fkPkEqJoinConjuncts_; > Class member is only used in one function in the base class. Maybe documen Added comment. Not sure it's worth protecting against accidental modifications in inheriting classes. I think it's fair to assume that subclasses should treat protected members carefully. Subclasses can already muck with pretty much anything here, and adding getters that wrap members with immutables seems cumbersome, and generates objects "unnecessarily". Line 253: return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, rhsCard); > This computes estimates based on both PK conjuncts and non-PK conjuncts usi 1. Changed the code to only pass those FK/PK conjuncts because that makes the behavior and code clearer. I'm not sure I follow your described scenario. If the RHS is very selective, then it's correct to apply that to the join cardinality - if we believe that FK/PK is indeed the case. Are you concerned that our FK/PK assumption could be wrong and we might underestimate the join cardinality 2. Good idea. A flattened, ordered list makes things simpler in various places. Done. PS1, Line 271: scanSlotsByTids > nit: Each pair represents two joined tables, no? So, maybe rename this to ' Done Line 275: for (List fkPkCandi: scanSlotsByTids.values()) { > fkPkCandi - this could use a better name, it isn't clear to me what this is I renamed it to fkPkCandidate, but I doubt that's what you had in mind. Can you help me come up with a better name? This thing is a list of equi-join conjuncts between the same two tables. These conjuncts could represent a single- or multi-column FK/PK join condition. PS1, Line 281: numRows > rhsTableCardinality? Used rhsNumRows because we typically use 'numRows' to indicate values that come directly from stats, although technically table cardinality is also correct of course. Line 290: > nit: extra line Done PS1, Line 294: conjuncts > conjunct slots Done PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || joinOp_.isOuterJoin()); > This private function is only called in L253? I think you can remove this c Done PS1, Line 312: lhsNdv > Can this ever be 0? Probably. Better be defensive. Fixed here and elsewhere. PS1, Line 313: rhsNumRows > Same here, can this be 0? Probably. Better be defensive. Fixed here and elsewhere. PS1, Line 318: result = Math.min(result, joinCard); > That part I think is very important and I am not sure it is explained anywh This part applies to both methods (generic and FK/PK). I expanded the comment on getJoinCardinality() to explain why we take the min(). We should certainly try (in a subsequent patch) whether estimating the joint selectivity of multiple predicates with exponential backoff or similar works well. PS1, Line 329: conjuncts > conjunct slots Done PS1, Line 341: slots.lhs().getStats().getNumDistinctValues(); > nit: these are used in couple of places. Since you already have the EqJoinC Done PS1, Line 343: slots.lhs().getParent().getTable().getNumRows(); > same comment as above. Done PS1, Line 373: Preconditions > I don't think these checks are useful. This private constructor is only cal Done PS1, Line 382: . > nit: remove Done PS1, Line 399: tupleDesc > nit: inline Removed tupleDesc instead, otherwise the if condition below becomes multi-line