[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 7 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 06:13:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. IMPALA-11945: Fix Flaky Test in JwtHttpTest The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl test has been flaky in the impala-asf-master-core Jenkins build. This build was executed twice with this test passing in one run and failing in the other. This test and another test named testJwtAuthWithTrustedJwksHttpsUrlInvalidCN both follow the same pattern. They attempt to start up a cluster with incorrectly configured Impala daemons. Then, the Impala coordinator's log file is searched to ensure the coordinator failed to start up for the expected reason. Both these tests were introduced by IMPALA-11922. The theory for the test flakiness is that the Impala coordinator logs did not finish writing to disk before they were checked for the expected startup failure message. The evidence backing this theory is that all expected log messages were present in the log file except for the final log line containing the error message. Three changes were made to fix the flakiness 1. The tests where log files are search for startup failure messages now use a single coordinator. Only a single coordinator is needed thus this change eliminates potential issues caused by multiple coordinators. 2. The tests sleep 5 seconds before searching the log files to give time for the logs to be fully written to disk. 3. Log buffering in the Impala daemons was turned off by setting the logbuflevel startup flag to -1 which turns off in-memory log buffering and writes the logs directly to disk. Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Reviewed-on: http://gerrit.cloudera.org:8080/19536 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java 1 file changed, 65 insertions(+), 18 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 8 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 ) Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@300 PS2, Line 300: HashMap I think we need ConcurrentHashMap since this will be updated in multi-threads. http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2516 PS1, Line 2516: > Yeah, the event time is generated by HMS. This variable stores the latest t Sorry, what I mean is clock skew between nodes. It's possible that the HMS and catalogd nodes have several milliseconds difference. If any of them have connection issues with the ntp server, the time difference could be even larger. http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6362 PS1, Line 6362: } > Yeah, I did cover the normal refresh cases as well. Did I miss anything? Oh, I just realized the map is updated in fireReloadEventHelper. Can we update the method name to reflect this, e.g. fireReloadEventAndUpdateRefreshMap? http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6423 PS1, Line 6423: //Update the entries in the refresh metadata map nit: need space after '//' http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6434 PS1, Line 6434: //update all the partitions values : for(String partitionName: nit: need spaces after '//', 'for' and around ':' http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4513 PS2, Line 4513: catalog_.setRefreshMetadataMap(refreshMetadataMap); It seems we don't need to set the map back. It's updated in place. http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6398 PS2, Line 6398: catalog_.setRefreshMetadataMap(refreshMetadataMap); I think we can simply set an empty map and don't need to get and clear the current map. Or if we get and clear the current map, we don't need to set it back. http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6425 PS2, Line 6425: new HashMap<>(catalog_.getRefreshMetadataMap()); Why do we need to create a new map here? Can we directly modify the map corresponding to the table? http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6444 PS2, Line 6444: catalog_.setRefreshMetadataMap(refreshMetadataMap); This might have concurrent issue when there are concurrent refreshes on different tables. -- To view, visit http://gerrit.cloudera.org:8080/19484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574 Gerrit-Change-Number: 19484 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Sat, 25 Feb 2023 05:29:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19535 ) Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12445/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 25 Feb 2023 05:14:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19535 ) Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. Patch Set 2: (2 comments) Thanks for your quick review, Daniel! http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@75 PS1, Line 75: } > Shouldn't this also be IR_ALWAYS_INLINE? It calls StringFunctions::Substrin I realized that we don't need this annotation in functions that have registered their symbols in FE, i.e. builtin functions. The reason is that LlvmCodeGen::InlineConstFnAttrs() is invoked in ScalarFnCall::GetCodegendComputeFnImpl() which will recursively invoke the same method in children: https://github.com/apache/impala/blob/f54b3c37577cabad396c3eee7d18a1a6e4eea9f1/be/src/exprs/scalar-fn-call.cc#L335 The calls in children will invoke InlineConstFnAttrs() for themselves. We just need the IR_ALWAYS_INLINE annotation for functions called by builtin functions. If the function itself is a builtin function, it needs the annotation as well if it's invoked by other builtin functions. I also verified that the following query doesn't have performance regression: select count(*) from tpch100_parquet.lineitem where instr(l_comment, 'egular courts above the', 1, 1) > 0; while the following query has perf regression: select count(*) from tpch100_parquet.lineitem where instr(l_comment, 'egular courts above the') > 0; The first one uses the 4 arguments INSTR function directly so it's ok. The second one uses the 2 arguments INSTR function which is a wrapper of the 4 arguments INSTR function. The callee function is not inlined so causes the regression. http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@850 PS1, Line 850: } > Shouldn't these Instr overloads, Locate and LocatePos also be IR_ALWAYS_INL Replied above. -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 25 Feb 2023 04:54:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19535 to look at the new patch set (#2). Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE String functions that have both UTF-8 and the traditional ASCII behaviors have checks for the UTF8_MODE query option. The check is intended to be replaced with constants during codegen in LlvmCodeGen::InlineConstFnAttrs(). However, as mentioned in the method comment, InlineConstFnAttrs() only replaces call instructions inside the current function. To replace the call to FunctionContextImpl::GetConstFnAttr() inside the callee functions, we have to inline the callee functions (by annotating them with IR_ALWAYS_INLINE). This patch annotates UTF-8 related string functions with IR_ALWAYS_INLINE to make sure the checks on UTF8_MODE are all replaced in codegen. Note that builtin functions don't need the annotation if they are not invoked by other builtin functions, because InlineConstFnAttrs() will be invoked recursively in the expression tree. See ScalarFnCall::GetCodegendComputeFnImpl(). Perf tests: Ran PERF_STRING queries in targeted-perf (scale factor 100) on parquet format. Saw significant improvements: +-+--++-++ | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | +-+--++-++ | PERF_STRING-Q7 | parquet/none | 10.80 | 10.93 | -1.13% | | PERF_STRING-Q6 | parquet/none | 12.25 | 12.55 | -2.37% | | PERF_STRING-Q5 | parquet/none | 5.87 | 6.01| -2.33% | | PERF_STRING-Q3 | parquet/none | 5.02 | 5.26| -4.47% | | PERF_STRING-Q2 | parquet/none | 4.49 | 4.73| -4.88% | | PERF_STRING-Q4 | parquet/none | 4.99 | 5.28| I -5.48% | | PERF_STRING-Q1 | parquet/none | 4.02 | 4.29| I -6.34% | +-+--++-++ | PERF_STRING-Q13 | parquet/none | 9.90 | 11.48 | I -13.72% | | PERF_STRING-Q9 | parquet/none | 10.01 | 11.64 | I -13.97% | | PERF_STRING-Q11 | parquet/none | 9.83 | 11.56 | I -14.97% | | PERF_STRING-Q10 | parquet/none | 6.62 | 8.07| I -17.89% | | PERF_STRING-Q12 | parquet/none | 6.63 | 8.25| I -19.61% | | PERF_STRING-Q8 | parquet/none | 5.03 | 6.45| I -22.09% | +-+--++-++ Note that Q8-Q13 are new queries added by this patch to reveal the performance difference. Ran the same queries comparing the branch of reverting the UTF-8 changes (IMPALA-2019). Found no regressions anymore. Even see improvement in one query (Q13): +-+--++-++ | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | +-+--++-++ | PERF_STRING-Q11 | parquet/none | 9.85 | 9.56| +3.04% | | PERF_STRING-Q7 | parquet/none | 10.76 | 10.59 | +1.57% | | PERF_STRING-Q1 | parquet/none | 4.04 | 4.00| +0.83% | | PERF_STRING-Q4 | parquet/none | 5.02 | 5.00| +0.51% | | PERF_STRING-Q2 | parquet/none | 4.53 | 4.51| +0.32% | | PERF_STRING-Q5 | parquet/none | 5.81 | 5.81| +0.02% | | PERF_STRING-Q8 | parquet/none | 5.02 | 5.04| -0.31% | | PERF_STRING-Q12 | parquet/none | 6.65 | 6.68| -0.43% | | PERF_STRING-Q3 | parquet/none | 5.02 | 5.05| -0.53% | | PERF_STRING-Q6 | parquet/none | 12.23 | 12.33 | -0.84% | | PERF_STRING-Q10 | parquet/none | 6.68 | 6.74| -0.87% | | PERF_STRING-Q9 | parquet/none | 10.01 | 10.41 | -3.86% | | PERF_STRING-Q13 | parquet/none | 9.92 | 10.54 | I -5.85% | +-+--++-++ Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a --- M be/src/exprs/mask-functions-ir.cc M be/src/exprs/string-functions-ir.cc M testdata/workloads/targeted-perf/queries/string.test 3 files changed, 117 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/19535/2 -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18023 ) Change subject: IMPALA-7942 (part 2): Add query hints for predicate selectivities .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12444/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b Gerrit-Change-Number: 18023 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Sat, 25 Feb 2023 04:22:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18023 ) Change subject: IMPALA-7942 (part 2): Add query hints for predicate selectivities .. Patch Set 13: (29 comments) > Can you please provide feedback on all comments? > > For example, for the following comment, a feedback can be DONE, a > reply etc. > > Commit Message > Line 14: > TABLE_NUM_ROWS? > > The feedbacks are useful to judge the rework. Thanks! Ok, Qifan, thanks for reply. I've already answer all comments. http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/analysis/Predicate.java File fe/src/main/java/org/apache/impala/analysis/Predicate.java: http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/analysis/Predicate.java@78 PS9, Line 78: ti > nit. should be a double value in (0, 1]. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/analysis/TableRef.java File fe/src/main/java/org/apache/impala/analysis/TableRef.java: http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/analysis/TableRef.java@547 PS9, Line 547: return; > The above hints are all limited to hdfs tables since they are related to hd Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/analysis/TableRef.java@554 PS9, Line 554: } > nit. for this HDFS table reference. On paper, one can assign different # ro Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/analysis/TableRef.java@555 PS9, Line 555: > It seems we need to handle NumberFormatException thrown from this method. F Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/planner/ScanNode.java@81 PS9, Line 81: H > nit. "." Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/main/java/org/apache/impala/planner/ScanNode.java@82 PS9, Line 82: ng tableNumRowsHint_ = -1; : > I wonder if this is still correct. I thought the hint can be used to overri Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5053 PS9, Line 5053: void testCreatePartitio > This error may not be user friendly for super long SQL query. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5056 PS9, Line 5056: "PARTITIONED BY SPEC (BUC > same comment as above. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5060 PS9, Line 5060: "STORED AS ICEBERG" + tb > same comment as above. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5064 PS9, Line 5064: int, > nit. may add a qualifier "non-HDFS" before 'table' to make it clear. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5071 PS9, Line 5071: "PARTITIONED BY SPEC (BU > See the comment on TABLE_NUM_ROWS for "syntax error in line 1". Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5074 PS9, Line 5074: "PARTITIONED BY SPEC (TRUNCATE(0, p1), DAY(p2)) STORED AS ICEBERG" + > Isn't this supported now in patch set 9? Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5085 PS9, Line 5085: Legal hint return correct > is this a mistake? 0.1 is a perfect selectivity value. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5087 PS9, Line 5087: zesOk("select * from tp > Same as above. 0 is okay. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5093 PS9, Line 5093: > should be [0,1], right? Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5098 PS9, Line 5098: gal > See the comment about. Should be allowed. Done http://gerrit.cloudera.org:8080/#/c/18023/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5102 PS9, Line 5102: : // Multiple illegal hints wil > From the updated parser, it seems this hint should be accepted in that the Done http://gerrit.cloudera.org:8080/#/c/18023/9/testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test File
[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities
wangsheng has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18023 ) Change subject: IMPALA-7942 (part 2): Add query hints for predicate selectivities .. IMPALA-7942 (part 2): Add query hints for predicate selectivities Currently, Impala only uses simple estimation to compute selectivity for some predicates, and this may lead to worse query plan due to CBO. Hence, we add new hints to reduce such errors. Maybe in the future, we can use histograms to get more precise query plan. This patch adds another query hints: 'SELECTIVITY', we can use this hint to original selectivity computing. Format like this: select col from t where (a=1) /* +SELECTIVITY(0.5) */; Besides, this hint is also valid for compound predicate like this: select col from t where (a=1 and b=2) /* +SELECTIVITY(0.5) */; But pay attention, if we want to use 'SELECTIVITY' hint for predicate, we need to wrap the predicate by braket, even for single binary predicate. Testing: - Added new fe tests in 'PlannerTest' - Added new fe tests in 'AnalyzeStmtsTest' for negative cases Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test 12 files changed, 375 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/18023/13 -- To view, visit http://gerrit.cloudera.org:8080/18023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b Gerrit-Change-Number: 18023 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12443/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 6 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 01:25:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 7 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 01:06:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9072/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 7 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 01:06:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 6: Code-Review+2 Looks good, thanks! Carry Andrew's +1. -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 6 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 01:05:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Jason Fehr has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. IMPALA-11945: Fix Flaky Test in JwtHttpTest The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl test has been flaky in the impala-asf-master-core Jenkins build. This build was executed twice with this test passing in one run and failing in the other. This test and another test named testJwtAuthWithTrustedJwksHttpsUrlInvalidCN both follow the same pattern. They attempt to start up a cluster with incorrectly configured Impala daemons. Then, the Impala coordinator's log file is searched to ensure the coordinator failed to start up for the expected reason. Both these tests were introduced by IMPALA-11922. The theory for the test flakiness is that the Impala coordinator logs did not finish writing to disk before they were checked for the expected startup failure message. The evidence backing this theory is that all expected log messages were present in the log file except for the final log line containing the error message. Three changes were made to fix the flakiness 1. The tests where log files are search for startup failure messages now use a single coordinator. Only a single coordinator is needed thus this change eliminates potential issues caused by multiple coordinators. 2. The tests sleep 5 seconds before searching the log files to give time for the logs to be fully written to disk. 3. Log buffering in the Impala daemons was turned off by setting the logbuflevel startup flag to -1 which turns off in-memory log buffering and writes the logs directly to disk. Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 --- M fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java 1 file changed, 65 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/19536/6 -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 6 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/19536/5/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/19536/5/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@443 PS5, Line 443: List logLines = null; : Matcher> m = hasItem(containsString(expectedErrString)); : : // writing logs to disk may take some time, try a few times to search for the : // expected error in the log : for (int i=0; i<10; i++) { : logLines = Files.readAllLines(logDir.resolve("impalad.ERROR")); : if (m.matches(logLines)) { : break; : } : Thread.sleep(250); : } : : // runs the matcher one more time to ensure a descriptive failure message is : // generated if the assert fails > nit: This code is duplicated in two places. Maybe put this into its own met Done -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 5 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 01:04:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-11947: Bump GBN to get Iceberg change #6074
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19538 ) Change subject: WIP IMPALA-11947: Bump GBN to get Iceberg change #6074 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12442/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c7a3564c51dcd5216b811b66ad45f0159fec817 Gerrit-Change-Number: 19538 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 25 Feb 2023 00:38:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12441/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 5 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 00:28:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 5: Code-Review+1 LGTM, I'll let Riza take this to +2 -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 5 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 00:22:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 25 Feb 2023 00:19:16 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-11947: Bump GBN to get Iceberg change #6074
Andrew Sherman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19538 Change subject: WIP IMPALA-11947: Bump GBN to get Iceberg change #6074 .. WIP IMPALA-11947: Bump GBN to get Iceberg change #6074 We need Iceberg change #6074 (Allow using SnapshotManager in Transaction) to fix IMPALA-11482. Bump CDP_BUILD_NUMBER (GBN) to 38235009 Bump various CDP version numbers to be based on 7.2.17.0-127 Bump Iceberg version to 1.1.0 TESTING: I am running exhaustive tests... Change-Id: I3c7a3564c51dcd5216b811b66ad45f0159fec817 --- M bin/impala-config.sh 1 file changed, 11 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/19538/1 -- To view, visit http://gerrit.cloudera.org:8080/19538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3c7a3564c51dcd5216b811b66ad45f0159fec817 Gerrit-Change-Number: 19538 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Sherman
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 5: (1 comment) Thanks Jason, just have 1 more nit. http://gerrit.cloudera.org:8080/#/c/19536/5/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/19536/5/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@443 PS5, Line 443: List logLines = null; : Matcher> m = hasItem(containsString(expectedErrString)); : : // writing logs to disk may take some time, try a few times to search for the : // expected error in the log : for (int i=0; i<10; i++) { : logLines = Files.readAllLines(logDir.resolve("impalad.ERROR")); : if (m.matches(logLines)) { : break; : } : Thread.sleep(250); : } : : // runs the matcher one more time to ensure a descriptive failure message is : // generated if the assert fails nit: This code is duplicated in two places. Maybe put this into its own method? -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 5 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 00:16:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@416 PS4, Line 416: testJwtAuthWithUntrustedJwksHttpsUrl > Can you try loop this test in your local machine? just to make sure it is n good idea, I ran the test 100 times and it did not fail once -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 00:09:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/19536/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19536/4//COMMIT_MSG@9 PS4, Line 9: The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl > nit: mention the IMPALA-11922 in commit message. done http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@442 PS4, Line 442: // check in the impalad logs that the server startup failed for the expected reason > Nit: We could avoid the sleep by looping a few times retrying the readAllLi fixing to eliminate the 5 second sleep and establish a better pattern -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 5 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 25 Feb 2023 00:08:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Jason Fehr has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. IMPALA-11945: Fix Flaky Test in JwtHttpTest The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl test has been flaky in the impala-asf-master-core Jenkins build. This build was executed twice with this test passing in one run and failing in the other. This test and another test named testJwtAuthWithTrustedJwksHttpsUrlInvalidCN both follow the same pattern. They attempt to start up a cluster with incorrectly configured Impala daemons. Then, the Impala coordinator's log file is searched to ensure the coordinator failed to start up for the expected reason. Both these tests were introduced by IMPALA-11922. The theory for the test flakiness is that the Impala coordinator logs did not finish writing to disk before they were checked for the expected startup failure message. The evidence backing this theory is that all expected log messages were present in the log file except for the final log line containing the error message. Three changes were made to fix the flakiness 1. The tests where log files are search for startup failure messages now use a single coordinator. Only a single coordinator is needed thus this change eliminates potential issues caused by multiple coordinators. 2. The tests sleep 5 seconds before searching the log files to give time for the logs to be fully written to disk. 3. Log buffering in the Impala daemons was turned off by setting the logbuflevel startup flag to -1 which turns off in-memory log buffering and writes the logs directly to disk. Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 --- M fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java 1 file changed, 61 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/19536/5 -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 5 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/19536/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19536/4//COMMIT_MSG@9 PS4, Line 9: The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl nit: mention the IMPALA-11922 in commit message. In case we backport IMPALA-11922 somewhere else, we'll remember to include IMPALA-11945 as well. http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@416 PS4, Line 416: testJwtAuthWithUntrustedJwksHttpsUrl Can you try loop this test in your local machine? just to make sure it is not flaky anymore. Maybe like this from $IMPALA_HOME: (pushd fe && for i in {1..100}; do mvn -fae test -Dtest=JwtHttpTest#testJwtAuthWithUntrustedJwksHttpsUrl; done) -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 24 Feb 2023 22:27:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 4: (1 comment) I have a suggestion which you can push back on if you like http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java: http://gerrit.cloudera.org:8080/#/c/19536/4/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@442 PS4, Line 442: Thread.sleep(5000); Nit: We could avoid the sleep by looping a few times retrying the readAllLines+containsString with a short sleep after each try. Impala has a LOT of tests and we want to try and keep them running as fast as possible. So 10 extra seconds of sleep may seem like not much, but it would be better to establish a pattern that doesn't involve sleeping in case other people copy and paste the test, -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 22:13:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19537 ) Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12440/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 1 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 22:13:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend
John Sherman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19537 Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend .. IMPALA-11946: Add Thrift HTTP support for external frontend - Add enable_external_fe_http flag that defaults to false - When true the external frontend service (external_fe_port) will expect clients to use http transport. - When false the external frontend service will expect binary transport. - Add tests for basic external frontend functionality - Add test to ensure the non-external frontend services do not expose the ExecutePlannedStatement interface. Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 --- M be/src/service/impala-server.cc M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java A fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java 3 files changed, 208 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/19537/1 -- To view, visit http://gerrit.cloudera.org:8080/19537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050 Gerrit-Change-Number: 19537 Gerrit-PatchSet: 1 Gerrit-Owner: John Sherman
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19536 ) Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12439/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 21:54:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11945: Fix Flaky Test in JwtHttpTest
Jason Fehr has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19536 Change subject: IMPALA-11945: Fix Flaky Test in JwtHttpTest .. IMPALA-11945: Fix Flaky Test in JwtHttpTest The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl test has been flaky in the impala-asf-master-core Jenkins build. This build was executed twice with this test passing in one run and failing in the other. This test and another test named testJwtAuthWithTrustedJwksHttpsUrlInvalidCN both follow the same pattern. They attempt to start up a cluster with incorrectly configured Impala daemons. Then, the Impala coordinator's log file is searched to ensure the coordinator failed to start up for the expected reason. The theory for the test flakiness is that the Impala coordinator logs did not finish writing to disk before they were checked for the expected startup failure message. The evidence backing this theory is that all expected log messages were present in the log file except for the final log line containing the error message. Three changes were made to fix the flakiness 1. The tests where log files are search for startup failure messages now use a single coordinator. Only a single coordinator is needed thus this change eliminates potential issues caused by multiple coordinators. 2. The tests sleep 5 seconds before searching the log files to give time for the logs to be fully written to disk. 3. Log buffering in the Impala daemons was turned off by setting the logbuflevel startup flag to -1 which turns off in-memory log buffering and writes the logs directly to disk. Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 --- M fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java 1 file changed, 37 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/19536/4 -- To view, visit http://gerrit.cloudera.org:8080/19536 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67 Gerrit-Change-Number: 19536 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr
[native-toolchain-CR] IMPALA-11944: Add SLES15 support
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19528 ) Change subject: IMPALA-11944: Add SLES15 support .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/19528/3/source/kudu/build.sh File source/kudu/build.sh: http://gerrit.cloudera.org:8080/#/c/19528/3/source/kudu/build.sh@125 PS3, Line 125: # Set JAVA_HOME to the system java. Kudu has a bunch of heuristics for trying to I can remove this with the JDK 8 we ended up with. http://gerrit.cloudera.org:8080/#/c/19528/3/source/python/build.sh File source/python/build.sh: http://gerrit.cloudera.org:8080/#/c/19528/3/source/python/build.sh@58 PS3, Line 58: wrap $LOCAL_INSTALL/bin/python -c 'import bz2; import readline; from urllib2 import HTTPSHandler; from httplib import HTTPConnection' This and the thrift/build.sh could be handled in a different patch. -- To view, visit http://gerrit.cloudera.org:8080/19528 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie27521cdbec2f7e966e70c2b0a954d391bf1500d Gerrit-Change-Number: 19528 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 24 Feb 2023 21:32:12 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-11944: Add SLES15 support
Hello Laszlo Gaal, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19528 to look at the new patch set (#3). Change subject: IMPALA-11944: Add SLES15 support .. IMPALA-11944: Add SLES15 support Adds support for building with SLES 15 SP4. Adds a container definition for SLES 15 based on public repositories. SLES 15 SP4 does not provide Python 2 or Java 8 JDK, so we use openSUSE Leap 15.4 distribution packages for these. Fixes several invocations of python that unexpectedly used the system python instead of the python version built in the native-toolchain. Change-Id: Ie27521cdbec2f7e966e70c2b0a954d391bf1500d --- M Makefile A docker/sles15.df M in-docker.py M source/kudu/build.sh M source/python/build.sh M source/thrift/build.sh 6 files changed, 73 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/28/19528/3 -- To view, visit http://gerrit.cloudera.org:8080/19528 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie27521cdbec2f7e966e70c2b0a954d391bf1500d Gerrit-Change-Number: 19528 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/19533 ) Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/19533/2/be/src/scheduling/schedule-state.h File be/src/scheduling/schedule-state.h: http://gerrit.cloudera.org:8080/#/c/19533/2/be/src/scheduling/schedule-state.h@288 PS2, Line 288: The assumption is that all : /// executors have the same memory limit for admission and all the coordinators have the : /// same memory limit for admission. > Is the assumption working if one node is set as both executor and coordinat Thanks Wenzhe, that's a good point. I think we could have couple of cases: - Impala with dedicated coordinators and executors - Impala with all nodes acting as both coordinators and executors - Impala with some nodes acting as dedicated coordinators and dedicated executors while others acting as both I think the third case is a bit tricky to handle. Let me think about this a bit more and rework the patch. -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 21:00:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 ) Change subject: IMPALA-11604 Planner changes for CPU usage .. Patch Set 52: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12438/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a Gerrit-Change-Number: 19033 Gerrit-PatchSet: 52 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 20:57:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 ) Change subject: IMPALA-11604 Planner changes for CPU usage .. Patch Set 52: (25 comments) ps52 is a rebase and address some comments in ps51. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@151 PS51, Line 151: > nit plan fragment, which is blocking since it has 3 blocking PlanNode: Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171 PS51, Line 171: : In bottom-up direction, there exist four segments in F03: : Blocking segment 1: (11:EXCHANGE, 12:AGGREGATE) : Blocking segment 2: 06 > indentation and with some additional info as follows. Done. I rather keep calling segment 4 as non-blocking segment since only JoinBuildSink is considered as blocking DataSink. This has correlation with definition of blocking fragment. All fragment has DataSink. But fragment without blocking PlanNode nor blocking DataSink is not blocking fragment. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@177 PS51, Line 177: : Therefore we have: : PC(segment 1) = 426337+34548320 : PC(segment 2) = > indentation Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@182 PS51, Line 182: PC(segment 4) = 22 > nit a Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@183 PS51, Line 183: : These per-segment costs stored in a SegmentCost tree rooted at : PlanFragment.rootSegment_, and ar > nit. , and are [34974657, 2159270, 23752870, 22] respectively after the pos Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@189 PS51, Line 189: PlanFragment.collectSegmentCostHelper(). > I think "Output ProcessingCost" should be really called "Total Processing c Removed this paragraph and refer the cost directly as "the last segment's ProcessingCost". I only consider the last segment rather than the total over all segment in fragment to anticipate for burst exchange scenario. For example, fragment that only do aggregate may spend long time during aggregation. But when it is ready to send rows upward, the receiver fragment above it should have similar EDoP to keep-up with the sender. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@196 PS51, Line 196: hat fragment by compa > nit. effective degree of parallelism (EDoP). We can use EDoP in the rest of Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@199 PS51, Line 199: UnionNode, or : ScanNode will > the costing algorithm attempts to adjust the number of Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@201 PS51, Line 201: > see the previous comment. Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205 PS51, Line 205: > Assume that segments are modeled as a list in a plan fragment (true?) in th No, SegmentCost is modelled as tree. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@209 PS51, Line 209: in a similar post-order > EDoP Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@261 PS51, Line 261: > nit suggest to remove since a query plan with a sink node, which is blockin Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@263 PS51, Line 263: f a query or the query itself. Each blocking : subtree will > the intermediate or leaf nodes. Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@268 PS51, Line 268: ample is [4, 4, > By reading this para, it seems CoreCount is a better name. Usually a requi Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@269 PS51, Line 269: > nit remove Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@286 PS51, Line 286: control the entire com > EDoP Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288 PS51, Line 288: m or not. > nit. remove Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288 PS51, Line 288:Control whether to enable this CPU costing algorithm or not. > set Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@292 PS51, Line 292: instances (threads) that > the entire computation of EDoP. Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@300 PS51, Line 300: ount of > computing Done http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301 PS51, Line 301: is suggested to keep this to :false until the min_processing_per_thread backend fl > I strongly suggest that we introduce PROCESSING_COST_MIN_THREADS in this pa Ack. Will take a look at this. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@304 PS51, Line 304: : This patch also adds three backend flags to tune the algorithm. : 1. query_cpu_count_divisor :Divide the CPU requirement of a
[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage
Riza Suminto has uploaded a new patch set (#52) to the change originally created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 ) Change subject: IMPALA-11604 Planner changes for CPU usage .. IMPALA-11604 Planner changes for CPU usage This patch augments IMPALA-10992 by establishing an infrastructure to allow the weighted total amount of data to process to be used as a new factor in the definition and selection of an executor group. At the basis of the CPU costing model, we define ProcessingCost as a cost for a distinct PlanNode / DataSink / PlanFragment to process its input rows globally across all of its instances. The costing algorithm then tries to adjust the number of instances for each fragment by considering their production-consumption ratio, and then finally returns a number representing an ideal CPU core count required for a query to run efficiently. A more detailed explanation of the CPU costing algorithm can be found in the four steps below. I. Compute ProcessingCost for each plan node and data sink. ProcessingCost of a PlanNode/DataSink is a weighted amount of data processed by that node/sink. The basic ProcessingCost is computed with a general formula as follows. ProcessingCost is a pair: PC(D, N), where D = I * (C + M) where D is the weighted amount of data processed I is the input cardinality C is the expression evaluation cost per row. Set to total weight of expression evaluation in node/sink. M is a materialization cost per row. Only used by scan and exchange node. Otherwise, 0. N is the number of instances. Default to D / MIN_COST_PER_THREAD (1 million), but is fixed for a certain node/sink and adjustable in step III. In this patch, the weight of each expression evaluation is set to a constant of 1. A description of the computation for each kind of PlanNode/DataSink is given below. 01. AggregationNode: Each AggregateInfo has its C as a sum of grouping expression and aggregate expression and then assigned a single ProcessingCost individually. These ProcessingCosts then summed to be the Aggregation node's ProcessingCost; 02. AnalyticEvalNode: C is the sum of the evaluation costs for analytic functions; 03. CardinalityCheckNode: Use the general formula, I = 1; 04. DataSourceScanNode: Follow the formula from the superclass ScanNode; 05. EmptySetNode: I = 0; 06. ExchangeNode: M = (average serialized row size) / 1024 A modification of the general formula when in broadcast mode: D = D * number of receivers; 07. HashJoinNode: probe cost = PC(I0 * C(equiJoin predicate), N) + PC(output cardinality * C(otherJoin predicate), N) build cost = PC(I1 * C(equi-join predicate), N) With I0 and I1 as input cardinality of the probe and build side accordingly. If the plan node does not have a separate build, ProcessingCost is the sum of probe cost and build cost. Otherwise, ProcessingCost is equal to probeCost. 08. HbaseScanNode, HdfsScanNode, and KuduScanNode: Follow the formula from the superclass ScanNode; 09. Nested loop join node: When the right child is not a SingularRowSrcNode: probe cost = PC(I0 * C(equiJoin predicate), N) + PC(output cardinality * C(otherJoin predicate), N) build cost = PC(I1 * C(equiJoin predicate), N) When the right child is a SingularRowSrcNode: probe cost = PC(I0, N) build cost = PC(I0 * I1, N) With I0 and I1 as input cardinality of the probe and build side accordingly. If the plan node does not have a separate build, ProcessingCost is the sum of probe cost and build cost. Otherwise, ProcessingCost is equal to probeCost. 10. ScanNode: M = (average row size) / 1024; 11. SelectNode: Use the general formula; 12. SingularRowSrcNode: Since the node is involved once per input in nested loop join, the contribution of this node is computed in nested loop join; 13. SortNode: C is the evaluation cost for the sort expression; 14. SubplanNode: C is 1. I is the multiplication of the cardinality of the left and the right child; 15. Union node: C is the cost of result expression evaluation from all non-pass-through children; 16. Unnest node: I is the cardinality of the containing SubplanNode and C is 1. 17. DataStreamSink: M = 1 / num rows per batch. 18. JoinBuildSink: ProcessingCost is the build cost of its associated JoinNode. 29. PlanRootSink: If result spooling is enabled, C is the cost of output expression evaluation. Otherwise. ProcessingCost is zero. 20. TableSink: C is the cost of output expression evaluation. TableSink subclasses (including HBaseTableSink, HdfsTableSink, and KuduTableSink) follows the same formula; II. Compute the total ProcessingCost of a
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 20:09:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12437/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 19:26:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19533 ) Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/19533/2/be/src/scheduling/admission-controller-test.cc File be/src/scheduling/admission-controller-test.cc: http://gerrit.cloudera.org:8080/#/c/19533/2/be/src/scheduling/admission-controller-test.cc@877 PS2, Line 877: set_is_executor(false); If the node is set as executor and coordinator, the assumption mat not work. http://gerrit.cloudera.org:8080/#/c/19533/2/be/src/scheduling/schedule-state.h File be/src/scheduling/schedule-state.h: http://gerrit.cloudera.org:8080/#/c/19533/2/be/src/scheduling/schedule-state.h@288 PS2, Line 288: The assumption is that all : /// executors have the same memory limit for admission and all the coordinators have the : /// same memory limit for admission. Is the assumption working if one node is set as both executor and coordinator? -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 19:16:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9071/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 19:11:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 19:08:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Joe McDonnell has uploaded a new patch set (#7) to the change originally created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. IMPALA-11624: Bump Impyla dependency to 0.18.0 IMPALA_THRIFT_PY_VERSION is also bumped to 0.16.0p3. As 0.16.0p3 Thrift does not contain Python related patches and Impyla 0.18.0 depends on Thrift 0.16.0, now we are consistently using Thrift 0.16.0 in all Python code. This also bumps the Thrift in the shell's ext-py directory to 0.16.0 (based on the Thrift 0.16.0 pypi tarball with the egg directory removed). Testing: - Ran a GVO job Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 --- M bin/impala-config.sh M bin/rat_exclude_files.txt M infra/python/deps/requirements.txt M shell/.gitignore D shell/ext-py/thrift-0.14.2/CMakeLists.txt D shell/ext-py/thrift-0.14.2/Makefile.am D shell/ext-py/thrift-0.14.2/coding_standards.md D shell/ext-py/thrift-0.14.2/compat/win32/stdint.h D shell/ext-py/thrift-0.14.2/setup.cfg D shell/ext-py/thrift-0.14.2/test/_import_local_thrift.py D shell/ext-py/thrift-0.14.2/test/test_thrift_file/TestServer.thrift D shell/ext-py/thrift-0.14.2/test/thrift_TBinaryProtocol.py D shell/ext-py/thrift-0.14.2/test/thrift_TCompactProtocol.py D shell/ext-py/thrift-0.14.2/test/thrift_TNonblockingServer.py D shell/ext-py/thrift-0.14.2/test/thrift_TZlibTransport.py D shell/ext-py/thrift-0.14.2/test/thrift_json.py D shell/ext-py/thrift-0.14.2/test/thrift_transport.py R shell/ext-py/thrift-0.16.0/MANIFEST.in R shell/ext-py/thrift-0.16.0/README.md A shell/ext-py/thrift-0.16.0/setup.cfg R shell/ext-py/thrift-0.16.0/setup.py R shell/ext-py/thrift-0.16.0/src/TMultiplexedProcessor.py R shell/ext-py/thrift-0.16.0/src/TRecursive.py R shell/ext-py/thrift-0.16.0/src/TSCons.py R shell/ext-py/thrift-0.16.0/src/TSerialization.py R shell/ext-py/thrift-0.16.0/src/TTornado.py R shell/ext-py/thrift-0.16.0/src/Thrift.py R shell/ext-py/thrift-0.16.0/src/__init__.py R shell/ext-py/thrift-0.16.0/src/compat.py R shell/ext-py/thrift-0.16.0/src/ext/binary.cpp R shell/ext-py/thrift-0.16.0/src/ext/binary.h R shell/ext-py/thrift-0.16.0/src/ext/compact.cpp R shell/ext-py/thrift-0.16.0/src/ext/compact.h R shell/ext-py/thrift-0.16.0/src/ext/endian.h R shell/ext-py/thrift-0.16.0/src/ext/module.cpp R shell/ext-py/thrift-0.16.0/src/ext/protocol.h R shell/ext-py/thrift-0.16.0/src/ext/protocol.tcc R shell/ext-py/thrift-0.16.0/src/ext/types.cpp R shell/ext-py/thrift-0.16.0/src/ext/types.h R shell/ext-py/thrift-0.16.0/src/protocol/TBase.py R shell/ext-py/thrift-0.16.0/src/protocol/TBinaryProtocol.py R shell/ext-py/thrift-0.16.0/src/protocol/TCompactProtocol.py R shell/ext-py/thrift-0.16.0/src/protocol/THeaderProtocol.py R shell/ext-py/thrift-0.16.0/src/protocol/TJSONProtocol.py R shell/ext-py/thrift-0.16.0/src/protocol/TMultiplexedProtocol.py R shell/ext-py/thrift-0.16.0/src/protocol/TProtocol.py R shell/ext-py/thrift-0.16.0/src/protocol/TProtocolDecorator.py R shell/ext-py/thrift-0.16.0/src/protocol/__init__.py R shell/ext-py/thrift-0.16.0/src/server/THttpServer.py R shell/ext-py/thrift-0.16.0/src/server/TNonblockingServer.py R shell/ext-py/thrift-0.16.0/src/server/TProcessPoolServer.py R shell/ext-py/thrift-0.16.0/src/server/TServer.py R shell/ext-py/thrift-0.16.0/src/server/__init__.py R shell/ext-py/thrift-0.16.0/src/transport/THeaderTransport.py R shell/ext-py/thrift-0.16.0/src/transport/THttpClient.py R shell/ext-py/thrift-0.16.0/src/transport/TSSLSocket.py R shell/ext-py/thrift-0.16.0/src/transport/TSocket.py R shell/ext-py/thrift-0.16.0/src/transport/TTransport.py R shell/ext-py/thrift-0.16.0/src/transport/TTwisted.py R shell/ext-py/thrift-0.16.0/src/transport/TZlibTransport.py R shell/ext-py/thrift-0.16.0/src/transport/__init__.py R shell/ext-py/thrift-0.16.0/src/transport/sslcompat.py R shell/ext-py/thrift-0.16.0/test/test_socket.py R shell/ext-py/thrift-0.16.0/test/test_sslsocket.py M shell/packaging/requirements.txt 65 files changed, 48 insertions(+), 1,375 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/18456/7 -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11624: Bump Impyla dependency to 0.18.0
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/18456 ) Change subject: IMPALA-11624: Bump Impyla dependency to 0.18.0 .. Patch Set 6: I'm going to do an upload to this that adds the ext-py changes. -- To view, visit http://gerrit.cloudera.org:8080/18456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7265558b0e07959c606cba73cd251c3edfcb3ed5 Gerrit-Change-Number: 18456 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 19:05:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 5: Code-Review+2 Thanks Riza! -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 17:16:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg IMPALA-11658 implements Iceberg manifest caching for Impala. This patch adds documentation for configuring the cache(s). Testing: - Built docs locally Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Reviewed-on: http://gerrit.cloudera.org:8080/19530 Reviewed-by: Daniel Becker Tested-by: Impala Public Jenkins Reviewed-by: Zoltan Borok-Nagy --- M docs/topics/impala_iceberg.xml 1 file changed, 60 insertions(+), 0 deletions(-) Approvals: Daniel Becker: Looks good to me, but someone else must approve Impala Public Jenkins: Verified Zoltan Borok-Nagy: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[native-toolchain-CR] IMPALA-11944: Add SLES15 support
Hello Laszlo Gaal, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19528 to look at the new patch set (#2). Change subject: IMPALA-11944: Add SLES15 support .. IMPALA-11944: Add SLES15 support Adds support for building with SLES15 SP4. Adds a container definition for SLES15 based on public repositories. SLES 15 SP4 does not support Python 2 so we omit that from the build and symlink python to python3 for build dependencies. It also doesn't provide Java 8 packages, so we use Java 11 and explicitly set JAVA_HOME as Kudu's normal heuristics for finding JAVA_HOME didn't work. Fixes several invocations of python that unexpectedly used the system python instead of the python version built in the native-toolchain. Change-Id: Ie27521cdbec2f7e966e70c2b0a954d391bf1500d --- M Makefile M docker/all/assert-dependencies-present.py M docker/all/postinstall.sh A docker/sles15.df M in-docker.py M source/kudu/build.sh M source/python/build.sh M source/thrift/build.sh 8 files changed, 76 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/28/19528/2 -- To view, visit http://gerrit.cloudera.org:8080/19528 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie27521cdbec2f7e966e70c2b0a954d391bf1500d Gerrit-Change-Number: 19528 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 5: Code-Review+1 Thanks Riza. -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 17:10:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 5: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/704/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 17:12:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/19530/4/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/19530/4/docs/topics/impala_iceberg.xml@617 PS4, Line 617: gd. This f > Maybe we should mention that it is useful for the Coordinators and CatalogD Done http://gerrit.cloudera.org:8080/#/c/19530/4/docs/topics/impala_iceberg.xml@623 PS4, Line 623: 36000 > nit: people might just blindly copying the values from here. So we probably Make sense. Changed this to 360. -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 17:05:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Hello Andrew Sherman, Daniel Becker, Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19530 to look at the new patch set (#5). Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg IMPALA-11658 implements Iceberg manifest caching for Impala. This patch adds documentation for configuring the cache(s). Testing: - Built docs locally Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe --- M docs/topics/impala_iceberg.xml 1 file changed, 60 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19530/5 -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 1: (21 comments) http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@16 PS1, Line 16: filter Nit: filters. http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17 PS1, Line 17: taking into account Wouldn't "assuming" be better? http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17 PS1, Line 17: ) Nit: closing parenthesis should come at the end of the sentence. http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20 PS1, Line 20: An additional benefit of not down no-op predicates Am I right that here we mean pushing down predicates to the scanner, as opposed to pushing it down to Iceberg? We could make this clearer, perhaps also in the title. http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20 PS1, Line 20: not down not pushing down http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@28 PS1, Line 28: We could include a Testing section describing what tests have been added/modified. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2014 PS1, Line 2014: getDerivedExplainString You could add a comment describing how this method should be used. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: private List skippedConjuncts; What is the relationship between 'conjuncts_', 'nonIdentityConjuncts_' and 'skippedConjuncts_'? Is 'conjuncts_' the union of the other two? In this case 'skippedConjuncts_' may not need to be stored (or taken in the constructor) separately. Or, if the 'skippedConjuncts_' have already been removed from 'conjuncts_', the comment on L56 has become a bit misleading. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: skippedConjuncts Should end with an underscore: 'skippedConjuncts_'. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114 PS1, Line 114: canSkipPushingDownIcebergPredicates Should end with underscore. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114 PS1, Line 114: canSkipPushingDownIcebergPredicates Could make it clearer that here we mean pushing predicates down to the scanners, not to Iceberg. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375 PS1, Line 375: conjuncts_.removeAll(impalaIcebergPredicates_); Wouldn't it be possible that some (but not all) conjuncts can be skipped? Can this approach be further refined (in a follow-up commit)? http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@379 PS1, Line 379: private List getSkippedConjuncts() { Nit: if we forgot to call filterConjuncts() but called filterFileDescriptors(), it's possible that 'conjuncts_' would still contain the elements that getSkippedConjuncts() returns. If we had a separate member for 'skippedConjuncts_' that would be returned by getSkippedConjuncts() and would be populated in filterConjuncts(), this anomaly would not be possible. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279 PS1, Line 1279: exercising Nit: excercising. http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@1 PS1, Line 1: in Nit: no need for 'in'. http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@2 PS1, Line 2: are Nit: out.
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 5: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/704/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 17:04:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 ) Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java: http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@26 PS8, Line 26: * values that map to StringValue in the BE. Can you mention the caching behavior? http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@59 PS8, Line 59: ; optional: bufferCapacity_ could be set here with getLengthFromNativeHeap() http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@84 PS8, Line 84: public int getCapacity() { : return bufferCapacity_; : } What happens if this is called before initialize()? http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@94 PS8, Line 94: Arrays.copyOf(array_, newCap); Can't this be called while array_ is still EMPTY_ARRAY? http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104 PS8, Line 104: initalize(); optional: returning getLengthFromNativeHeap() would allow optimizing the case when the length is accessed, but the data is not http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@116 PS8, Line 116: if (EMPTY_ARRAY == array_) { : initalize(); : } This could be skipped in this case, as the original array won't be used. http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test File testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test: http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@358 PS8, Line 358: select increment( cast("a" as binary)); nit: extra space http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test File testdata/workloads/functional-query/queries/QueryTest/java-udf.test: http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test@371 PS8, Line 371: select increment("a"); Can you add a test with empty string? The UDF seems to handle this case and it would be nice to test if the executor works in this case. (same for the generic case) http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test@378 PS8, Line 378: select increment( cast("a" as binary)); nit: extra space -- To view, visit http://gerrit.cloudera.org:8080/19507 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212 Gerrit-Change-Number: 19507 Gerrit-PatchSet: 8 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 24 Feb 2023 16:57:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 4: (2 comments) Thanks for adding docs for it! http://gerrit.cloudera.org:8080/#/c/19530/4/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/19530/4/docs/topics/impala_iceberg.xml@617 PS4, Line 617: for Impala Maybe we should mention that it is useful for the Coordinators and CatalogD as well. http://gerrit.cloudera.org:8080/#/c/19530/4/docs/topics/impala_iceberg.xml@623 PS4, Line 623: 6 nit: people might just blindly copying the values from here. So we probably want to provide a higher value as an example than one minute, e.g. one hour? -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 16:38:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 4: (10 comments) http://gerrit.cloudera.org:8080/#/c/19530/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19530/2//COMMIT_MSG@9 PS2, Line 9: IMPALA-11658 implements Iceberg manifest caching for Impala. This patch > Nit: "implements Iceberg manifest caching for Impala." Done http://gerrit.cloudera.org:8080/#/c/19530/2//COMMIT_MSG@10 PS2, Line 10: adds documentation for configuring the cache(s). > Nit: "adds documentation for configuring the cache(s). Done http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@614 PS2, Line 614: > Nit: contents? Done http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@617 PS2, Line 617: feature can be enabled for Impala by setting properties in Hadoop's core-site.xml > Nit: "as in the following" Done http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@631 PS2, Line 631: iceberg.io-impl: custom FileIO implementation to use in a > Do we recommend that this not be changed? Maybe we should say that? Done http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@632 PS2, Line 632: catalog. Must be set to enable manifest caching. Impala defaults to > Nit: "defaults to" Done http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@639 PS2, Line 639: > In the two cases above we started a new line after the title (in ), Removed newline after in the other two. http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@640 PS2, Line 640: iceberg.io.manifest.cache.max-total-bytes: maximum total > Why would I want to set this to anything other than a high value? Oh. I see Yes. Added "and lower than iceberg.io.manifest.cache.max-total-bytes.". http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@652 PS2, Line 652: of a manifest file to be considered for caching in bytes. Manifest files with > Maybe move this before iceberg.io.manifest.cache.expiration-interval-ms Yes, reordered the example and bullet points. http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@659 PS2, Line 659: > Shouldn't it be HadoopCatalogs and HiveCatalogs, i.e. plural? Done -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 16:22:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 3: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/703/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 16:25:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Hello Andrew Sherman, Daniel Becker, Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19530 to look at the new patch set (#4). Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg IMPALA-11658 implements Iceberg manifest caching for Impala. This patch adds documentation for configuring the cache(s). Testing: - Built docs locally Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe --- M docs/topics/impala_iceberg.xml 1 file changed, 60 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19530/4 -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 3: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/703/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 16:17:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Hello Andrew Sherman, Daniel Becker, Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19530 to look at the new patch set (#3). Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg IMPALA-11658 implement Iceberg manifest caching config for Impala. This patch add documentation for that settings. Testing: - Built docs locally Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe --- M docs/topics/impala_iceberg.xml 1 file changed, 60 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19530/3 -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19535 ) Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. Patch Set 1: (2 comments) Thanks Quanlong for fixing this. http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@75 PS1, Line 75: StringVal StringFunctions::Substring(FunctionContext* context, Shouldn't this also be IR_ALWAYS_INLINE? It calls StringFunctions::Substring() on L58. In mask-functions-ir.cc we do inline functions that don't directly call GetConstFnAttr() but call a function that calls it. Applies also to Left() and Right(). http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@850 PS1, Line 850: IntVal StringFunctions::Instr(FunctionContext* context, const StringVal& str, Shouldn't these Instr overloads, Locate and LocatePos also be IR_ALWAYS_INLINE? See comment on L75. -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 15:26:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19535 ) Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12436/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 14:27:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19535 ) Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/mask-functions-ir.cc File be/src/exprs/mask-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/mask-functions-ir.cc@778 PS1, Line 778: IR_ALWAYS_INLINE StringVal MaskFunctions::Mask(FunctionContext* ctx, const StringVal& val) { line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@256 PS1, Line 256: IR_ALWAYS_INLINE IntVal StringFunctions::Length(FunctionContext* context, const StringVal& str) { line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@289 PS1, Line 289: IR_ALWAYS_INLINE StringVal StringFunctions::Lower(FunctionContext* context, const StringVal& str) { line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@307 PS1, Line 307: IR_ALWAYS_INLINE StringVal StringFunctions::Upper(FunctionContext* context, const StringVal& str) { line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@329 PS1, Line 329: IR_ALWAYS_INLINE StringVal StringFunctions::InitCap(FunctionContext* context, const StringVal& str) { line too long (101 > 90) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@621 PS1, Line 621: IR_ALWAYS_INLINE StringVal StringFunctions::Reverse(FunctionContext* context, const StringVal& str) { line too long (101 > 90) http://gerrit.cloudera.org:8080/#/c/19535/1/be/src/exprs/string-functions-ir.cc@790 PS1, Line 790: IR_ALWAYS_INLINE IntVal StringFunctions::Instr(FunctionContext* context, const StringVal& str, line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 14:08:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11943: Mark utf8 string functions with IR ALWAYS INLINE
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19535 Change subject: IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE .. IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE String functions that have both UTF-8 and the traditional ASCII behaviors have checks for the UTF8_MODE query option. The check is intended to be replaced with constants during codegen in LlvmCodeGen::InlineConstFnAttrs(). However, as mentioned in the method comment, InlineConstFnAttrs() only replaces call instructions inside the current function. To replace the call to FunctionContextImpl::GetConstFnAttr() inside the callee functions, we have to inline the callee functions (by annotating them with IR_ALWAYS_INLINE). This patch annotates UTF-8 related string functions with IR_ALWAYS_INLINE to make sure the checks on UTF8_MODE are all replaced in codegen. Perf tests: Ran PERF_STRING queries in targeted-perf (scale factor 100) on parquet format. Saw significant improvements: +-+--++-++ | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | +-+--++-++ | PERF_STRING-Q7 | parquet/none | 10.80 | 10.93 | -1.13% | | PERF_STRING-Q6 | parquet/none | 12.25 | 12.55 | -2.37% | | PERF_STRING-Q5 | parquet/none | 5.87 | 6.01| -2.33% | | PERF_STRING-Q3 | parquet/none | 5.02 | 5.26| -4.47% | | PERF_STRING-Q2 | parquet/none | 4.49 | 4.73| -4.88% | | PERF_STRING-Q4 | parquet/none | 4.99 | 5.28| I -5.48% | | PERF_STRING-Q1 | parquet/none | 4.02 | 4.29| I -6.34% | +-+--++-++ | PERF_STRING-Q13 | parquet/none | 9.90 | 11.48 | I -13.72% | | PERF_STRING-Q9 | parquet/none | 10.01 | 11.64 | I -13.97% | | PERF_STRING-Q11 | parquet/none | 9.83 | 11.56 | I -14.97% | | PERF_STRING-Q10 | parquet/none | 6.62 | 8.07| I -17.89% | | PERF_STRING-Q12 | parquet/none | 6.63 | 8.25| I -19.61% | | PERF_STRING-Q8 | parquet/none | 5.03 | 6.45| I -22.09% | +-+--++-++ Note that Q8-Q13 are new queries added by this patch to reveal the performance difference. Ran the same queries comparing the branch of reverting the UTF-8 changes (IMPALA-2019). Found no regressions anymore. Even see improvement in one query (Q13): +-+--++-++ | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | +-+--++-++ | PERF_STRING-Q11 | parquet/none | 9.85 | 9.56| +3.04% | | PERF_STRING-Q7 | parquet/none | 10.76 | 10.59 | +1.57% | | PERF_STRING-Q1 | parquet/none | 4.04 | 4.00| +0.83% | | PERF_STRING-Q4 | parquet/none | 5.02 | 5.00| +0.51% | | PERF_STRING-Q2 | parquet/none | 4.53 | 4.51| +0.32% | | PERF_STRING-Q5 | parquet/none | 5.81 | 5.81| +0.02% | | PERF_STRING-Q8 | parquet/none | 5.02 | 5.04| -0.31% | | PERF_STRING-Q12 | parquet/none | 6.65 | 6.68| -0.43% | | PERF_STRING-Q3 | parquet/none | 5.02 | 5.05| -0.53% | | PERF_STRING-Q6 | parquet/none | 12.23 | 12.33 | -0.84% | | PERF_STRING-Q10 | parquet/none | 6.68 | 6.74| -0.87% | | PERF_STRING-Q9 | parquet/none | 10.01 | 10.41 | -3.86% | | PERF_STRING-Q13 | parquet/none | 9.92 | 10.54 | I -5.85% | +-+--++-++ Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a --- M be/src/exprs/mask-functions-ir.cc M be/src/exprs/string-functions-ir.cc M testdata/workloads/targeted-perf/queries/string.test 3 files changed, 182 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/19535/1 -- To view, visit http://gerrit.cloudera.org:8080/19535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I19e8fba332ae329da8b1d37dba3bbc64f59e6f3a Gerrit-Change-Number: 19535 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19530 ) Change subject: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml File docs/topics/impala_iceberg.xml: http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@614 PS2, Line 614: content Nit: contents? http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@639 PS2, Line 639: In the two cases above we started a new line after the title (in ), from here on we don't. It would be better if it was consistent. http://gerrit.cloudera.org:8080/#/c/19530/2/docs/topics/impala_iceberg.xml@659 PS2, Line 659: HadoopCatalog and HiveCatalog Shouldn't it be HadoopCatalogs and HiveCatalogs, i.e. plural? -- To view, visit http://gerrit.cloudera.org:8080/19530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe Gerrit-Change-Number: 19530 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Feb 2023 13:36:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@13 PS1, Line 13: Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 > Sorry, I can't. I tried to add the relevant tests, but couldn't find where One place where this could go is tests/query_test/test_observability.py. You can take 'test_merge_exchange_num_rows()' as an example of how to run a query and retrieve its runtime profile. I'd suggest to use another test query than the one in the Jira because structs containing collections are expected to become supported in the near future. We could for example try to access a non-existent table to provoke a planning failure. http://gerrit.cloudera.org:8080/#/c/19517/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19517/2//COMMIT_MSG@7 PS2, Line 7: profile if "... profile even if ..." would be clearer in my opinion. http://gerrit.cloudera.org:8080/#/c/19517/2//COMMIT_MSG@9 PS2, Line 9: should usually be "are normally included in the profile" would be better. http://gerrit.cloudera.org:8080/#/c/19517/2//COMMIT_MSG@10 PS2, Line 10: Query No need to capitalise "query"; see also the other "Query" on this line. http://gerrit.cloudera.org:8080/#/c/19517/2//COMMIT_MSG@11 PS2, Line 11: Upon failure, should add query options to the profile. It could be something like: "After this change, query options are also added to the profile upon planning failure." -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 24 Feb 2023 13:23:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 24 Feb 2023 13:09:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12435/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 11:15:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19534 Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg We push down predicates to Iceberg that uses them to filter out files when getting the results of planFiles(). Using the FileScanTask.residual() function we can find out if we have to use the predicates to further filter the rows of the given files or if Iceberg has already performed all the filtering. Basically if we only filter on partition columns then Iceberg can filter the files based on this and using these filter in Impala wouldn't filter any more rows from the output (taking into account that no partition evolution) was performed on the table. An additional benefit of not down no-op predicates is that we can potentially materialize less slots. For example: SELECT count(1) from iceberg_tbl where part_col = 10; In the above query Iceberg filters the files using the predicate on a partition column and then there won't be any need to materialize 'part_col' in Impala, nor to push down the 'part_col = 10' predicate. Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test 12 files changed, 290 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/1 -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19533 ) Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12434/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 09:13:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19533 ) Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12433/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 09:13:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12432/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 24 Feb 2023 09:05:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Abhishek Rawat has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19533 ) Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. IMPALA-11858: Cap per backend memory estimate to its memory limit for admission admission controller was capping memory estimates for the backends to its physical memory. The memory estimates should instead be capped to the backend's memory limit for admission, which is computed during daemon initialization in ExecEnv::Init(). Also fixed the issue related to excessive logging in query profiles when using global admission controller. If the query was queued the remote admission controller client was logging 'Queued' status in profile every time it checked the query status and it hadn't changed. Testing: - Updated existing unit tests in admission-controller-test.cc - Also Running exhaustive tests Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c --- M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/remote-admission-control-client.cc M be/src/scheduling/schedule-state.cc M be/src/scheduling/schedule-state.h 4 files changed, 37 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/19533/2 -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 ) Change subject: IMPALA-11904: Data cache support dumping for reloading .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12431/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:53:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19533 ) Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/19533/1/be/src/scheduling/admission-controller-test.cc File be/src/scheduling/admission-controller-test.cc: http://gerrit.cloudera.org:8080/#/c/19533/1/be/src/scheduling/admission-controller-test.cc@912 PS1, Line 912: int64_t coordinator_admit_mem_limit = schedule_state->GetPerBackendMemLimitForAdmission(true); line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/19533/1/be/src/scheduling/admission-controller-test.cc@913 PS1, Line 913: int64_t executor_admit_mem_limit = schedule_state->GetPerBackendMemLimitForAdmission(false); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/19533/1/be/src/scheduling/schedule-state.cc File be/src/scheduling/schedule-state.cc: http://gerrit.cloudera.org:8080/#/c/19533/1/be/src/scheduling/schedule-state.cc@329 PS1, Line 329: per_backend_mem_to_admit = min(per_backend_mem_to_admit, GetPerBackendMemLimitForAdmission(false)); line too long (101 > 90) http://gerrit.cloudera.org:8080/#/c/19533/1/be/src/scheduling/schedule-state.cc@330 PS1, Line 330: coord_backend_mem_to_admit = min(coord_backend_mem_to_admit, GetPerBackendMemLimitForAdmission(true)); line too long (104 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 24 Feb 2023 08:50:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11858: Cap per backend memory estimate to its memory limit for admission
Abhishek Rawat has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19533 Change subject: IMPALA-11858: Cap per backend memory estimate to its memory limit for admission .. IMPALA-11858: Cap per backend memory estimate to its memory limit for admission admission controller was capping memory estimates for the backends to its physical memory. The memory estimates should instead be capped to the backend's memory limit for admission, which is computed during daemon initialization in ExecEnv::Init(). Also fixed the issue related to excessive logging in query profiles when using global admission controller. If the query was queued the remote admission controller client was logging 'Queued' status in profile every time it checked the query status and it hadn't changed. Testing: - Updated existing unit tests in admission-controller-test.cc - Also Running exhaustive tests Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c --- M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/remote-admission-control-client.cc M be/src/scheduling/schedule-state.cc M be/src/scheduling/schedule-state.h 4 files changed, 32 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/19533/1 -- To view, visit http://gerrit.cloudera.org:8080/19533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3b1f6e530785ef832dbc831d7cc6793133f3335c Gerrit-Change-Number: 19533 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. Patch Set 2: (3 comments) Hi Daniel, Thank you for your reply and suggestions. http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@7 PS1, Line 7: i > Nit: unnecessary double space. Done http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@9 PS1, Line 9: Query options should usually be in the profile, > Could you clarify the commit message? Yeah, just like the jira title, query options should usually be in the profile, but when the Query fails during planning, Query options are missing. The RunFrontendPlanner method is used for planning. Failure here should add query options. e.g.: Query Options (set by configuration): TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available. If the query fails after planning we should list the query options set by configuration and the planner. e.g.: Query Options (set by configuration): TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available Query Options (set by configuration and planner): MT_DOP=0,TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available,MINMAX_FILTER_THRESHOLD=0.5,MINMAX_FILTERING_LEVEL=PAGE http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@13 PS1, Line 13: Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 > Can we add tests that verify that the query options are included in the pro Sorry, I can't. I tried to add the relevant tests, but couldn't find where to add them. Can you tell me? -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:44:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 ) Change subject: IMPALA-11904: Data cache support dumping for reloading .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/12430/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:41:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Baike Xia has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. IMPALA-11898: Add query options in the profile if the query failed in planning Query options should usually be in the profile, but when the Query fails during planning, Query options are missing. Upon failure, should add query options to the profile. Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc 2 files changed, 5 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/2 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 ) Change subject: IMPALA-11904: Data cache support dumping for reloading .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/12429/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:37:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19532 to look at the new patch set (#4). Change subject: IMPALA-11904: Data cache support dumping for reloading .. IMPALA-11904: Data cache support dumping for reloading Data cache mainly includes cache metadata and cache files. The cache files are located on the disk and is responsible for storing cached data content, while the cache metadata is located in the memory and is responsible for indexing to the cache file according to the cache key. Before this patch, if the impalad process exits, the cache metadata will be lost. After the Impalad process restarts, we cannot reuse the cache file even though it is still on the disk, because there is no corresponding cache metadata for index. This patch implements the dump and load functions of the data cache. After enabling the dump function with setting 'data_cache_enable_dumping', when the Impalad process is closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the cache metadata and dump them to the location where the cache directory is located. After enabling the load function with setting 'data_cache_enable_loading', when the Impalad process starts, it will try to load the dumped files on the disk to restore the original cache metadata, so that the existing cache files can be reused without refilling the cache. The cache can be safely dumped during query execution, because before the dump starts, the data cache will be set to read-only to prevent the inconsistency between the metadata dump and the cache file. Note that the dump files will also use disk space. After testing, the size of the dump file is generally not more than 0.5% of the size of all cache files. Testing: - Add DataCacheTest,#SetReadOnly Used to test whether set/revoke read-only takes effect, even when there are writes in progress. - Add DataCacheTest,#DumpAndLoad Used to test whether the original cache contents can be read after a data cache dump and reload. Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 --- M CMakeLists.txt M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/scheduling/executor-group.cc M be/src/service/impala-server.cc M be/src/util/cache/cache-internal.h M be/src/util/cache/cache.h M be/src/util/cache/lirs-cache.cc M be/src/util/cache/rl-cache.cc 12 files changed, 653 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/4 -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 ) Change subject: IMPALA-11904: Data cache support dumping for reloading .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/12428/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:27:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19532 to look at the new patch set (#3). Change subject: IMPALA-11904: Data cache support dumping for reloading .. IMPALA-11904: Data cache support dumping for reloading Data cache mainly includes cache metadata and cache files. The cache files are located on the disk and is responsible for storing cached data content, while the cache metadata is located in the memory and is responsible for indexing to the cache file according to the cache key. Before this patch, if the impalad process exits, the cache metadata will be lost. After the Impalad process restarts, we cannot reuse the cache file even though it is still on the disk, because there is no corresponding cache metadata for index. This patch implements the dump and load functions of the data cache. After enabling the dump function with setting 'data_cache_enable_dumping', when the Impalad process is closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the cache metadata and dump them to the location where the cache directory is located. After enabling the load function with setting 'data_cache_enable_loading', when the Impalad process starts, it will try to load the dumped files on the disk to restore the original cache metadata, so that the existing cache files can be reused without refilling the cache. The cache can be safely dumped during query execution, because before the dump starts, the data cache will be set to read-only to prevent the inconsistency between the metadata dump and the cache file. Note that the dump files will also use disk space. After testing, the size of the dump file is generally not more than 0.5% of the size of all cache files. Testing: - Add DataCacheTest,#SetReadOnly Used to test whether set/revoke read-only takes effect, even when there are writes in progress. - Add DataCacheTest,#DumpAndLoad Used to test whether the original cache contents can be read after a data cache dump and reload. Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 --- M CMakeLists.txt M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/scheduling/executor-group.cc M be/src/service/impala-server.cc M be/src/util/cache/cache-internal.h M be/src/util/cache/cache.h M be/src/util/cache/lirs-cache.cc M be/src/util/cache/rl-cache.cc 12 files changed, 651 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/3 -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19532 to look at the new patch set (#2). Change subject: IMPALA-11904: Data cache support dumping for reloading .. IMPALA-11904: Data cache support dumping for reloading Data cache mainly includes cache metadata and cache files. The cache files are located on the disk and is responsible for storing cached data content, while the cache metadata is located in the memory and is responsible for indexing to the cache file according to the cache key. Before this patch, if the impalad process exits, the cache metadata will be lost. After the Impalad process restarts, we cannot reuse the cache file even though it is still on the disk, because there is no corresponding cache metadata for index. This patch implements the dump and load functions of the data cache. After enabling the dump function with setting 'data_cache_enable_dumping', when the Impalad process is closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the cache metadata and dump them to the location where the cache directory is located. After enabling the load function with setting 'data_cache_enable_loading', when the Impalad process starts, it will try to load the dumped files on the disk to restore the original cache metadata, so that the existing cache files can be reused without refilling the cache. The cache can be safely dumped during query execution, because before the dump starts, the data cache will be set to read-only to prevent the inconsistency between the metadata dump and the cache file. Note that the dump files will also use disk space. After testing, the size of the dump file is generally not more than 0.5% of the size of all cache files. Testing: - Add DataCacheTest,#SetReadOnly Used to test whether set/revoke read-only takes effect, even when there are writes in progress. - Add DataCacheTest,#DumpAndLoad Used to test whether the original cache contents can be read after a data cache dump and reload. Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 --- M CMakeLists.txt M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/scheduling/executor-group.cc M be/src/service/impala-server.cc M be/src/util/cache/cache-internal.h M be/src/util/cache/cache.h M be/src/util/cache/lirs-cache.cc M be/src/util/cache/rl-cache.cc 12 files changed, 651 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/2 -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 ) Change subject: IMPALA-11904: Data cache support dumping for reloading .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/19532/2/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/19532/2/be/src/runtime/io/data-cache.h@496 PS2, Line 496: line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:19:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12427/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 24 Feb 2023 08:15:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
18770832...@163.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19532 Change subject: IMPALA-11904: Data cache support dumping for reloading .. IMPALA-11904: Data cache support dumping for reloading Data cache mainly includes cache metadata and cache files. The cache files are located on the disk and is responsible for storing cached data content, while the cache metadata is located in the memory and is responsible for indexing to the cache file according to the cache key. Before this patch, if the impalad process exits, the cache metadata will be lost. After the Impalad process restarts, we cannot reuse the cache file even though it is still on the disk, because there is no corresponding cache metadata for index. This patch implements the dump and load functions of the data cache. After enabling the dump function with setting 'data_cache_enable_dumping', when the Impalad process is closed by graceful shutdown (kill -SIGRTMIN $pid), the data cache will collect the cache metadata and dump them to the location where the cache directory is located. After enabling the load function with setting 'data_cache_enable_loading', when the Impalad process starts, it will try to load the dumped files on the disk to restore the original cache metadata, so that the existing cache files can be reused without refilling the cache. The cache can be safely dumped during query execution, because before the dump starts, the data cache will be set to read-only to prevent the inconsistency between the metadata dump and the cache file. Note that the dump files will also use disk space. After testing, the size of the dump file is generally not more than 0.5% of the size of all cache files. Testing: - Add DataCacheTest,#SetReadOnly Used to test whether set/revoke read-only takes effect, even when there are writes in progress. - Add DataCacheTest,#DumpAndLoad Used to test whether the original cache contents can be read after a data cache dump and reload. Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 --- M CMakeLists.txt M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/scheduling/executor-group.cc M be/src/service/impala-server.cc M be/src/util/cache/cache-internal.h M be/src/util/cache/cache.h M be/src/util/cache/lirs-cache.cc M be/src/util/cache/rl-cache.cc 12 files changed, 651 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/1 -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <18770832...@163.com>
[Impala-ASF-CR] IMPALA-11904: Data cache support dumping for reloading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 ) Change subject: IMPALA-11904: Data cache support dumping for reloading .. Patch Set 1: (12 comments) http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.h@496 PS1, Line 496: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@204 PS1, Line 204: kudu::Env::Default()->NewRWFile(opts, path, _file->file_), line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@484 PS1, Line 484: string key; line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@487 PS1, Line 487: /// cache file. When we dump the cache metadata, the ‘file_’ is meaningless because line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@683 PS1, Line 683: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@1037 PS1, Line 1037: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/data-cache.cc@1284 PS1, Line 1284: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.h@417 PS1, Line 417: /// Try to dump the data of remote data cache to disk, so it could be loaded when line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/runtime/io/disk-io-mgr.cc@78 PS1, Line 78: DEFINE_bool(data_cache_enable_dumping, false, line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc@1086 PS1, Line 1086: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc@1105 PS1, Line 1105: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/19532/1/be/src/util/cache/lirs-cache.cc@1125 PS1, Line 1125: line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:09:24 + Gerrit-HasComments: Yes