[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10813 ) Change subject: IMPALA-7163: Implement a state machine for the QueryState class .. Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc@a113 PS6, Line 113: May make sense to replace this by checking the state. Doing a racy read of the state should be fine. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@198 PS6, Line 198: struct FInstanceStatus { If you take the suggestion to only keep a single 'query_status_', there doesn't seem to be need for this struct. We can replace it with a single class member to record the fragment instance ID of the first failed fragment instance. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@212 PS6, Line 212: TUniqueId finst_id_; This is only meaningful when there is an error status, right ? Please document it. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@234 PS6, Line 234: FragmentInstanceState fragment instance http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@271 PS6, Line 271: INITIALIZED, Consider calling it "PREPARING" http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@274 PS6, Line 274: PREPARED, Consider calling it "EXECUTING" http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308 PS6, Line 308: /// The overall status of the Prepare phase for this query state. : FInstanceStatus prepare_status_; : : /// The overall status of the Exec phase (after Prepare()) for this query state. : FInstanceStatus exec_status_; : : /// Protects 'prepare_status_' and 'exec_status_'. : SpinLock status_lock_; It seems simpler to keep track of a single status for the first error hit instead of tracking the errors in different phases. The behavior in the existing code is that the first fragment instance which hits the error will report it to the coordinator directly. In theory, a fragment instance F1 can hit an error during exec after prepare phase is done while another fragment instance F2 can hit error during its prepare phase after F1 hits the error. The new patch seems to set query_status_ to the error of F2 while the coordinator will get the error of F1. So, this seems to cause divergence with the coordinator. So, in that sense, it's necessary to keep a single status only. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@320 PS6, Line 320: FInstanceStatus query_status_; Mind documenting the thread safety about this variable ? http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@238 PS6, Line 238: case BackendExecState::INITIALIZED: : case BackendExecState::PREPARED: : DCHECK(query_status_.ok()) << query_status_.ToString(); : if (!status.ok()) { : // Error while executing - go to ERROR state. : query_status_ = finst_status; : backend_exec_state_ = BackendExecState::ERROR; Can this be merged with TransitionNonTerminalState() above as: if (query_status_.IsCancelled()) { new_state = BackendExecState::CANCELLED; } else if (!query_status_.ok()) { new_state = BackendExecState::ERROR; } else { new_state = state == PREPARING ? EXECUTING : FINISHED; } http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@250 PS6, Line 250: case BackendExecState::FINISHED Under what circumstances would we call QueryState::UpdateBackendExecState() after entering a terminal state ? http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@457 PS6, Line 457: ErrorDuringPrepare( Won't it be a problem if we call ErrorDuringPrepare() here as it only bump the instances_prepared_barrier_ by one instead of the number of remaining fragment instances ? In that sense, won't caller of WaitForPrepare() wait forever ? Also, the thread creation failure path warrants a targeted test case. May be a debug action can be added in Thread::Create(). http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@460 PS6, Line 460: return; Seems wrong to return here. Don't we need to call ReportExecStatusAux() like we did in the past. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@463 PS6, Line 463: all_fis_status =
[Impala-ASF-CR] IMPALA-7234: Improve memory estimates produced by the Planner
Pooja Nilangekar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/11001 ) Change subject: IMPALA-7234: Improve memory estimates produced by the Planner .. IMPALA-7234: Improve memory estimates produced by the Planner Previously, the planner used the getMajorityFormat to estimate the memory requirements of its partitions. Additionally, before IMPALA-6625 was merged, the majority format for a multi-format table with no numerical majority was calculated using a HashMap, thus producing non deterministic results. This change ensures that the memory estimate is deterministic and always based on partition that has the maximum memory requirement. Testing: Ran all PlannerTests. Also, modified plans of scans with multiple partitions to ensure that the memory estimate produced corresponds to the partition with the maximum requirement. Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b --- M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering-disabled.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test 8 files changed, 52 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/11001/3 -- To view, visit http://gerrit.cloudera.org:8080/11001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b Gerrit-Change-Number: 11001 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar
[Impala-ASF-CR] IMPALA-7209: Disallow self referencing in ALTER VIEW statements
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/10908 ) Change subject: IMPALA-7209: Disallow self referencing in ALTER VIEW statements .. Patch Set 7: (1 comment) I am sorry I had to rebase this change. I couldn't get impalad to start due to IMPALA-7316. http://gerrit.cloudera.org:8080/#/c/10908/6/fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java: http://gerrit.cloudera.org:8080/#/c/10908/6/fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java@69 PS6, Line 69: if (inlineViewRefs.contains(((InlineViewRef) tblRef).getView())) { : throw new AnalysisException( : String.format("Self-reference not allowed on view: %s", tblRef.toSql())); : } : : createColumnAndViewDefs(analyzer); : if (BackendConfig.INSTANCE.getComputeLineage() || RuntimeEnv.INSTANCE.isTestEnv()) { : computeLineageGraph(analyzer); : } : } : : @Override : public String toSql() { : StringBuilder sb = new StringBuilder(); : sb.append("ALTER VIEW "); : if (tableName_.getDb() != null) { : sb.append(tableName_.getDb() + "."); : } : sb.append(tableName_.getTbl()); : if (columnDefs_ != null) sb.append("(" + getColumnNames() + ")"); : sb.append(" AS " + viewDefStmt_.toSql()); : return sb.toString(); : } : } : : : : : > How about factoring this out into a method in QueryStmt and have helper met Done -- To view, visit http://gerrit.cloudera.org:8080/10908 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17c231c9d74d9d411463a408b086eb874090b9b7 Gerrit-Change-Number: 10908 Gerrit-PatchSet: 7 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Sat, 21 Jul 2018 01:30:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7209: Disallow self referencing in ALTER VIEW statements
Pooja Nilangekar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/10908 ) Change subject: IMPALA-7209: Disallow self referencing in ALTER VIEW statements .. IMPALA-7209: Disallow self referencing in ALTER VIEW statements Previously, ALTER VIEW queries did not carry out reference checks in the analysis phase. This allowed the DDL operation to succeed but subsequent queries to the view threw StackOverflowError because the catalog was unable to resolve the reference. With this change, the AlterViewStmt checks for direct and in-direct self references before altering a view. Testing: Added tests to AnalyzeDDLTest to verify it. Change-Id: I17c231c9d74d9d411463a408b086eb874090b9b7 --- M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 5 files changed, 93 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/10908/7 -- To view, visit http://gerrit.cloudera.org:8080/10908 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I17c231c9d74d9d411463a408b086eb874090b9b7 Gerrit-Change-Number: 10908 Gerrit-PatchSet: 7 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Pooja Nilangekar
[Impala-ASF-CR] PREVIEW: IMPALA-110: Support for multiple DISTINCT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10771 ) Change subject: PREVIEW: IMPALA-110: Support for multiple DISTINCT .. Patch Set 1: (7 comments) Did a quick initial pass over the backend part of this. http://gerrit.cloudera.org:8080/#/c/10771/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10771/1//COMMIT_MSG@60 PS1, Line 60: - Ran hdfs/core tests Can we add some spilling tests too? http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/aggregation-node.cc@118 PS1, Line 118: // Separate input batch into mini batches destined for the different aggs. I feel like there's a lot of unnecessary overhead in the pre-partitioning into mini-batches (the approach in general and maybe some of the specifics of how this code uses the RowBatch APIs). I'd be interested to see how much time is spent here for a high-NDV aggregation to see if it's worth optimising. We could avoid the construction of the intermediate batches if we pushed down logic into AddBatch so that it consumed rows from 'batch' up until it saw one that didn't match its index. http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator-ir.cc File be/src/exec/grouping-aggregator-ir.cc: http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator-ir.cc@171 PS1, Line 171: ++streaming_idx_; Let's hoist the store to 'streaming_idx_' out of the loop. http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator-ir.cc@190 PS1, Line 190: out_batch->ClearRow(row); This seems expensive to include in the inner loop. Can we avoid doing this entirely in the non-multiagg case? And do it efficiently with a memset() before calling this function in the multiagg case. http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator-ir.cc@191 PS1, Line 191: agg_idx_ Let's hoist this load out of the loop too. Can we codegen this out for the non-multi-agg case? http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator.h@296 PS1, Line 296: int32_t streaming_idx_; Comments? http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/10771/1/be/src/exec/grouping-aggregator.cc@285 PS1, Line 285: row_batch->ClearRow(row); This seems like unnecessary overhead. -- To view, visit http://gerrit.cloudera.org:8080/10771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Gerrit-Change-Number: 10771 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Jul 2018 01:13:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7234: Improve memory estimates produced by the Planner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11001 ) Change subject: IMPALA-7234: Improve memory estimates produced by the Planner .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/11001/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11001/1//COMMIT_MSG@12 PS1, Line 12: calcuated nit: typo http://gerrit.cloudera.org:8080/#/c/11001/1//COMMIT_MSG@14 PS1, Line 14: esimate nit: typo http://gerrit.cloudera.org:8080/#/c/11001/1//COMMIT_MSG@18 PS1, Line 18: esimate nit: typo http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java: http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@300 PS1, Line 300: public static Set getFileFormats( : Iterable partitions) { : Set fileFormats = Sets.newHashSet(); : for (FeFsPartition partition : partitions) { : fileFormats.add(partition.getFileFormat()); : } : return fileFormats; you can get rid of this since its used only in one place and move functionality to HdfsTable http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java: http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@97 PS1, Line 97: /** :* @return the set of file formats that the partitions in this table use. :* This API is only used by the TableSink to write out partitions. It :* should not be used for scanning. :*/ : public Set getFileFormats(); we can probably get rid of this since its used at only one place and move the functionality there. Interested in what others think about this. http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1369 PS1, Line 1369: Henece nit: typo http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@83 PS1, Line 83: : HdfsTable table = (HdfsTable) targetTable_; : // TODO: Estimate the memory requirements more accurately by partition type. : Set formats = table.getFileFormats(); you can pass the table directly and also simplify getPerPartitionMemReq() by using a contains(FORMAT) instead of a for-loop + switch-case. Also add corresponding comments. http://gerrit.cloudera.org:8080/#/c/11001/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@109 PS1, Line 109: * Returns the per-partition memory requirement for inserting into the given :* file format. update comment -- To view, visit http://gerrit.cloudera.org:8080/11001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b Gerrit-Change-Number: 11001 Gerrit-PatchSet: 1 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Sat, 21 Jul 2018 01:13:02 + Gerrit-HasComments: Yes
[native-toolchain-CR] Bump Kudu version to 5211897
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11003 ) Change subject: Bump Kudu version to 5211897 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5aa226d74150bc6623d685e3d995e1699db4f2e8 Gerrit-Change-Number: 11003 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Jul 2018 00:58:48 + Gerrit-HasComments: No
[Impala-ASF-CR] Test gerrit trigger
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/11009 ) Change subject: Test gerrit trigger .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/11009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c Gerrit-Change-Number: 11009 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] Test gerrit trigger
Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/11009 ) Change subject: Test gerrit trigger .. Test gerrit trigger Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c --- M be/src/common/global-flags.cc 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/11009/2 -- To view, visit http://gerrit.cloudera.org:8080/11009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c Gerrit-Change-Number: 11009 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] Test gerrit trigger
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11009 Change subject: Test gerrit trigger .. Test gerrit trigger Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c --- M be/src/common/global-flags.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/11009/1 -- To view, visit http://gerrit.cloudera.org:8080/11009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c Gerrit-Change-Number: 11009 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] Test gerrit trigger
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11008 Change subject: Test gerrit trigger .. Test gerrit trigger Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c --- M be/src/common/global-flags.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/11008/1 -- To view, visit http://gerrit.cloudera.org:8080/11008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If08d7ce10631670ed913b4ff5c48f9ff07c2709c Gerrit-Change-Number: 11008 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-6299: Modify IRBuilder to use LLVM's CPU features
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/10979 ) Change subject: IMPALA-6299: Modify IRBuilder to use LLVM's CPU features .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen-test.cc@467 PS2, Line 467: data1_hash_fn, llvm::ArrayRef({llvm_data1, llvm_len1, seed})); > Thanks for explaining. Might be worth leaving a brief comment explaining th Done http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen.h@737 PS2, Line 737: > How about we move the method EnableCPUFeature out from this class and into Done. I have moved the helper function to the test class and added a comment stating that cpu_attrs_ should be modified only for tests. http://gerrit.cloudera.org:8080/#/c/10979/3/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/10979/3/be/src/codegen/llvm-codegen.cc@1519 PS3, Line 1519: > nit: not a big deal but just to be consistent with CpuInfo and the static f Done -- To view, visit http://gerrit.cloudera.org:8080/10979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifece8949c143146d2a1b38d72d21c2d733bed90f Gerrit-Change-Number: 10979 Gerrit-PatchSet: 5 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Jul 2018 00:33:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6299: Modify IRBuilder to use LLVM's CPU features
Pooja Nilangekar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/10979 ) Change subject: IMPALA-6299: Modify IRBuilder to use LLVM's CPU features .. IMPALA-6299: Modify IRBuilder to use LLVM's CPU features Previously, the IRBuilder of the LlvmCodeGen class used CpuInfo's list of enabled features to determine the validity of certain instructions. It did not consider the whitelist which passed while initilaizing the LlvmCodeGen class. Now, the IRBuilder inspects its own cpu attributes before emitting instruction. This change also adds functionality to modify the cpu attributes of the LlvmCodeGen class for testing. Testing: Verified that the current tests which use and modify CpuInfo produce expected results. Change-Id: Ifece8949c143146d2a1b38d72d21c2d733bed90f --- M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/filter-context.cc M be/src/util/cpu-info.cc 6 files changed, 90 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/10979/5 -- To view, visit http://gerrit.cloudera.org:8080/10979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifece8949c143146d2a1b38d72d21c2d733bed90f Gerrit-Change-Number: 10979 Gerrit-PatchSet: 5 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7173: [DOCS] Added check options in the load balancer examples
Alex Rodoni has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11006 Change subject: IMPALA-7173: [DOCS] Added check options in the load balancer examples .. IMPALA-7173: [DOCS] Added check options in the load balancer examples Change-Id: Ic5148aecf4605373ea6a28bf032bcfdab4f822fd --- M docs/topics/impala_proxy.xml 1 file changed, 4 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/11006/1 -- To view, visit http://gerrit.cloudera.org:8080/11006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic5148aecf4605373ea6a28bf032bcfdab4f822fd Gerrit-Change-Number: 11006 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni
[Impala-ASF-CR] IMPALA-3675: part 1: -Werror for ASAN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11002 ) Change subject: IMPALA-3675: part 1: -Werror for ASAN .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2846/ -- To view, visit http://gerrit.cloudera.org:8080/11002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If83ca41cde49fd6cfde479e45f9cfdd9e3a45bec Gerrit-Change-Number: 11002 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 23:55:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7257. Support Kudu tables in LocalCatalog
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10912 ) Change subject: IMPALA-7257. Support Kudu tables in LocalCatalog .. Patch Set 3: (1 comment) LGTM. Will +2 after the small fixes are applied. http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java: http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java@140 PS3, Line 140: kuduTableName_ instead of name below > The line of code there is accessing KuduTableDescriptor::table_name() (set You are right. It's only used for logging in the backend. -- To view, visit http://gerrit.cloudera.org:8080/10912 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b6a317ee895e43e00ade953e814867b56b4e6dd Gerrit-Change-Number: 10912 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 22:55:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3675: part 1: -Werror for ASAN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11002 ) Change subject: IMPALA-3675: part 1: -Werror for ASAN .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/11002/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11002/4//COMMIT_MSG@9 PS4, Line 9: This fixes some clang warnings and enables -Werror going forward for > I don't see any diffs, other than the mismatched struct/class ones. I feel The clang-tidy precommit prevented most warnings from cropping up. The only fixes were from enabling the matched tags check and to suppress return-type-c-linkage that is also suppressed in clang-tidy. http://gerrit.cloudera.org:8080/#/c/11002/4/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/11002/4/be/CMakeLists.txt@105 PS4, Line 105: # Enabled for DEBUG and ASAN builds which are built pre-commit. > TSAN and UBSAN, too, or do you think that's overzealous? Will do that. I don't expect any issues but will rerun the build job to check: https://jenkins.impala.io/job/all-build-options-ub1604/2165 -- To view, visit http://gerrit.cloudera.org:8080/11002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If83ca41cde49fd6cfde479e45f9cfdd9e3a45bec Gerrit-Change-Number: 11002 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 22:51:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3675: part 1: -Werror for ASAN
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/11002 ) Change subject: IMPALA-3675: part 1: -Werror for ASAN .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/11002/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11002/4//COMMIT_MSG@9 PS4, Line 9: This fixes some clang warnings and enables -Werror going forward for I don't see any diffs, other than the mismatched struct/class ones. I feel like I'm missing something. http://gerrit.cloudera.org:8080/#/c/11002/4/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/11002/4/be/CMakeLists.txt@105 PS4, Line 105: # Enabled for DEBUG and ASAN builds which are built pre-commit. TSAN and UBSAN, too, or do you think that's overzealous? -- To view, visit http://gerrit.cloudera.org:8080/11002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If83ca41cde49fd6cfde479e45f9cfdd9e3a45bec Gerrit-Change-Number: 11002 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Comment-Date: Fri, 20 Jul 2018 22:44:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7257. Support Kudu tables in LocalCatalog
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10912 ) Change subject: IMPALA-7257. Support Kudu tables in LocalCatalog .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java: http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java@140 PS3, Line 140: kuduTableName_ instead of name below > It's eventually used here: https://github.com/apache/impala/blob/master/be/ The line of code there is accessing KuduTableDescriptor::table_name() (set down on line 149 of this file below). The table name passed in TTableDescriptor should be the _impala_ table name, though, which may not match the underlying kudu table name. (usually the underlying kudu table is "impala::dbname.tablename"). I don't know who accesses the TTableDescriptor.tablename -- To view, visit http://gerrit.cloudera.org:8080/10912 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b6a317ee895e43e00ade953e814867b56b4e6dd Gerrit-Change-Number: 10912 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 22:27:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5826: [DOCS] Documented the IDLE SESSION TIMEOUT query option
Alex Rodoni has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11004 Change subject: IMPALA-5826: [DOCS] Documented the IDLE_SESSION_TIMEOUT query option .. IMPALA-5826: [DOCS] Documented the IDLE_SESSION_TIMEOUT query option Change-Id: I37182a3c5cf19fdcbb5f247ed71d43f963143510 --- M docs/impala.ditamap A docs/topics/impala_idle_session_timeout.xml M docs/topics/impala_timeouts.xml 3 files changed, 106 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/11004/1 -- To view, visit http://gerrit.cloudera.org:8080/11004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I37182a3c5cf19fdcbb5f247ed71d43f963143510 Gerrit-Change-Number: 11004 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni
[Impala-ASF-CR] IMPALA-7257. Support Kudu tables in LocalCatalog
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10912 ) Change subject: IMPALA-7257. Support Kudu tables in LocalCatalog .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@258 PS3, Line 258: public static List loadPartitionByParams( > I think eventually they could become default methods in the FeKuduTable int Another option is to put them in a class FeKuduTable.Common. It would be closer to the Java 8 code structure. http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java: http://gerrit.cloudera.org:8080/#/c/10912/3/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java@140 PS3, Line 140: kuduTableName_ instead of name below > I think that was a bug in the old implementation. The name that shows up in It's eventually used here: https://github.com/apache/impala/blob/master/be/src/exec/kudu-scan-node-base.cc#L103. Why shouldn't it be the table name in Kudu? -- To view, visit http://gerrit.cloudera.org:8080/10912 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b6a317ee895e43e00ade953e814867b56b4e6dd Gerrit-Change-Number: 10912 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 21:26:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7256: Aggregator mem usage isn't reflected in summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10989 ) Change subject: IMPALA-7256: Aggregator mem usage isn't reflected in summary .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba6ef207bed47810fc742aec3481db5f313cf97f Gerrit-Change-Number: 10989 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 20:59:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7256: Aggregator mem usage isn't reflected in summary
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10989 ) Change subject: IMPALA-7256: Aggregator mem usage isn't reflected in summary .. IMPALA-7256: Aggregator mem usage isn't reflected in summary This patch fixes a bug where memory used by an Aggregator wasn't being reflected in the exec summary for the corresponding aggregation node by ensuring that the Aggregator's MemTracker is a child of the node's MemTracker. Testing: - Manually ran a query and checked that all memory used by the Aggregator is accounted for in the exec summary. Change-Id: Iba6ef207bed47810fc742aec3481db5f313cf97f Reviewed-on: http://gerrit.cloudera.org:8080/10989 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/aggregator.cc M be/src/exec/aggregator.h M be/src/exec/exec-node.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/reservation-manager.cc M be/src/runtime/reservation-manager.h 7 files changed, 23 insertions(+), 10 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/10989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iba6ef207bed47810fc742aec3481db5f313cf97f Gerrit-Change-Number: 10989 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3675: part 1: -Werror for ASAN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11002 ) Change subject: IMPALA-3675: part 1: -Werror for ASAN .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2846/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If83ca41cde49fd6cfde479e45f9cfdd9e3a45bec Gerrit-Change-Number: 11002 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 20 Jul 2018 20:56:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3675: part 1: -Werror for ASAN
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11002 to look at the new patch set (#3). Change subject: IMPALA-3675: part 1: -Werror for ASAN .. IMPALA-3675: part 1: -Werror for ASAN This fixes some clang warnings and enables -Werror going forward for ASAN. It adds -Wno-return-type-c-linkage to suppress a warning that otherwise would fail the build. It also enables the mismatched tags check and fixes the various instances of it (this warning is irritating for YouCompleteMe users). Change-Id: If83ca41cde49fd6cfde479e45f9cfdd9e3a45bec --- M .clang-tidy M be/CMakeLists.txt M be/src/exec/hdfs-table-writer.h M be/src/exprs/expr.h M be/src/exprs/scalar-expr.h M be/src/runtime/krpc-data-stream-recvr.h M be/src/runtime/mem-tracker.h 7 files changed, 8 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/11002/3 -- To view, visit http://gerrit.cloudera.org:8080/11002 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If83ca41cde49fd6cfde479e45f9cfdd9e3a45bec Gerrit-Change-Number: 11002 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: Yep, that makes sense -- lazy initializing the StructType field in Table looks like an easy win here. You're right that changing the treatment of IncompleteTable would be a much deeper change. Worth adding to the commit message, btw, that the "7.5% of catalog server heap" is in the case where the catalog is mostly made up of IncompleteTables. Or quantify the number of bytes per IncompleteTable that are saved. That way we can pretty easily estimate the absolute heap size savings on an arbitrary catalog just by knowing how many tables they have. -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 20:36:58 + Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to 5211897
Thomas Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11003 Change subject: Bump Kudu version to 5211897 .. Bump Kudu version to 5211897 Testing: - Ran a full toolchain build. Change-Id: I5aa226d74150bc6623d685e3d995e1699db4f2e8 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/03/11003/1 -- To view, visit http://gerrit.cloudera.org:8080/11003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5aa226d74150bc6623d685e3d995e1699db4f2e8 Gerrit-Change-Number: 11003 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Misha Dmitriev has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: Thank you for your comments/clarifications, guys. I see what you mean now. I've checked the heap dump more carefully, and now I realize that all the StructType instances (all of which have both of their data fields, fieldMap_ and fields_, empty) come from IncompleteTable's. Here is the relevant excerpt from the jxray heap report: 182,587K (1.2%): org.apache.impala.catalog.StructType: 7790403 / 100% objects ↖org.apache.impala.catalog.ArrayType.itemType_ ↖org.apache.impala.catalog.IncompleteTable.type_ ↖{java.util.concurrent.ConcurrentHashMap}.values ↖org.apache.impala.catalog.CatalogObjectCache.metadataCache_ ↖org.apache.impala.catalog.Db.tableCache_ ↖{java.util.concurrent.ConcurrentHashMap}.values ↖java.util.concurrent.atomic.AtomicReference.value ↖org.apache.impala.catalog.CatalogServiceCatalog.dbCache_ ↖org.apache.impala.catalog.BuiltinsDb.parentCatalog_ ↖Java Static org.apache.impala.catalog.Catalog.builtinsDb_ So, StructType objects themselves take 1.2% of the heap, plus their empty fieldMap_ maps take 2.5% and empty fields_ lists take 1.2%. And looks like if we simply initialize these StructType objects lazily (more precisely, the 'ArrayType type_' field in Table), we will save all the above memory, i.e. considerably more than the current patch saves. Furthermore, the lazy changes in StructType itself can be reverted, which would keep the code more readable. Makes sense? I think this is the best I can do in Impala code with my current very limited understanding of it. More advanced changes, like making IncompleteTable not inherit from Table, would likely require a much deeper knowledge of how this code works. -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 19:50:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6299: Modify IRBuilder to use LLVM's CPU features
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/10979 ) Change subject: IMPALA-6299: Modify IRBuilder to use LLVM's CPU features .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen.h@737 PS2, Line 737: enabled_cpu_attrs_ > Thanks for the explanation. I agree that we should be defensive. Maybe all How about we move the method EnableCPUFeature out from this class and into the test class only and maintain this duplication of attributes there. and also comment on cpu_attrs_ that it should not be modified except in InitializeLlvm() or for testing purposes. http://gerrit.cloudera.org:8080/#/c/10979/3/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/10979/3/be/src/codegen/llvm-codegen.cc@1519 PS3, Line 1519: long nit: not a big deal but just to be consistent with CpuInfo and the static flags, use int64_t -- To view, visit http://gerrit.cloudera.org:8080/10979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifece8949c143146d2a1b38d72d21c2d733bed90f Gerrit-Change-Number: 10979 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 19:10:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: Right, it seems like this is just a case of wastage by IncompleteTable objects, which contain the entirety of the 'Table' superclass but in fact use none of it. We could probably get away with a much cheaper implementation of IncompleteTable that has only the "name" field and nothing else. -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 19:01:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Hello Bharath Vissapragada, Balazs Jeszenszky, Todd Lipcon, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10587 to look at the new patch set (#4). Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. IMPALA-6994: Avoid reloading a table's HMS data for file-only operations The problem is that while inserting a new row to an existing partition of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary call to Hive MetaStore to do a getTable() and list the partitions of the table. In such a case where a row is being inserted to an existing partition or to an unpartitioned table only the file metadata for HDFS tables needs to be updated. This change avoids calling the Hive Meta Store in updatePartitionsFromHms() by providing a hint from the caller, to skip loading the partitions. Testing: Ran core test without failure. Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 2 files changed, 126 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/4 -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7234: Improve memory estimates produced by the Planner
Pooja Nilangekar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11001 Change subject: IMPALA-7234: Improve memory estimates produced by the Planner .. IMPALA-7234: Improve memory estimates produced by the Planner Previously, the planner used the getMajorityFormat to estimate the memory requirements of its partitions. Additionally, before IMPALA-6625 was merged, the majority format for a multi-format table with no numerical majority was calcuated using a HashMap, thus producing non deterministic results. This change ensures that the memory esimate is deterministic and always based on partition that has the maximum memory requirement. Testing: Ran all PlannerTests. Also, modified plans of scans with multiple partitions to ensure that the memory esimate produced corresponds to the partition with the maximum requirement. Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b --- M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering-disabled.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test 8 files changed, 51 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/11001/1 -- To view, visit http://gerrit.cloudera.org:8080/11001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b Gerrit-Change-Number: 11001 Gerrit-PatchSet: 1 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251 PS3, Line 1251: RELOAD_PARTITION_METADATA if not set in flags results in not getting the :* partition details from HiveMetaStore in updatePartitionsFromHms(). > nit: this double negative is a bit confusing. Perhaps it's clearer to rephr Changed the comment, thanks for rewording it. http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254 PS3, Line 1254: or if there are no partitions in the table > I'm not sure I follow this part. How does the caller know if there are no p That's right the unpartitioned case doesn't have to consider this, I have removed the sentence. -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:57:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: My question (2) may have been unclear. I'm referring to StructType's HashMap that you're changed to be lazy initialized. In particular, I'm noting that its empty only when the no-arg constructor is called or when the input fields array is empty. Todd's question is basically the same (in StructType): why are so many empty (e.g., no field) structs created? We can look at call-sites.. perhaps we should change how/where these structs are used. -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:56:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5542: Impala cannot scan Parquet decimal stored as int64 t/int32 t
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11000 ) Change subject: IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-column-readers.cc@1536 PS1, Line 1536: DCHECK_EQ(sizeof(Decimal4Value::StorageType), slot_desc->type().GetByteSize()); i dont remember where the file level column metadata validation occurs in code, but can you add this check there in case the metadata is corrupted and/or does not match the hms column metadata http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@62 PS1, Line 62: static int ByteSize(const InternalType& v) { return sizeof(InternalType); } > I'm still confused about what this function does. Can we rename to somethin This methods represents the encoded byte size if the internalType is written to parquet. I guess the confusion lies where in there is special casing for when this is used for encoding/decoding of fixed/variable len types. >From what i understand this used to get the encoded size of an internalType >only for fixed len types, except for String type. --- returning -1 makes no sense and it only does that for cases which are impossible, hence the DCHECK(false) associated with them. The part in HdfsParquetTableWriter is correct since a plain_encoded_value_size_ < 0 means that its a variable len encoded type and since impala supports only string type to be written as a variable len encoded type, it calls on ParquetPlainEncoder::ByteSize to get it encode byte size which. http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@182 PS1, Line 182: /// Decodes t from 'buffer', reading up to the byte before 'buffer_end'. 'buffer' : /// need not be aligned. If PARQUET_TYPE is FIXED_LEN_BYTE_ARRAY then 'fixed_len_size' : /// is the size of the object. Otherwise, it is unused. : /// Returns the number of bytes read or -1 if the value was not decoded successfully. I think it will be super helpful to list which types are decoded by this template method and which ones have specialization. What do you all think? http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@208 PS1, Line 208: // Only used for testing to test decimals stored as INT32. : // Otherwise Impala stores decimals as FIXED_LEN_BYTE_ARRAY. a bit misleading, this implies that impala only supports this type for testing (that too not sure if for reading or writing). and the second line implies this method is only used for writing. http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-metadata-utils.cc@195 PS1, Line 195: if (slot_desc->type().type == TYPE_DECIMAL) { here it is! you should add checks here to make sure int32 and int64 encoded decimal types have the right precision. -- To view, visit http://gerrit.cloudera.org:8080/11000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794 Gerrit-Change-Number: 11000 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 18:49:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Misha Dmitriev has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: Thank you for the quick response, Vuk. Answering your questions: 1. The respective IMPALA-7219 contains more details. In the case that I looked at, there was exactly the same number (a little less than 8 million) of both kinds of HashMaps. I.e. ~8M coming from IncompleteTable.colsByName_ and ~8M coming from StructType.fieldMap_. Each empty HashMap takes 48 bytes, so collectively they wasted ~700MB of memory. 2. Let me clarify. There is no such thing as an object (collection) "with no fields". When you call 'new HashMap()', you always create a HashMap object, which internally has fields like 'int size, threshold' etc. The good news is that the most important field, 'HashMap$Entry table[]' is null until the first element is added to this table. But just the above data fields, plus the 12-byte internal header that each object in the JVM heap has, result in the fact that an empty, completely unpopulated HashMap, occupies 48 bytes. This is what I want to get rid of. If all this still seems a little vague, I suggest you take a look at the slides here: https://www.slideshare.net/MikhailMishaDmitriev/java-memory-analysis-problems-and-solutions?trk=v-feed In particular, slide 10 shows the internals and lifecycle of a HashMap. So your suggestion in (2) may or may not make sense depending what exactly you had in mind, and needs some clarification. Regarding tests: yes, I ran it through the Impala Jenkins build, and all tests passed: https://jenkins.impala.io/job/pre-review-test/186/ -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:41:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10995 ) Change subject: IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions .. Patch Set 3: Code-Review+1 (1 comment) Carry Adam's +1 http://gerrit.cloudera.org:8080/#/c/10995/2/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/10995/2/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@889 PS2, Line 889: > nit: add empty line back Done -- To view, visit http://gerrit.cloudera.org:8080/10995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia164c55fd9459cf5f11eb72561e9cd4ffe1d5367 Gerrit-Change-Number: 10995 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:28:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions
Fredy Wijaya has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/10995 ) Change subject: IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions .. IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions In the prior code, the authorization checker for the masked privilege requests skips the check for system database access. As a result, certain commands, such as SHOW CREATE VIEW that references built-in database requires permission to access to the built-in database where accessing built-in database should always be allowed. The patch fixes it by using the authorizePrivilegeRequest() method that does a check on the system database similar to how other authorization checks are performed. Testing: - Added new authorization test - Ran all FE tests Change-Id: Ia164c55fd9459cf5f11eb72561e9cd4ffe1d5367 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java 3 files changed, 44 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/10995/3 -- To view, visit http://gerrit.cloudera.org:8080/10995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia164c55fd9459cf5f11eb72561e9cd4ffe1d5367 Gerrit-Change-Number: 10995 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6299: Modify IRBuilder to use LLVM's CPU features
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10979 ) Change subject: IMPALA-6299: Modify IRBuilder to use LLVM's CPU features .. Patch Set 2: (3 comments) No worries about the rebase, it happens sometimes. http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen-test.cc@467 PS2, Line 467: CpuInfo::EnableFeature(CpuInfo::SSE4_2, false); > Yes, because `expected_hash` is computed using the `HashUtil::Hash` functio Thanks for explaining. Might be worth leaving a brief comment explaining this. http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/codegen/llvm-codegen.h@737 PS2, Line 737: current_cpu_attrs_ Thanks for the explanation. I agree that we should be defensive. Maybe all this needs is a slight clarifying comment about when the contents of the maps may change? > The other option I can think of is trying to figure out from the context if > this is an impalad instance or a test call. We have TestInfo::is_test() - you could add a DCHECK. http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/util/bit-util-test.cc File be/src/util/bit-util-test.cc: http://gerrit.cloudera.org:8080/#/c/10979/2/be/src/util/bit-util-test.cc@325 PS2, Line 325: const int64_t CPU_INFO_FLAG = CpuInfo::SSSE3; > I had to change it because I moved the declaration of CpuInfo::SSSE3 to cpu Gotcha, didn't think of that -- To view, visit http://gerrit.cloudera.org:8080/10979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifece8949c143146d2a1b38d72d21c2d733bed90f Gerrit-Change-Number: 10979 Gerrit-PatchSet: 2 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 18:28:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: (5 comments) See suggestion about changing IncompleteTable to not inherit from Table -- maybe now with the 'FeTable' interface we could avoid that? http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java File fe/src/main/java/org/apache/impala/catalog/StructType.java: http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java@37 PS2, Line 37: public class StructType extends Type { agreed with Vuk -- not clear to me why we have so many empty structs. I thought StructType is only used for describing the schema of a table, and a table with no fields seems uncommon. Do you have handy the "paths to GC root" for these empty fields? http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java@39 PS2, Line 39: // Made non-final so that it can be initialized lazily to save memory when this map is not used. nit: this comment makes sense when looking at this as a patch, but when someone sees the code later, the "Made non-final" thing isn't super clear. Perhaps better to just describe the current treatment of the member, like: "Map of field name to field. Null is used to indicate empty to save memory." http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java@84 PS2, Line 84: public StructField getField(String fieldName) { I wonder if we could do even better here and only lazily construct a field map at places where method is called. it seems this is the only usage site of the field map, and best I can tell, this method is only called during query analysis, and never by the catalogd itself. Given that, it doesn't really make sense to keep around these members, when it would be pretty trivial to reconstruct the map as necessary during path resolution. Then we could fully remove all these maps (even the non-empty ones). Would that save significant memory? Another thought here: it seems like the most common case where we have an empty struct woudl be a reference from IncompleteTable. I'm guessing you were looking at a catalogd dump from a cluster with a lot of tables? Maybe we could change IncompleteTable to not inherit from 'Table' so that it can avoid all of these members entirely? http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java@320 PS2, Line 320: { : return; : } > nit: single line same as below http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java@447 PS2, Line 447: public Column getColumn(String name) { I think it would actually be an error to call getColumn before the colsByName_ map was initted. Such a call would indicate that columns haven't been loaded. -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:26:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7296: bytes limit for row batch queue
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10977 ) Change subject: IMPALA-7296: bytes limit for row batch queue .. Patch Set 5: Mainly to avoid potentially making things worse for narrow rows. I'm open to changing it but this was the easiest way I could convince myself that it wouldn't make anything worse. There are some weird cases like count(*) where we may end up with row batches that have very little memory - just the tuple_ptrs_ which aren't counted in the byte size. I did some rough calculations and in those cases, even if we include the tuple_ptrs_ size, we could potentially end up with 1000s of batches in the queue, which would mean more untracked memory. An alternative is to add some constant number of bytes per row batch for overhead. -- To view, visit http://gerrit.cloudera.org:8080/10977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa06d1d8da2a6d101efda08f620c0bf84a71e681 Gerrit-Change-Number: 10977 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 18:22:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions
Adam Holley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10995 ) Change subject: IMPALA-7325: Incorrect SHOW CREATE VIEW with built-in functions .. Patch Set 2: Code-Review+1 (1 comment) Good after nit. http://gerrit.cloudera.org:8080/#/c/10995/2/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/10995/2/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@889 PS2, Line 889: @Test nit: add empty line back -- To view, visit http://gerrit.cloudera.org:8080/10995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia164c55fd9459cf5f11eb72561e9cd4ffe1d5367 Gerrit-Change-Number: 10995 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:22:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6923: Update scripts in benchmark folder to store workload and few minor updates
Nithya Janarthanan has posted comments on this change. ( http://gerrit.cloudera.org:8080/10100 ) Change subject: IMPALA-6923: Update scripts in benchmark folder to store workload and few minor updates .. Patch Set 12: (5 comments) http://gerrit.cloudera.org:8080/#/c/10100/12/tests/benchmark/perf_result_datastore.py File tests/benchmark/perf_result_datastore.py: http://gerrit.cloudera.org:8080/#/c/10100/12/tests/benchmark/perf_result_datastore.py@23 PS12, Line 23: from subprocess import * > "*" is an antipattern for imports. It is better to be explicit about import Done Removed this import since it is not used in the file http://gerrit.cloudera.org:8080/#/c/10100/12/tests/benchmark/perf_result_datastore.py@63 PS12, Line 63: def insert_workload_summary(self, run_info_id): : self._insert_workload_summary(run_info_id) > This doesn't seem to do anything useful, so just rename _insert_workload_su Done. Good catch http://gerrit.cloudera.org:8080/#/c/10100/12/tests/benchmark/perf_result_datastore.py@85 PS12, Line 85: dont_save_profiles): > Negatives are weird. It would be better to call this save_profiles and flip Done http://gerrit.cloudera.org:8080/#/c/10100/12/tests/benchmark/perf_result_datastore.py@88 PS12, Line 88: temp_profile_folder = os.path.join(home_dir,"workspace","impala-workload-runner","profiles") > This is badly formated. There should be spaces between parameters. impala-f Done http://gerrit.cloudera.org:8080/#/c/10100/12/tests/benchmark/perf_result_datastore.py@155 PS12, Line 155: self._cache: Dictionary of with run_info/user as key/value > This isn't quite right. self._cache appears to be a hash in which a tuple o Done -- To view, visit http://gerrit.cloudera.org:8080/10100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ica7c00ad59963d466bae9e607a4692af0138962c Gerrit-Change-Number: 10100 Gerrit-PatchSet: 12 Gerrit-Owner: Nithya Janarthanan Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Jinchul Kim Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Nithya Janarthanan Gerrit-Comment-Date: Fri, 20 Jul 2018 18:11:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7329: Blacklist CDH Maven snapshots repository
Fredy Wijaya has removed Jim Apple from this change. ( http://gerrit.cloudera.org:8080/10999 ) Change subject: IMPALA-7329: Blacklist CDH Maven snapshots repository .. Removed reviewer Jim Apple. -- To view, visit http://gerrit.cloudera.org:8080/10999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Id945bc2769f92f3df3bb4f617b00db77a6502ff3 Gerrit-Change-Number: 10999 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 ) Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. Patch Set 2: (2 comments) thx for the changes! questions: (1) do you know which one of these contributes to more empty collections (structs or tables) for the case(s) you looked at? (2) an empty collection for a struct comes up when it has no fields (otherwise there will be no savings). have you seen examples of call-sites where these data structures are created with no fields? perhaps some of those are worth following up (separately). http://gerrit.cloudera.org:8080/#/c/10982/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10982/2//COMMIT_MSG@14 PS2, Line 14: pls add any tests that you ran. for this change, passing existing tests should be fine. http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java@320 PS2, Line 320: { : return; : } nit: single line -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 17:55:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7256: Aggregator mem usage isn't reflected in summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10989 ) Change subject: IMPALA-7256: Aggregator mem usage isn't reflected in summary .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2844/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/10989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba6ef207bed47810fc742aec3481db5f313cf97f Gerrit-Change-Number: 10989 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 17:45:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7256: Aggregator mem usage isn't reflected in summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10989 ) Change subject: IMPALA-7256: Aggregator mem usage isn't reflected in summary .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba6ef207bed47810fc742aec3481db5f313cf97f Gerrit-Change-Number: 10989 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 17:45:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists
Misha Dmitriev has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10982 Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists .. IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists This change switches initialization of HashMaps in IncompleteTable.colsByName_ and StructType.fieldMap_ from eager to lazy. In this way, we avoid wasting memory when HashMaps stay empty (even an empty HashMap uses at least 48 bytes in the heap). This optimization becomes really relevant when a catalog server loads a very large number (millions) of tables. Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 --- M fe/src/main/java/org/apache/impala/catalog/StructType.java M fe/src/main/java/org/apache/impala/catalog/Table.java 2 files changed, 35 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/10982/2 -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7209: Disallow self referencing in ALTER VIEW statements
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10908 ) Change subject: IMPALA-7209: Disallow self referencing in ALTER VIEW statements .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/10908/4/fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java: http://gerrit.cloudera.org:8080/#/c/10908/4/fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java@75 PS4, Line 75: InlineViewRef fromViewRef = (InlineViewRef) fromTblRef; > That was my initial approach. However, I faced the issue that while getting right, we ideally need to implement something like collectInlineViewRefs() in all the subclasses but that is probably not worth the effort since it is not used anywhere else. http://gerrit.cloudera.org:8080/#/c/10908/6/fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java: http://gerrit.cloudera.org:8080/#/c/10908/6/fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java@69 PS6, Line 69: while (!subQueries.isEmpty()) { : QueryStmt stmt = subQueries.pop(); : if (stmt instanceof SelectStmt) { : List fromTblRefs = ((SelectStmt) stmt).getTableRefs(); : for (TableRef fromTblRef : fromTblRefs) { : if (fromTblRef instanceof InlineViewRef) { : InlineViewRef fromViewRef = (InlineViewRef) fromTblRef; : if (fromViewRef.getView() == ((InlineViewRef) tblRef).getView()) { : throw new AnalysisException(String.format( : "Self-reference not allowed on view: %s", tblRef.toSql())); : } else { : subQueries.push(fromViewRef.getViewStmt()); : } : } : } : QueryStmt whereStmt = ((SelectStmt) stmt).getWhereQueryStmt(); : if (whereStmt != null) { : subQueries.add(whereStmt); : } : } else if (stmt instanceof UnionStmt) { : List unionOperands = ((UnionStmt) stmt).getOperands(); : for (UnionOperand operand : unionOperands) { : subQueries.push(operand.getQueryStmt()); : } : } else { : throw new AnalysisException("Subqueries not supported for " + : stmt.getClass().getSimpleName() + " statements"); : } : How about factoring this out into a method in QueryStmt and have helper methods in SelectStmt and UnionStmt to collect each of their own view refs? That way you could do something like if (viewDefStmt_.collectInlineViewRefs().contains(tblRef.getView())... -- To view, visit http://gerrit.cloudera.org:8080/10908 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17c231c9d74d9d411463a408b086eb874090b9b7 Gerrit-Change-Number: 10908 Gerrit-PatchSet: 6 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Fri, 20 Jul 2018 17:28:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7296: bytes limit for row batch queue
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10977 ) Change subject: IMPALA-7296: bytes limit for row batch queue .. Patch Set 5: Hi Tim, will it be simpler in general to just use max_bytes instead of both (max_batches, max_bytes) to control the queue capacity ? Can you please elaborate on the reasoning for not doing so ? -- To view, visit http://gerrit.cloudera.org:8080/10977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa06d1d8da2a6d101efda08f620c0bf84a71e681 Gerrit-Change-Number: 10977 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 17:27:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. Patch Set 3: (3 comments) Just left some notes on the HdfsTable side. I dont know the CatalogOpExecutor code well so probably better for someone else to take a look at that part. http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251 PS3, Line 1251: RELOAD_PARTITION_METADATA if not set in flags results in not getting the :* partition details from HiveMetaStore in updatePartitionsFromHms(). nit: this double negative is a bit confusing. Perhaps it's clearer to rephrase like: If RELOAD_PARTITION_METADATA is set in flags, the partition details will be reloaded from the HMS. In some cases, this may not be necessary, and when unset, an expensive call to the HMS can be avoided. http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254 PS3, Line 1254: or if there are no partitions in the table I'm not sure I follow this part. How does the caller know if there are no partitions in the table without fetching the list of partitions? In the case of an unpartitioned table (no partition keys) the code already takes care of this internally without the caller having to pass it, right? http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1418 PS3, Line 1418: Updates the partitions of an HdfsTable so that they are in sync with the Hive :* Metastore. this doesn't seem to be an accurate description of the function, since the function also reloads file metadata, right? Maybe the function should just be called updatePartitionMetadata? Then it's clearer that the flags specify whether we want to update the file metadata, the list of partitions, or both, right? -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 16:57:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7329: Blacklist CDH Maven snapshots repository
Fredy Wijaya has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/10999 ) Change subject: IMPALA-7329: Blacklist CDH Maven snapshots repository .. IMPALA-7329: Blacklist CDH Maven snapshots repository The Impala development boostrapping depends on CDH Maven snapshots which transitively pull dependencies from other repositories which can cause the build to be non-reproducible, e.g. IMPALA-7316. This patch makes the build to be reproducible by blacklisting cdh.snapshots.repo so that Maven does not accidentally downloads the latest CDH snapshots when running a build, which can cause incompatibility issues. Testing: - Ran core tests with CDH_BUILD_NUMBER=422770 and did not hit the issue described in IMPALA-7316 Change-Id: Id945bc2769f92f3df3bb4f617b00db77a6502ff3 --- M impala-parent/pom.xml 1 file changed, 11 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/10999/3 -- To view, visit http://gerrit.cloudera.org:8080/10999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id945bc2769f92f3df3bb4f617b00db77a6502ff3 Gerrit-Change-Number: 10999 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5542: Impala cannot scan Parquet decimal stored as int64 t/int32 t
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11000 ) Change subject: IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t .. Patch Set 1: (2 comments) My main question arose out of me trying to understand the implications of the changes to ByteSize() - I feel like there's something weird there. http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@62 PS1, Line 62: static int ByteSize(const InternalType& v) { return sizeof(InternalType); } I'm still confused about what this function does. Can we rename to something like "InternalByteSize()". What does it returning -1 mean also? It's weird that it also sometimes seems to be used to return the encoded byte size - that seems wrong. E.g. I'm looking at this logic in HdfsParquetTableWriter and wondering if it's wrong: *bytes_needed = plain_encoded_value_size_ < 0 ? ParquetPlainEncoder::ByteSize(*v) : plain_encoded_value_size_; http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py@293 PS1, Line 293: def _create_table_from_file(self, table_name, unique_database): It would be nice to use this for other tests (no requirement, just an observation :)). -- To view, visit http://gerrit.cloudera.org:8080/11000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794 Gerrit-Change-Number: 11000 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Jul 2018 16:07:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6490: Reconnect shell when remote restarts
Le Minh Nghia has posted comments on this change. ( http://gerrit.cloudera.org:8080/10992 ) Change subject: IMPALA-6490: Reconnect shell when remote restarts .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/10992/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10992/3//COMMIT_MSG@9 PS3, Line 9: impad > typo: impalad Done http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_client.py@233 PS3, Line 233: If not, an exception : will be catched > I think the internal workings of the function shouldn't be described in the Done http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_shell.py@614 PS3, Line 614: promt > typo: prompt Why do I have so many typos? :(. Done http://gerrit.cloudera.org:8080/#/c/10992/3/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/10992/3/tests/custom_cluster/test_shell_interactive_reconnect.py@85 PS3, Line 85: self._start_impala_cluster(["--kill"]) : self._start_impala_cluster([]) > Is this really needed? Before each test a clean new impala cluster is start Done -- To view, visit http://gerrit.cloudera.org:8080/10992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Gerrit-Change-Number: 10992 Gerrit-PatchSet: 3 Gerrit-Owner: Le Minh Nghia Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Le Minh Nghia Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Jul 2018 14:10:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6490: Reconnect shell when remote restarts
Hello Zoltan Borok-Nagy, Attila Jeges, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10992 to look at the new patch set (#5). Change subject: IMPALA-6490: Reconnect shell when remote restarts .. IMPALA-6490: Reconnect shell when remote restarts If the remote impalad died while the shell waited for a command to complete, the shell disconnected. Previously after restarting the remote impalad, we need to run "connect;" to reconnect, now the shell will automatically reconnect. Testing: Added test_auto_connect_after_impalad_died in test_shell_interactive_reconnect.py Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 --- M shell/impala_client.py M shell/impala_shell.py M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_shell_interactive_reconnect.py 4 files changed, 52 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10992/5 -- To view, visit http://gerrit.cloudera.org:8080/10992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Gerrit-Change-Number: 10992 Gerrit-PatchSet: 5 Gerrit-Owner: Le Minh Nghia Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Le Minh Nghia Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6490: Reconnect shell when remote restarts
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10992 ) Change subject: IMPALA-6490: Reconnect shell when remote restarts .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/10992/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10992/3//COMMIT_MSG@9 PS3, Line 9: impad typo: impalad http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_client.py@233 PS3, Line 233: If not, an exception : will be caught I think the internal workings of the function shouldn't be described in the doc comment. The user of this method only needs to know what will be the result on what condition. http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/10992/3/shell/impala_shell.py@614 PS3, Line 614: promt typo: prompt http://gerrit.cloudera.org:8080/#/c/10992/3/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/10992/3/tests/custom_cluster/test_shell_interactive_reconnect.py@85 PS3, Line 85: self._start_impala_cluster(["--kill"]) : self._start_impala_cluster([]) Is this really needed? Before each test a clean new impala cluster is started anyway. Also, if you just want to restart the cluster, you don't need to kill it. You can just call _start_impala_cluster([]) -- To view, visit http://gerrit.cloudera.org:8080/10992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Gerrit-Change-Number: 10992 Gerrit-PatchSet: 4 Gerrit-Owner: Le Minh Nghia Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Le Minh Nghia Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Jul 2018 13:55:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6490: Reconnect shell when remote restarts
Hello Zoltan Borok-Nagy, Attila Jeges, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10992 to look at the new patch set (#4). Change subject: IMPALA-6490: Reconnect shell when remote restarts .. IMPALA-6490: Reconnect shell when remote restarts If the remote impad died while the shell waited for a command to complete, the shell disconnected. Previously after restarting the remote impalad, we need to run "connect;" to reconnect, now the shell will automatically reconnect. Testing: Added test_auto_connect_after_impalad_died in test_shell_interactive_reconnect.py Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 --- M shell/impala_client.py M shell/impala_shell.py M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_shell_interactive_reconnect.py 4 files changed, 54 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10992/4 -- To view, visit http://gerrit.cloudera.org:8080/10992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Gerrit-Change-Number: 10992 Gerrit-PatchSet: 4 Gerrit-Owner: Le Minh Nghia Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Le Minh Nghia Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6490: Reconnect shell when remote restarts
Le Minh Nghia has posted comments on this change. ( http://gerrit.cloudera.org:8080/10992 ) Change subject: IMPALA-6490: Reconnect shell when remote restarts .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/10992/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10992/2//COMMIT_MSG@12 PS2, Line 12: now > now Done http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_client.py@33 PS2, Line 33: from thrift.Thrift import TApplicationException, TException : > nit: One line Done http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_client.py@234 PS2, Line 234: ll be catched and return False.""" : if self.connect > Comment is still incorrect. Hopefully done. http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_shell.py@566 PS2, Line 566: nnected(): > test_connected() could be renamed to is_connected() or something similar to Done http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_shell.py@566 PS2, Line 566: if not self.imp_client.is_connected(): : print_to_stderr > Introducing 'connected' is unnecessary. Mathod should be called in the if s Done http://gerrit.cloudera.org:8080/#/c/10992/2/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/10992/2/tests/custom_cluster/test_shell_interactive_reconnect.py@84 PS2, Line 84: # reconnect. > Would it be possible to use ImpalaShell() object here instead of pexpect? ( I added comments which explain my decision http://gerrit.cloudera.org:8080/#/c/10992/2/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/10992/2/tests/shell/test_shell_interactive.py@43 PS2, Line 43: SHELL_HISTORY_FILE = os.path.expanduser("~/.impalahistory") > 'START_CLUSTER' is not used anywhere, you can remove this line. Done -- To view, visit http://gerrit.cloudera.org:8080/10992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Gerrit-Change-Number: 10992 Gerrit-PatchSet: 3 Gerrit-Owner: Le Minh Nghia Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Le Minh Nghia Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Jul 2018 12:34:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5542: Impala cannot scan Parquet decimal stored as int64 t/int32 t
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11000 Change subject: IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t .. IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t The Decimal type in Parquet is a logical type. That means the Parquet file stores some physical/primitive type that is annotated by the DECIMAL tag to make it behave like decimals. The allowed physical types for decimals are INT32, INT64, FIXED, and BINARY. Before this commit Impala could only read decimals stored as FIXED or BINARY. Spark decided to write decimals as INT32 or INT64 when their precision allows it: (1 <= precision <= 9) ==> INT32 (10 <= precision <= 18) ==> INT64 I updated our column readers to accept INT32 and INT64 as valid physical types for decimals. Testing: * extended parquet-plain-test.cc * added Parquet files generated by Spark 2.3.1 and updated test_scanners.py Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-plain-test.cc M testdata/data/README A testdata/data/decimal_stored_as_int32.parquet A testdata/data/decimal_stored_as_int64.parquet M testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 9 files changed, 82 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/11000/1 -- To view, visit http://gerrit.cloudera.org:8080/11000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794 Gerrit-Change-Number: 11000 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6490: Reconnect shell when remote restarts
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/10992 ) Change subject: IMPALA-6490: Reconnect shell when remote restarts .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/10992/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10992/2//COMMIT_MSG@12 PS2, Line 12: noew now http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_client.py@33 PS2, Line 33: from thrift.Thrift import TApplicationException : from thrift.Thrift import TException nit: One line http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_client.py@234 PS2, Line 234: "Checks to see if the current Impala connection is still alive. If not, an exception : will be raised. Comment is still incorrect. http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_shell.py@566 PS2, Line 566: test_connection test_connected() could be renamed to is_connected() or something similar to make clear that we expect a True/False return value. http://gerrit.cloudera.org:8080/#/c/10992/2/shell/impala_shell.py@566 PS2, Line 566: connected = self.imp_client.test_connection() : if not connected: Introducing 'connected' is unnecessary. Mathod should be called in the if statement. http://gerrit.cloudera.org:8080/#/c/10992/2/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/10992/2/tests/custom_cluster/test_shell_interactive_reconnect.py@84 PS2, Line 84: proc = pexpect.spawn(SHELL_CMD) Would it be possible to use ImpalaShell() object here instead of pexpect? (just like in 'test_auto_reconnect' above). 'ImpalaShell' class also has a 'prompt' member that we could use for this test (instead of pexpect.expect()). http://gerrit.cloudera.org:8080/#/c/10992/2/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/10992/2/tests/shell/test_shell_interactive.py@43 PS2, Line 43: START_CLUSTER = "%s/bin/start-impala-cluster.py" % os.environ['IMPALA_HOME'] 'START_CLUSTER' is not used anywhere, you can remove this line. -- To view, visit http://gerrit.cloudera.org:8080/10992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13365a9696886f01294e98054cf4e7cd66ab712 Gerrit-Change-Number: 10992 Gerrit-PatchSet: 2 Gerrit-Owner: Le Minh Nghia Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Le Minh Nghia Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Jul 2018 10:35:11 + Gerrit-HasComments: Yes