[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 9 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 22 Feb 2018 06:55:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Reviewed-on: http://gerrit.cloudera.org:8080/8958 Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesS
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1980/ -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 9 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 22 Feb 2018 01:43:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 9: Code-Review+2 GVO failed because of a single test after the rebase. It was a matter of reordering analysis checks to produce the right expected error message. Trivial fix. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 9 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 22 Feb 2018 01:40:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Philip Zeyliger, Dimitris Tsirogiannis, Vuk Ercegovac, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#9). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1975/ -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 8 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 22 Feb 2018 01:23:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Philip Zeyliger, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#7). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1975/ -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 8 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 21 Feb 2018 20:17:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 8: Code-Review+2 Rebase and minor formatting fix. Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 8 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 21 Feb 2018 20:17:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Philip Zeyliger, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#8). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 6: Going to rebase. Wish me luck. Will let you know if interesting conflicts show up. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 6 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 21 Feb 2018 18:59:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 6: Code-Review+2 (1 comment) Nice! http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/catalog/Catalog.java@154 PS5, Line 154: Throws if the database :* does not exists. Returns null if the table does not exist. > Added TODO. I know. The getTable() and all its variants are kind of depressing to look at. Thanks for adding the todo. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 6 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 21 Feb 2018 18:31:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Philip Zeyliger, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#6). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@519 PS5, Line 519: analysisResult_.isDescribeTableStmt() || : analysisResult_.isResetMetadataStmt() || : analysisResult_.isUseStmt() || : analysisResult_.isShowTablesStmt() || : analysisResult_.isAlterTableStmt()) > nit: maybe create a function like you did for isHierarchicalAuthStmt(). Done http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@104 PS5, Line 104: reset() > Haven't seen where this is called yet, but I am wondering if we also need t Resetting the timeline does not make sense because it may already contain events. To avoid confusion, I removed reset() and added comments and Preconditions checks to indicate loadTables() should only be called once per statement. http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/catalog/Catalog.java@154 PS5, Line 154: Throws if the database :* does not exists. Returns null if the table does not exist. > Isn't this kind of weird behavior? throw in one case and return null in the Added TODO. Note that the previous getTable() had a bug because if you passed !throwIfError then an NPE could be hit. I think this new version is at least better than before, albeit still weird. http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/service/Frontend.java@956 PS5, Line 956: , > nit: remove Done http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/service/Frontend.java@962 PS5, Line 962: , > nit: remove Done -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 21 Feb 2018 06:39:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 5: Code-Review+1 Thanks for addressing my comments! -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 21 Feb 2018 00:24:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 5: (5 comments) Minor comments left http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@519 PS5, Line 519: analysisResult_.isDescribeTableStmt() || : analysisResult_.isResetMetadataStmt() || : analysisResult_.isUseStmt() || : analysisResult_.isShowTablesStmt() || : analysisResult_.isAlterTableStmt()) nit: maybe create a function like you did for isHierarchicalAuthStmt(). http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@104 PS5, Line 104: reset() Haven't seen where this is called yet, but I am wondering if we also need to clear the timeline as well. http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/catalog/Catalog.java@154 PS5, Line 154: Throws if the database :* does not exists. Returns null if the table does not exist. Isn't this kind of weird behavior? throw in one case and return null in the other? Is it easy to simplify it? If not, mind adding a TODO to clean this a bit in the future. http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/service/Frontend.java@956 PS5, Line 956: , nit: remove http://gerrit.cloudera.org:8080/#/c/8958/5/fe/src/main/java/org/apache/impala/service/Frontend.java@962 PS5, Line 962: , nit: remove -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 20 Feb 2018 21:22:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 5: Phil, any more comments? -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 20 Feb 2018 19:32:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 5: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 16 Feb 2018 18:00:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 4: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 4 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 16 Feb 2018 18:00:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Philip Zeyliger, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#5). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@302 PS4, Line 302: Map of all query-relevant tables that is populated before analysis in : // Frontend#requestTableLoadAndWait(). : // An entry in the map is guaranteed to point to a loaded table. This could mean : // the table was loaded successfully or a load was attempted but failed. : // The absence of an entry indicates that table was not in the Catalog at the time : // t > nit: is most of the detail for this comment better stated in the StmtTableC Condensed comment. http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@43 PS4, Line 43: v > nit: sp Done http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@46 PS4, Line 46: Frontend.class > use this class. Oops. Done. http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@62 PS4, Line 62: numCatalogUpdates_ > unclear from the name what numCatalogUpdates_ tracks. how about: numLoadsRe In my mind these are two completely different things, so I'd prefer not to use the same "load" terminology for them. I modified the var names to distinguish between sent/received and added comments. http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@115 PS4, Line 115:* Collects and loads all tables and views required to analyze the given statement > nit: period Done -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 4 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 16 Feb 2018 18:00:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 4: Code-Review+1 (5 comments) thanks for the refactor and db load handling. mostly small comments from my end. http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@302 PS4, Line 302: Map of all query-relevant tables that is populated before analysis in : // Frontend#requestTableLoadAndWait(). : // An entry in the map is guaranteed to point to a loaded table. This could mean : // the table was loaded successfully or a load was attempted but failed. : // The absence of an entry indicates that table was not in the Catalog at the time : // t nit: is most of the detail for this comment better stated in the StmtTableCache class? http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@43 PS4, Line 43: v nit: sp http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@46 PS4, Line 46: Frontend.class use this class. http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@62 PS4, Line 62: numCatalogUpdates_ unclear from the name what numCatalogUpdates_ tracks. how about: numLoadsReceived_ (and perhaps numLoadsRequested_)? http://gerrit.cloudera.org:8080/#/c/8958/4/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@115 PS4, Line 115:* Collects and loads all tables and views required to analyze the given statement nit: period -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 4 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 16 Feb 2018 17:29:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Philip Zeyliger, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#4). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 3: (22 comments) I did not rebase for now to make reviewing easier. Still need to rebase and run the full test suite. Should be ok to review the big changes. http://gerrit.cloudera.org:8080/#/c/8958/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8958/3//COMMIT_MSG@62 PS3, Line 62: Testing: : - A core/hdfs run succeeded : - No new tests were added because the existing functional tests : provide good coverage of various metadata loading scenarios. : - The issue reported in IMPALA-5152 is basically impossible now. : Adding FE unit tests for that bug specifically would require : ugly changes to the new code to enable such testing. > Your view of these tests is obviously much deeper than mine. We expect that * Reworked the loading code into a new class, StmtMetadataLoader * Added FE unit test for the StmtMetadataLoader * The tests cover basic and interesting cases, but are not exhaustive (I don't think that's required or useful) http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@460 PS3, Line 460: // Process statements for which column-level privilege requests may be registered : // except for DESCRIBE TABLE, REFRESH/INVALIDATE, USE or SHOW TABLES statem > stale comment? Done http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@462 PS3, Line 462: isResult_.isQueryStmt() || analysisResult_.isInsertStmt() || : analysisResult_.isUpdateStmt() || analysisResult_.isDeleteStmt() || : analysi > worth grouping into a hasColumnPrivilege method? Done http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@508 PS3, Line 508: sult_.isDescribeTableStmt() || : analysisResult_.isResetMetadataStmt() || : analysisResult_.isUseStmt() || : analysisResult_.isShowTablesStmt() || : analysisResult_ > what's special about these? how would one know that altertable belongs here Added comment. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309 PS3, Line 309: // the map was populated. > is that TODO about other catalog objects still relevant? I think so, but I don't think a TODO here in the code will make us more likely to address it. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/LimitElement.java File fe/src/main/java/org/apache/impala/analysis/LimitElement.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/LimitElement.java@124 PS3, Line 124: || expr.contains(Subquery.cl > comment is stale, pls update. Done http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Path.java@284 PS3, Line 284: public static List getCandidateTables(List path, String sessionDb) { > This seems easy to unit test. Would it make sense to do so? There are ~800 lines of test code dedicated to path analysis in AnalyzeStmts.java. Those tests exercise this function through the metadata loading code path and through the analysis code path. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@62 PS3, Line 62: tableName_.toPath() > add a precondition in the constructor to check that this is not null. Done http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@71 PS3, Line 71: tableName_.toPath() > check not-null in constructor. Done http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/S
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@460 PS3, Line 460: // Process statements for which column-level privilege requests may be registered : // except for DESCRIBE TABLE, REFRESH/INVALIDATE, USE or SHOW TABLES statem stale comment? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@462 PS3, Line 462: isResult_.isQueryStmt() || analysisResult_.isInsertStmt() || : analysisResult_.isUpdateStmt() || analysisResult_.isDeleteStmt() || : analysi worth grouping into a hasColumnPrivilege method? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@508 PS3, Line 508: sult_.isDescribeTableStmt() || : analysisResult_.isResetMetadataStmt() || : analysisResult_.isUseStmt() || : analysisResult_.isShowTablesStmt() || : analysisResult_ what's special about these? how would one know that altertable belongs here? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309 PS3, Line 309: // the map was populated. is that TODO about other catalog objects still relevant? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/LimitElement.java File fe/src/main/java/org/apache/impala/analysis/LimitElement.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/LimitElement.java@124 PS3, Line 124: || expr.contains(Subquery.cl comment is stale, pls update. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@62 PS3, Line 62: tableName_.toPath() add a precondition in the constructor to check that this is not null. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@71 PS3, Line 71: tableName_.toPath() check not-null in constructor. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@961 PS3, Line 961: DatabaseNotFoundException fwict, this is the exception that we were previously using to give back an error message about a non-existent db whereas now we state that the table is missing. would it be reasonable to record this in the table cache (in a separate map)? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@969 PS3, Line 969: Tbls.put(tblName, t so we can have views that are marked as loaded, but their base tables may not be loaded? not sure what we do for renaming a base table of a view. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java@104 PS1, Line 104: Table does > Fair point. We are somewhat inconsistent about this today. Here's some back My guess is that the case you refer to is the common case. If the error is more specific, I think that's useful (e.g., db name typo). I saw in the frontend code where we get this info so if its not too cumbersome to carry around, would be useful. I'll note that the new error is technically correct so don't I feel too strongly about modifying the proposed behavior. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet:
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 3: (13 comments) I was curious how this works, so took a look through. It certainly makes a ton of sense to me to explicitly load the metadata as one phase of the query. http://gerrit.cloudera.org:8080/#/c/8958/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8958/3//COMMIT_MSG@62 PS3, Line 62: Testing: : - A core/hdfs run succeeded : - No new tests were added because the existing functional tests : provide good coverage of various metadata loading scenarios. : - The issue reported in IMPALA-5152 is basically impossible now. : Adding FE unit tests for that bug specifically would require : ugly changes to the new code to enable such testing. Your view of these tests is obviously much deeper than mine. We expect that for most queries, the number of round trips (on a clean "cache") is exactly one, but for views involving views, it's one plus the number of views (or fewer). Does it make sense to explicitly count round trips (perhaps expose it as a metric of "number of rounds of prioritized loading" in the profile) and concoct one of these deliberately? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/Path.java@284 PS3, Line 284: public static List getCandidateTables(List path, String sessionDb) { This seems easy to unit test. Would it make sense to do so? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@61 PS3, Line 61:* Subclasses should override this method as necessary. Is it possible that someone will forget to do so in the future in an unpleasant way? (You could make this abstract and force implementors to provide the empty function definition explicitly.) http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@231 PS3, Line 231: stmt.collectTableRefs(tblRefs, true); I don't know what the usual style here is, but you might want to consider: "true /* only from clause */" or even creating an enum or creating a method called collectTableRefsFromClauseOnly(). Basically, boolean arguments are really hard to remember. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@964 PS1, Line 964: if (tbl == null) continue; Perhaps %d requested and %d loaded tables? I think the fraction (e.g., "10/8") would be meaningless without the source. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@827 PS3, Line 827: public static final class StmtTableCache { javadoc? Is this actually a cache? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@866 PS3, Line 866: final long debugLoggingFrequency = 10; nit: rename to include units? (e.g., debugLoggingFrequencyMillis). In a previous life, I've made these difficult-but-configurable via Java system properties: Integer.getInteger("org.apache.impala.service.Frontend.DEBUG_LOGGING_FREQUENCY", 5000) Up to you. http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@894 PS3, Line 894:// Loop until all the missing tables are loaded in the Impalad's catalog cache. : // In every iteration of this loop we wait for one catalog update to arrive. : while (!missingTbls.isEmpty()) { Is there a limit to the number of iterations we're willing to do? Similarly, can we check if the query has been cancelled? I worry that a bug here will cause an infinite loop, whereas in practice, if catalogUpdateCount > 10, something is wrong? http://gerrit.cloudera.org:8080/#/c/8958/3/fe/src/main/java/org/apache/impala/service/Frontend.java@906 PS3, Line 906: LOG.warn("Catalog restart detected while waiting for tabl
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#3). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java M fe/src/main
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2385 PS2, Line 2385: given name from object for the 'loadedTables' map :* in the global analysis state > Weird sentence. I think you need to remove "object for". Done http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@63 PS2, Line 63: public void collectTableRefs(List tblRefs) { : } > nit: single line Done http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@877 PS2, Line 877: is > remove Done http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@898 PS2, Line 898: currCatalog > Is is possible that in the case of a catalog restart, some of the tables th That's not a crazy thought at all. I considered this scenario and believe it will work for the following reasons: * A Table added to the loadedTbls map reflects the table loaded at some point in time. The requestTableLoadAndWait() process is strictly additive, i.e., new loaded tables may be added, but an existing loaded table is never removed, even if its current state in the impalad catalog is "unloaded". * Tables on the impalad side are not modified in place. This means that a Table in the loadedTbls will always remain in the loaded state. * Query compilation only used Tables from the StmtTableCache, never from the impalad's catalog cache. Modified function comment to explain. http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@936 PS2, Line 936: loadedTables > loadedTbls Done http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@987 PS2, Line 987: Sets.newHashSet(tableNames); > Why not just "return tableNames"? Good point. Done. http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1118 PS2, Line 1118: Planner > nit: Eventually we may want to rename to something like "Query Compilation" I agree. Changed to Query Compliation. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 12 Feb 2018 19:08:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2385 PS2, Line 2385: given name from object for the 'loadedTables' map :* in the global analysis state Weird sentence. I think you need to remove "object for". http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@63 PS2, Line 63: public void collectTableRefs(List tblRefs) { : } nit: single line http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@877 PS2, Line 877: is remove http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@898 PS2, Line 898: currCatalog Is is possible that in the case of a catalog restart, some of the tables that were considered loaded now cease to be? Maybe a crazy thought, but I am wondering how we can ensure that whatever is in loadedTbls is actually loaded in the catalog that the stmt will use. I am not convinced this is even a problem but just wanted to point this out. http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@936 PS2, Line 936: loadedTables loadedTbls http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@987 PS2, Line 987: Sets.newHashSet(tableNames); Why not just "return tableNames"? http://gerrit.cloudera.org:8080/#/c/8958/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1118 PS2, Line 1118: Planner nit: Eventually we may want to rename to something like "Query Compilation" -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 07 Feb 2018 22:26:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Hello Bharath Vissapragada, Dimitris Tsirogiannis, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8958 to look at the new patch set (#2). Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java M fe/src/main
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 1: (39 comments) http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@73 PS1, Line 73: if (!fromClauseOnly) tblRefs.add(new TableRef(tableName_.toPath(), null)); > Maybe replace the if check with a Preconditions.checkState(!fromClauseOnly) Good point. This Preconditions check would be sensible in many places, so I opted to clean up the collectTabeRefs() interface instead because fromClauseonly=true is only needed for QueryStmts. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@66 PS1, Line 66: // Set in analyzeAndAuthorize(). : private ImpaladCatalog catalog_; > If I understand it correctly, that shouldn't be needed, given that analysis Need this for AnalysisContext#checkSystemDbAccess which is annoying but also difficult to clean up (I tried). http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@367 PS1, Line 367: > IMO, this method should be documented, given this is the starting point of Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@518 PS1, Line 518: analysisResult_.isAlterTableStmt() || : analysisResult_.isAlterViewStmt()); > Curious, where did this come from? We can add column priv requests for something like this: use functional; alter table allcomplextypes.int_array_col set fileformat sequencefile; The table path resolves to a column and we register an auth request. Actually, I think any alter table could hit this, so generalized this check to any AlterTableStmt. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309 PS1, Line 309: private final Map loadedTables; > nit: My feeling is that we can call this referencedTables_ (and backup with Renamed to stmtTableCache. I renamed this because "referencedTables" could be misleading in the sense that this map could contain tables that are needed for resolving a path, but are not actually the target of that path. So in some sense not all tables are really "referenced" by the statement. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2361 PS1, Line 2361: or the db > I don't think the existence of db is checked in this function anymore. Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2372 PS1, Line 2372: if (table.isLoaded() && table instanceof IncompleteTable) { > Preconditions.checkState(table.isLoaded())? Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2383 PS1, Line 2383: wit > nit: with Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java@82 PS1, Line 82: tableName_ > other statements assert that tableName is not null on construction. this st Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@85 PS1, Line 85: tableName_ > can tableName_ be null? Added Preonditions check in c'tor. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java File fe/src/main/java/org/apache/impala/analysis/FromClause.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java@68 PS1, Line 68: public > @Override Does not override since it's not a StatementBase. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java File fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java@
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 1: (19 comments) First round of comments. Overall approach looks reasonable to me. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@73 PS1, Line 73: if (!fromClauseOnly) tblRefs.add(new TableRef(tableName_.toPath(), null)); Maybe replace the if check with a Preconditions.checkState(!fromClauseOnly); since it doesn't make sense to call this function with true in this context. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@66 PS1, Line 66: // Set in analyzeAndAuthorize(). : private ImpaladCatalog catalog_; If I understand it correctly, that shouldn't be needed, given that analysis can proceed with the LoadedTables state. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@518 PS1, Line 518: analysisResult_.isAlterTableStmt() || : analysisResult_.isAlterViewStmt()); Curious, where did this come from? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2361 PS1, Line 2361: or the db I don't think the existence of db is checked in this function anymore. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java File fe/src/main/java/org/apache/impala/analysis/FromClause.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java@68 PS1, Line 68: public @Override http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@64 PS1, Line 64: public abstract void collectTableRefs(List tblRefs, boolean fromClauseOnly); If most of the statement classes that extend StatementBase don't return anything you can just provide a default implementation here and avoid adding a function in all these cases. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@823 PS1, Line 823:* ignored and not returned or added to 'loadedTables'. Comment where defaultDb is used. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@825 PS1, Line 825: private Set getMissingTables nit: can you put the following two functions below requestTblLoadAndWait so that the code reads top-down? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@866 PS1, Line 866: org.apache.impala.analysis.Path.getCandidateTables nit: you can use import static if you want to avoid the classpath inside the function. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@872 PS1, Line 872: LoadedTables The name sounds a bit generic. Maybe LoadedTablesCache or something to indicate the temporal nature of it (i.e. it is specific to a query not a global thing). http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@873 PS1, Line 873: public final ImpaladCatalog catalog; I see that this is used for initializing the Analyzer. But given that the analysis proceeds based on the loaded tables (my understanding from the commit msg), does the Analyzer still need it? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@885 PS1, Line 885: If non-NULL You mean the timeline, no? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@889 PS1, Line 889: defaultDb nit: I would call it sessionDb http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@903 PS1, Line 903: public public? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@916 PS1, Line 916:
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2383 PS1, Line 2383: wit nit: with http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java@82 PS1, Line 82: tableName_ other statements assert that tableName is not null on construction. this statement doesn't, pls add this assertion or check the condition here. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@85 PS1, Line 85: tableName_ can tableName_ be null? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java File fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java@108 PS1, Line 108: sq: nit: other loops in this file add a space after the iter variable. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Path.java@275 PS1, Line 275: list of table names pls state any restrictions on the path-- I found this unclear to read. for example, path: <"tableName"> should return <(defaultDb, tableName)> path: <"dbName", "tableName"> will return <(defaultDb, dbName), (dbName, tableName)> ? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@105 PS1, Line 105: } why aren't baseTblResultExprs_ included? the comment there states that these will be resolved to "base tbl refs". I assume the child class does this? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1033 PS1, Line 1033: } I was expecting this class to look at baseTblResultExprs_ http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java File fe/src/main/java/org/apache/impala/service/MetadataOp.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java@281 PS1, Line 281: ; nit: extra ";" http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java@104 PS1, Line 104: Table does I think the previous error was more informative (and it implies that the the table does not exist). -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 09 Jan 2018 21:25:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 1: (11 comments) Patch looks good to me overall, have a few suggestions. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@367 PS1, Line 367: IMO, this method should be documented, given this is the starting point of analysis. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309 PS1, Line 309: private final Map loadedTables; nit: My feeling is that we can call this referencedTables_ (and backup with a comment saying they are loaded). Reason being, we are separating loading and analysis part, so anything that reached here means it is already loaded. Please ignore if you disagree, I'm not too strong about this either. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2372 PS1, Line 2372: if (table.isLoaded() && table instanceof IncompleteTable) { Preconditions.checkState(table.isLoaded())? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@835 PS1, Line 835: // Database does not exist. Maybe catch DatabaseNotFoundEx? CatalogEx seems generic (I checked Catalog#getTable() seems to be throwing DBNotFoundEx after you removed ImpaladCatalog#getTable()). http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@838 PS1, Line 838: if (!tbl.isLoaded()) { Should we handle tbl.isLoaded() && tbl instanceof IncompleteTable here and fail fast instead? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@859 PS1, Line 859: private Set collectTableCandidates(StatementBase stmt, String defaultDb) { I feel like this method belongs to the AnalysisCtx() and not frontend? Then we could just refactor requestTableLoadAndWait() to requestTableLoadAndWait(List) http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@910 PS1, Line 910: if (missingTbls.isEmpty()) return new LoadedTables(catalog, loadedTbls); Should we update something in the timeline to reflect that all the metadata is locally available? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@931 PS1, Line 931: currCatalog != catalog Not fully familiar with != implementation, so I could be wrong on this one. Shouldn't we do something like (currCatalog.serviceId != catalog.serviceId) ? Else, any change to the catalog, like a new updates etc, can alter the hashCode() resulting in an incorrect equals() ? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@942 PS1, Line 942: LOG.warn(String.format("Waiting for table metadata. " + Should we dump current missingTables() too? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@951 PS1, Line 951: getMissingTables(catalog, missingTbls, defaultDb, loadedTbls); Make getMissingTables() always use getCatalog() instead of passing it around? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@958 PS1, Line 958: issueLoadRequest Not sure I follow the need for this. L960 makes sure missingTbls is up to date and we always need to load the remaining ones right? I don't see a case when issueLoadRequest is false and missingTbls is not empty. In other words, every time we pass the loop condition, we need to load something. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Tue, 09 Jan 2018 01:21:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8958 Change subject: IMPALA-5152: Introduce metadata loading phase .. IMPALA-5152: Introduce metadata loading phase Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis. Summary of the new compilation flow: 1. Parse statement. 2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed. 3. Request missing metadata and wait for it to arrive. As views become loaded we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement. 4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata, the ImpaladCatalog is not used for that purpose anymore. 5. Authorize the statement. 6. Plan generation as usual. The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (and repeat those steps until all metadata is loaded). Unfortunately, the relevant code was hard-to-follow, subtle and not well tested, and therefore it was broken in several ways over the course of time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples. The serial table loading observed in the JIRA was caused by the following code in the resolution of table references: for (all path interpretations) { try { // Try to resolve the path; might call getTable() which // throws for nonexistent tables. } catch (AnalysisException e) { if (analyzer.hasMissingTbls()) throw e; } } The following example illustrates the problem: SELECT * FROM a.b, x.y When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one-by-one. Testing: - A core/hdfs run succeeded - No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios. - The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing. Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/AuthorizationStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SetStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowCreateFunctio