Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8928 )
Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command ...................................................................... Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup@1007 PS3, Line 1007: // row_format_val cannot be used as it conflicts with cache_op_val. > When I dump the grammar, the following two lines seem to be the problem: I think they conflict because cache_opt_val and row_format_val both accept empty. Take a look at the hack in L1015. Maybe we can address this issue by distinguishing between optional and non-optional cache val and row format productions. http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java: http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72 PS4, Line 72: affectedPartitions = ((HdfsTable) tbl).getPartitions(); I think we should allow changing the table-level row format even if one of the partitions has an incompatible file format. My understanding is that partitions generally adopt the file format (and row format) specified in at the table level unless explicitly overridden. The table-level row format should only be relevant for those child partitions that have a compatible format. Allowing that alteration seems more flexible and consistent with allowing mixed-format partitions in the first place. For example, the user could get into the same state by altering the table-level row format first and then altering the partition formats. It's not clear why altering the partition formats first and then the table-level row format should not work. http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@460 PS4, Line 460: reloadFileMetadata = alterTableSetRowFormat(tbl, I wonder why we need to reload the file metadata for this operation and SET FILEFORMAT. Maybe this flag is poorly named or it's use is ill-defined in HdfsTable.load(). Do you know why we need to set reloadFileMetadata to true sometimes? http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2266 PS4, Line 2266: * changing the row format the table metadata is marked as invalid and will be I don't think this part is correct. If we return true the file metadata is reloaded as part of this alteration operation, and not when the table is accessed next. The comment in alterTableSetFileFormat() is also wrong. http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2267 PS4, Line 2267: * reloaded on the next access. The return value is true if the fileMetadata needs Why would we even need to reload the file metadata? http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2285 PS4, Line 2285: if (tbl instanceof HdfsTable) style nit: we always use { } for multi-line if http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2290 PS4, Line 2290: Preconditions.checkArgument(tbl instanceof HdfsTable); This can go at the top if this function. http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@471 PS4, Line 471: AnalyzesOk("alter table functional_seq.alltypes set row format delimited " + also tests for the PARTITION clause http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@473 PS4, Line 473: AnalysisError("alter table functional_kudu.alltypes set row format delimited " + test the PARTITION clause on an unpartitioned HDFS table http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@476 PS4, Line 476: AnalysisError("alter table functional_parquet.alltypes set row format delimited " + write these negative tests in a loop over a list of databases http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1338 PS4, Line 1338: # IMPALA-4323 - Test alter row format statement. Ensure alter statement updates metadata. We generally use this style: # IMPALA-4323: <text> http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1339 PS4, Line 1339: create table $DATABASE.delimited (c1 string, c2 string, c3 string) no need to specify $DATABASE because this session is already using $DATABASE http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1347 PS4, Line 1347: ---- QUERY Wow multiple QUERIES sections in a single test case work? I thought we needed to add "====" between test cases (QUERY and RESULTS sections). Are you sure these tests run as expected? Even if they do might be better to add the "====" for consistency. http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1348 PS4, Line 1348: alter table $DATABASE.delimited set row format delimited fields terminated by ' '; Add a comment what we expect the row format to be after the alteration. We expect the line terminator to still be '\002' right? http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1351 PS4, Line 1351: 'the','quick\002brown','fox\002jumped' Can we craft the test in a way that doesn't change how many fields are in a line? I kept wondering where the trailing tilde is, then I realized we have 4 fields per line with the new delimiter. Just seems harder to reason about than necessary. http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1375 PS4, Line 1375: alter table $DATABASE.delimited partition (c0=1) set row format delimited fields terminated by ' '; try altering the table-level row format and see how that affects the partitions and whether the behavior matches your expectations. -- To view, visit http://gerrit.cloudera.org:8080/8928 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154 Gerrit-Change-Number: 8928 Gerrit-PatchSet: 4 Gerrit-Owner: Adam Holley <[email protected]> Gerrit-Reviewer: Adam Holley <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: John Russell <[email protected]> Gerrit-Reviewer: Kim Jin Chul <[email protected]> Gerrit-Comment-Date: Wed, 10 Jan 2018 01:10:26 +0000 Gerrit-HasComments: Yes
