Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" 
command
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup@1007
PS3, Line 1007: // row_format_val cannot be used as it conflicts with 
cache_op_val.
> When I dump the grammar, the following two lines seem to be the problem:
I think they conflict because cache_opt_val and row_format_val both accept 
empty.

Take a look at the hack in L1015. Maybe we can address this issue by 
distinguishing between optional and non-optional cache val and row format 
productions.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72
PS4, Line 72:       affectedPartitions = ((HdfsTable) tbl).getPartitions();
I think we should allow changing the table-level row format even if one of the 
partitions has an incompatible file format.

My understanding is that partitions generally adopt the file format (and row 
format) specified in at the table level unless explicitly overridden. The 
table-level row format should only be relevant for those child partitions that 
have a compatible format.

Allowing that alteration seems more flexible and consistent with allowing 
mixed-format partitions in the first place.

For example, the user could get into the same state by altering the table-level 
row format first and then altering the partition formats. It's not clear why 
altering the partition formats first and then the table-level row format should 
not work.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@460
PS4, Line 460:           reloadFileMetadata = alterTableSetRowFormat(tbl,
I wonder why we need to reload the file metadata for this operation and SET 
FILEFORMAT. Maybe this flag is poorly named or it's use is ill-defined in 
HdfsTable.load().

Do you know why we need to set reloadFileMetadata to true sometimes?


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2266
PS4, Line 2266:    * changing the row format the table metadata is marked as 
invalid and will be
I don't think this part is correct. If we return true the file metadata is 
reloaded as part of this alteration operation, and not when the table is 
accessed next.

The comment in alterTableSetFileFormat() is also wrong.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2267
PS4, Line 2267:    * reloaded on the next access.  The return value is true if 
the fileMetadata needs
Why would we even need to reload the file metadata?


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2285
PS4, Line 2285:         if (tbl instanceof HdfsTable)
style nit: we always use { } for multi-line if


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2290
PS4, Line 2290:         Preconditions.checkArgument(tbl instanceof HdfsTable);
This can go at the top if this function.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@471
PS4, Line 471:     AnalyzesOk("alter table functional_seq.alltypes set row 
format delimited " +
also tests for the PARTITION clause


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@473
PS4, Line 473:     AnalysisError("alter table functional_kudu.alltypes set row 
format delimited " +
test the PARTITION clause on an unpartitioned HDFS table


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@476
PS4, Line 476:     AnalysisError("alter table functional_parquet.alltypes set 
row format delimited " +
write these negative tests in a loop over a list of databases


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1338
PS4, Line 1338: # IMPALA-4323 - Test alter row format statement.  Ensure alter 
statement updates metadata.
We generally use this style:

# IMPALA-4323: <text>


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1339
PS4, Line 1339: create table $DATABASE.delimited (c1 string, c2 string, c3 
string)
no need to specify $DATABASE because this session is already using $DATABASE


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1347
PS4, Line 1347: ---- QUERY
Wow multiple QUERIES sections in a single test case work? I thought we needed 
to add "====" between test cases (QUERY and RESULTS sections). Are you sure 
these tests run as expected? Even if they do might be better to add the "====" 
for consistency.


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1348
PS4, Line 1348: alter table $DATABASE.delimited set row format delimited fields 
terminated by ' ';
Add a comment what we expect the row format to be after the alteration. We 
expect the line terminator to still be '\002' right?


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1351
PS4, Line 1351: 'the','quick\002brown','fox\002jumped'
Can we craft the test in a way that doesn't change how many fields are in a 
line? I kept wondering where the trailing tilde is, then I realized we have 4 
fields per line with the new delimiter. Just seems harder to reason about than 
necessary.


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1375
PS4, Line 1375: alter table $DATABASE.delimited partition (c0=1) set row format 
delimited fields terminated by ' ';
try altering the table-level row format and see how that affects the partitions 
and whether the behavior matches your expectations.



--
To view, visit http://gerrit.cloudera.org:8080/8928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Gerrit-Change-Number: 8928
Gerrit-PatchSet: 4
Gerrit-Owner: Adam Holley <[email protected]>
Gerrit-Reviewer: Adam Holley <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Kim Jin Chul <[email protected]>
Gerrit-Comment-Date: Wed, 10 Jan 2018 01:10:26 +0000
Gerrit-HasComments: Yes

Reply via email to