Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 

Patch Set 14:


Very nice! Minor nits in the code, some comments in testing.

File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS14, Line 1629: .
" or null if the table is not altered because all the partitions already exist 
and IF NOT EXISTS is specified".

PS14, Line 1635: If all partitions exist in
               :    * catalog cache, return null.

PS14, Line 1714: It is assumed that all Partition objects in 'aList' and 
'bList' belong to the same
               :    * table.
I don't think this function is making use or enforcing this assumption 
anywhere. So, you may want to remove this sentence.

PS14, Line 1753: lists of partition values
nit: maybe it's better to say "maps partitions (identified by their partition 
values) to their corresponding HDFS caching ops."

File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java:

PS14, Line 220: AnalysisError("alter table functional.alltypes add " + cl +
              :           " partition(year=2050, month=10)" +
              :           " partition(year=2050, month=11)" +
              :           " partition(Month=10, YEAR=2050) location" +
              :           " 
              :           "Duplicate partition spec: (month=10, year=2050)");
I think this case simply enforces the unique partition specs (similar to the 
one in L215). So I am not sure if LOCATION adds anything more in terms of test 
coverage in this case. Maybe just keep the previous case?

Line 246:     }
Also you may want to add a test case where one of the partitions is cached in a 
non-existent pool.

File tests/common/impala_test_suite.py:

PS14, Line 172: def impala_partitions(self, table_name, *include_fields):
A few thoughts on this function:
1. The field positions seem to be hardcoded based on the current output of SHOW 
PARTITIONS. Any change in the output will break the tests that rely on this 
2. "impala_partitions" doesn't really indicate what to expect from this 
function. Maybe "get_impala_partitions" or "get_impala_partition_info"  is more 
3. I find it awkward that by default it returns the partition values and all 
other fields are optional. Personally, I'd change it so that, if include_fields 
is not specified, return all fields. Otherwise, return only those fields in 

PS14, Line 174: Impala sees them
maybe: "returned by a SHOW PARTITIONS statement."

File tests/metadata/test_hms_integration.py:

Line 269:     Passes 'command' to 'engine' callable and makes sure that it 
raises an exception.
Thanks for adding the comment. Maybe an example of an 'engine' callable (in 
terms of an existing Python class that can be used) would also help future 
readers understand how to use this.

Line 701:             "Partition already exists")
You may want to add an assert here to verify that no partitions were added to 

PS14, Line 713: was left alone
Even partitions need to be left alone from time to time :). Maybe "was not 

PS14, Line 714: # Sometimes metadata load is triggered here, so compare only 
the first three
              :         # returned partitions.
Not sure I follow this comment. Can you explain the non-deterministic behavior 
wrt metadata load?

PS14, Line 770: def test_partitions_exist_after_refresh(self, vector):
I don't think this is an integration test. You may want to move it to 
test_partition_metadata.py and also cover the 'invalidate metadata' case. Also, 
another integration test case that is currently not covered is adding 
partitions using Impala and checking that they are accessible through Hive.

Line 800:             table_name).get_data().split('\n')
Also try to add some new and existing partitions using Impala (maybe with 
location and cached properties) and verify that new partitions are added 
correctly and existing partitions are not modified. You may want to test w and 

To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to