Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670: Support multiple partitions in ALTER TABLE ADD 

Patch Set 8:


Flushing out some comments. Haven't looked at the tests yet.


PS8, Line 20: import java.util.List;
            : import java.util.Set;
nit: let's try to keep the import sorted if possible (move them below L 33)

PS8, Line 59: if (getDb() != null) {
            :       sb.append(getDb() + ".");
            :     }
single line

File fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java:

PS8, Line 3: import java.util.List;
           : import java.util.Set;
           : import java.util.ArrayList;
           : import java.util.Collections;
           : import java.util.Comparator;
nit: unordered imports

PS8, Line 189: represantation
typo: representation

File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS8, Line 375: // If 'IfNotExists' is false and there is a conflict with the 
partitions that
             :           // already exist in Hive, then:
             :           // - leave existing partitions intact
             :           // - don't add any new partitions to the table
             :           // - throw ImpalaRuntimeException.
             :           // If 'IfNotExists' is true, conflicts are hadled as 
             :           // - If some (but not all) partitions already exist in 
Hive, then ignore
             :           // the partitions that already exist and add the rest.
             :           // - If all of the partitions already exist, ignore 
them and return null.
I believe this comment section is almost duplicate to the one in 
alterTableAddPartitions function. You can remove this comment block.

PS8, Line 1642: alterTableCachePartitions
can you swap this function with alterTableAddPartitions? (top down reading of 
the code usually helps :))

PS8, Line 1686: hmsPartitions
Will this try to alter all the partitions even though only one has a cache op 
specified? Also, again you're using hmsPartitions which is the result of 
add_partitions call. What if the partition exists in HMS, not in cache but the 
ALTER statement indicates that the added partition should be cached. I am not 
sure the end result will be the expected one in this case (i.e. the partition 
being added in catalog cache and a cache directive being added). Can you plz 
verify and/or add tests?

PS8, Line 1696: Hive
nit: replace Hive with HMS

PS8, Line 1718: hmsPartitionValues2CachingOp
An alternative for hmsPartitionValues2CachingOp would be partitionCachingOpMap.

PS8, Line 1718: ImmutableList
Why using ImmutableLists here? These variables are never exposed outside this 
function and there are no concurrent accesses here (you hold a lock on the tbl).

PS8, Line 1720:  
nit: extra space

Line 1727:       } else {
You may add a "continue;" at the end of the if block and skip the "else"

PS8, Line 1733: ImmutableList.copyOf(hmsPartition.getValues()
Partitions have unique ids, you don't need to use strings constructed from the 
partition values.

PS8, Line 1745: catch (AlreadyExistsException e) {
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, 
"add_partitions"), e);
              :       } catch (TException e) {
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, 
"add_partitions"), e);
              :       }
You don't really treat these two exceptions differently. Since Java 7, you may 
use a multi-catch to eliminate duplicate code:
catch (AlreadyExistsException | TException e) {...}. Also how about the other 
exceptions that his call may throw? Are only these two?

PS8, Line 1766: for (Partition partition: hmsPartitions) {
              :         // Create and add the HdfsPartition. Return the table 
object with an updated
              :         // catalog version.
              :         addHdfsPartition(tbl, partition);
              :       }
I think there is a scenario that may result in an inconsistent state. Say you 
want to add 3 partitions p1, p2 and p3 but someone (e.g. HIVE) has already 
added p2 without Impala knowing about it (i.e. it's not in the cache). What 
will happen with your code is that the result of add_partitions call in L1743 
will return only p1 and p3 (added partitions) and these are the partitions you 
will add to the catalog cache (L1766-1769). Even though the stmt will succeed 
Impala will be unaware of p2 and there is nothing to indicate what went wrong. 
The only option will be to run refresh or invalidate.

To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to