Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
Any benchmark numbers? Just curious. Also I think it is good to add them here.


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
Trailing white spaces at multiple places in this file.


Line 45:     set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
Can this be moved outside the loop like JAVA_FE_ARGS


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
extraneous white spaces.


PS1, Line 54: file_name_prefix_idx
Didn't understand what this prefix means till I read 
HdfsTable.fileNamePrefixes_. I think it should be pointed out here.


Line 70: root_type FbFileDesc;
What is the significance of this? I'm reading about FlatBuffers for the first 
time, so pardon my ignorance. From what I understand, this isn't required in 
strongly typed systems. Please correct me if I'm wrong.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 318:     private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
Document these?


PS1, Line 318: REPLICA_HOST_IDX_MASK
Looks like it used for UNMASKing, rename may be?


PS1, Line 357: getHostIdx
how about just making hostIdx_ a short rather than typecasting everytime?


PS1, Line 360: REPLICA_HOST_CACHE_MASK
Shouldn't this be short rather than int? May be it works fine this way, just 
want to be sure.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315:         Preconditions.checkNotNull(fd);
I don't think this FileDescriptor.create() can ever return null? If yes, worth 
removing Preconditions check here and elsewhere.


Line 338:    * Computes the common prefix between 'fileName' and 'prevFileName' 
and returns a
I like the idea but I'm wondering if this kind of optimization is a little 
premature. Especially given the filenames generated by Impala writes are 
typically random UUID style strings and would likely not qualify this 
optimization. Thoughts? Please correct me if I got this wrong.


Line 420:           prefixCompressFileName(currentFileName, prevFileName);
Also I think it may not be optimal to construct the prefixes this way by 
comparing the previous name with the current name. May be we should huffman 
style encoding that minimizes the no. of bits across the entire set of file 
names?


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet<TableType> SUPPORTED_TABLE_TYPES = 
EnumSet.of(
May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by 
making it public?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to