Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata ......................................................................
Patch Set 3: (14 comments) http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: Line 36: offset: long = 0 (id: 0); > Why a default value? Seems potentially dangerous. Done Line 40: length: long = -1 (id: 1); > Seems redundant, why keep it? You mean why not compute it on the fly by iterating on the file blocks? Line 65: compression: FbCompression (id: 3); > Will FlatBuffers add padding to align members? Ideally, we'd optimize for s Hm, not sure I understand what the ask is here? Should I try to merge this with some other field or something else? http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 218: 1: required binary file_desc_data > Leave a todo to put this in a KRPC sidecar :) No need to pay serialization Done Line 218: 1: required binary file_desc_data > Nice Done Line 296: 10: optional list<string> file_name_prefixes > Is this required for the move to flat buffers? I think we should consider a No, this is definitely not needed now. Actually, I feel the current implementation is too flaky and not easy to explain. I'll remove it for now and leave a TODO. We can revisit the file name compression later. http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: Line 78: public byte toFbCompression() { > toFlatBuffer()? Done http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 126: locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen()); > I think it would be better for now to keep all the code that fetches inform Done Line 227: loc.getNames().length); > Another case where we might be calling out to the NN. I don't think any of these statements is making a call to NN. Which one are you talking about? Line 321: private static int REPLICA_HOST_CACHE_MASK = 0x8000; > Less code and more readable to have one var with (1 << 15). The places that Done http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 189: // List of file name prefixes. > Sorted list of file name prefixes? This was removed. Line 309: Pair<Integer, String> compressedFileName = > How predictable is the compression behavior? Do we iterate over the files i Removed. Line 340: * the suffix is equal to 'fileName'. > Should mention that this function expects the fileNames to be sorted. Removed. Line 346: String commonPrefix = Strings.commonPrefix(prevFileName, fileName); > I'm wondering if it's better to first check the last common prefix instead Removed. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
