[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 10: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/559/ -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Hello Marcel Kornacker, Impala Public Jenkins, Bharath Vissapragada, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6406 to look at the new patch set (#10). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java 14 files changed, 572 insertions(+), 323 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/10 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 9: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/554/ -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/554/ -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 9: Code-Review+2 Forgot to push some change. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6406 to look at the new patch set (#9). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java 13 files changed, 560 insertions(+), 323 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/9 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/553/ -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 8: Code-Review+2 Rebase and fix minor test. Keep Marcel's +2. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6406 to look at the new patch set (#8). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java 13 files changed, 563 insertions(+), 323 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/8 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 6: (9 comments) http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 46: // the storage ID of a particular disk is unique across all the nodes in the cluster. > ? Oops, sorry missed that one. Done http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 112: ListMap hostIndex, Reference unknownDiskIds) > numUnknown...? Done Line 275:* File Block metadata > provide more information (such as the fact that it's used in conjunction wi Done Line 283: * Constructs the metadata of a file block from its block location metadata > "Constructs an FbFileBlock..."? Done Line 291: ListMap hostIndex, Reference unknownDiskIds) > numUnknown...? Done Line 305: boolean isReplicaCached = cachedHosts.contains(loc.getHosts()[i]); > doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), l Not in Guava, ArrayUtils is a class in Apache Commons. Line 327: * using 'fbb' and returns the offset in the underlying buffer where the encoded file > "in the underlying...starts": fine to leave out, that's implied by fb seman Agreed but maybe it's good to explicitly mention it for future readers that may not be familiar with FB semantics. http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 287: Reference unknownDiskIds = new Reference(); > numUnknown...? Unfortunately not, generics don't work with primitive types. Miss C++ :)? Line 741:* Helper method to load the partition file metadata from scratch. This method is > is this from a rebase? Yes, this is from the performance improvements of REFRESH. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 46: private ConcurrentHashMapstorageUuidToDiskId = > add trailing _ to member vars ? http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 161: * Synthesizes the block metadata of a file represented by 'fileStatus' that resides > Done i meant: how does it "synthesize" it? e.g., by creating artificial blocks on some boundary. http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 112: ListMap hostIndex, FileDescStats stats) throws IOException { numUnknown...? Line 275: } provide more information (such as the fact that it's used in conjunction with an FbFileBlock...) Line 283: String[] ip_port = location.split(":"); "Constructs an FbFileBlock..."? Line 291: numUnknown...? Line 305: // Use ~REPLICA_HOST_IDX_MASK to extract the cache info (stored in MSB). doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), loc.gethosts()[i])? Line 327: "in the underlying...starts": fine to leave out, that's implied by fb semantics. http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 287: // Find the partition that this file belongs (if any). numUnknown...? does Reference also work? Line 741: HashMap partsByPath) { is this from a rebase? -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Hello Bharath Vissapragada, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6406 to look at the new patch set (#6). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java 13 files changed, 556 insertions(+), 320 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/6 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 5: (21 comments) http://gerrit.cloudera.org:8080/#/c/6406/5/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: Line 31: LZ4 > zlib? Done Line 38: offset: long (id: 0); > why not specify a default, given that most will be 0? Done Line 51: disk_ids: [ushort] (id: 3); > is_cached? This is embedded in MSB of replica_host_idxs. http://gerrit.cloudera.org:8080/#/c/6406/5/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 217: // File descriptor metadata serialized into a FlatBuffer. > mention where to find the fb definition Done http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 78: diskId = (short) (storageIdGenerator.get(host) + 1); > at least assert that you don't get an overflow here Done http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: Line 78: public byte toFlatBuffer() { > should we generally settle on toFb for functions like these? brevity is a v Done http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 115: int[] blockOffsets = createBlockMetadata(fbb, blockLocations, hostIndex, stats); > i couldn't tell from the name that this creates an array of fbs. call it so Done Line 126: public static FileDescriptor createWithSynthesizedBlockMetadata(FileStatus fileStatus, > feel free to generally abbreviate metadata as md, to make this very long na Done PS5, Line 141: blockOffsets > would also be good to name this so it's immediately clear what it is (ie, e Done Line 161: * Synthesizes the block metadata of a file represented by 'fileStatus' that resides > you don't say what that block metadata contains. Done Line 182: List replicas = Lists.newArrayList( > same comment about intermediate objects as below Done Line 208: List replicas = Lists.newArrayListWithExpectedSize( > hm, it would be nice not to have to construct these intermediate objects. p Done Line 224: public THdfsCompression getFileCompression() { > does this need to return a thrift struct? Done Line 231: public FileBlock getFileBlock(int idx) { > why not just return the fbfileblock and save yourself the object constructi Done Line 329: * Loads the metadata of a file block from its block location metadata 'location' and > what do you mean by 'load'? 'construct an FbFileBlock with the provided fbb Done Line 346: replicaIdx = replica.isCached() ? > please describe this encoding in the .fb file. Done Line 360: FbFileBlock.addOffset(fbb, location.getOffset()); > why not just pass in the offset/length directly and avoid constructing a Bl Done Line 402: public boolean hasCachedReplica() { return hasCachedReplica_; } > instead of adding member vars, why not make all functions be static and tak Done Line 449: public static class FileDescStats { > since this only keeps track of a single number, consider using a Long inste Done http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 134: // if the table is stored in the catalog server. > what does that mean? i thought all tables are stored in the catalog server. The same class is used both in CatalogServiceCatalog (catalod) and ImpaladCatalog (impalad). However, when the table is stored in the catalog we don't need to populate data structure used for static partition pruning. I agree "stored in the catalog" is not the best description. http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 93: protected boolean storedInImpaladCatalogCache_ = false; > is this part of your changes? Yes, see previous comment in HdfsTable.java. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Alex Behm has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Henry Robinson has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 218: 1: required binary file_desc_data > Nice Leave a todo to put this in a KRPC sidecar :) No need to pay serialization for the string itself! -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Alex Behm has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: Line 36: offset: long = 0 (id: 0); Why a default value? Seems potentially dangerous. Not for this change, but I'm thinking we don't even need to store the offset. If we know the block size, than we can derive the offset (assuming the list of file blocks is ordered by offset). Might be worth adding a TODO or recording that idea somewhere. Line 40: length: long = -1 (id: 1); Seems redundant, why keep it? Line 65: compression: FbCompression (id: 3); Will FlatBuffers add padding to align members? Ideally, we'd optimize for space and not access speed. http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 218: 1: required binary file_desc_data Nice Line 296: 10: optional list file_name_prefixes Is this required for the move to flat buffers? I think we should consider an HdfsPathSet abstraction that assigns path ids and internally compresses the underlying strings. There's a lot of manual lookups and stitching in the current code. I don't feel too strongly about whether we should do that now, or clean up the code later. http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: Line 78: public byte toFbCompression() { toFlatBuffer()? http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 126: locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen()); I think it would be better for now to keep all the code that fetches information from external systems in one place (HdfsTable). Splitting up the loading and delegating to several classes may make sense, but that probably requires significant surgery, and the current fetching code is very much centralized (we iterate over all files of in a table). The loading code in this patch is more confusing to me than before. The meaning of some verbs like load/create is less clear. If you agree with that direction, we may not need a FileDescriptor class at all, and can only rely on the FB to hold the data. It may still make sense to have a FileDescBuilder which you can use to construct a FbFileDesc. Line 227: loc.getNames().length); Another case where we might be calling out to the NN. Line 321: private static int REPLICA_HOST_CACHE_MASK = 0x8000; Less code and more readable to have one var with (1 << 15). The places that need to & the mask can bit-wise invert, i.e. (x & ~MASK). http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 189: // List of file name prefixes. Sorted list of file name prefixes? Line 309: PaircompressedFileName = How predictable is the compression behavior? Do we iterate over the files in lexicographical order for both HDFS and S3? I'm worried about the case where an "invalidate metadata" suddenly leads to a higher memory requirement even if no files/partitions have changed. Line 340:* the suffix is equal to 'fileName'. Should mention that this function expects the fileNames to be sorted. Line 346: String commonPrefix = Strings.commonPrefix(prevFileName, fileName); I'm wondering if it's better to first check the last common prefix instead of the prefix between the current and prev file name. In the current version it seems like the list of prefixes could contain strings which are a common prefix of another one. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS2, Line 64: message(${JAVA_FE_ARGS}) > Why? Never mind, it looked to me like debug printing. Please ignore. http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: PS2, Line 124: locations = locatedFileStatus.getBlockLocations(); : } else { : > If fileStatus is not an instance of LocatedFileStatus we don't have the blo Oops thanks for clarifying. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 648 insertions(+), 304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/3 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS2, Line 64: message(${JAVA_FE_ARGS}) > Looks like debug printing. Remove? (below too) Why? http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: PS2, Line 67: storageIdtoInt > (Unrelated to your change). Looks obsolete. Done http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 119: FlatBufferBuilder fbb = new FlatBufferBuilder(1); > Preconditions.checkState(isBlockMdSupported(fs))? Done PS2, Line 124: } else { : locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen()); : } > Do we really need this? Per my understanding, we only fetch LocatedFileStat If fileStatus is not an instance of LocatedFileStatus we don't have the block locations. E.g. during refresh. Line 141: ListMap hostIndex) { > Preconditions.checkState(!isBlockMdSupported(fs)) We don't have the filesystem here. PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx); > - replicaHostIdxs is [ushort] as per fbs definition in which case idx shoul In Java there is no unsigned short. Hence, ushort is stored as in int in flatbuffer java-generated code. I like the two masks as it makes the intention more explicit. Line 463: public static class FileDescStats { > For supportability purposes, I'm wondering if we should track the remote_ne That's a good point. I plan on augmenting this in a separate patch along with other useful stats. http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS2, Line 344: = > nit: spacing Done Line 365: private static String findCommonPrefix(String a, String b) { > Use Strings.commonPrefix() from guava? Good point. Didn't know this existed. Line 1933: Collections.sort(orderedFds, new Comparator() { > Can we just make FileDescriptor implement Comparator and ad How are we going to call decompressFileName that references a structure (file name prefixes) in HdfsTable if we do that? http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 93: protected boolean storedInImpaladCatalogCache_ = false; > What is the motivation to track this? A quick look at the places it is used To avoid initializing data structures that aren't used in the catalog. Tests also need to access this because of the different way we have for loading metadata. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS2, Line 64: message(${JAVA_FE_ARGS}) Looks like debug printing. Remove? (below too) http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: PS2, Line 67: storageIdtoInt (Unrelated to your change). Looks obsolete. http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 119: FlatBufferBuilder fbb = new FlatBufferBuilder(1); Preconditions.checkState(isBlockMdSupported(fs))? PS2, Line 124: } else { : locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen()); : } Do we really need this? Per my understanding, we only fetch LocatedFileStatus'es going forward as we don't want to do another RPC to getFileBlockLocations(). In another words, is there some caller that requires this? Line 141: ListMap hostIndex) { Preconditions.checkState(!isBlockMdSupported(fs)) PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx); - replicaHostIdxs is [ushort] as per fbs definition in which case idx should be a short? That REPLICA_HOST_CACHE_MASK could be made made short and avoid typecasting in L361. - Also that way we could use only REPLICA_HOST_CACHE_MASK and use ~REPLICA_HOST_CACHE_MASK for unmasking instead of a separate REPLICA_HOST_IDX_MASK. Feel free to ignore this if you think that affects readability. Line 463: public static class FileDescStats { For supportability purposes, I'm wondering if we should track the remote_network_addresses and present a rolled up version of stats so that we can know which DNs to look at when we see -ve disk IDs. http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS2, Line 344: = nit: spacing Line 365: private static String findCommonPrefix(String a, String b) { Use Strings.commonPrefix() from guava? Line 1933: Collections.sort(orderedFds, new Comparator() { Can we just make FileDescriptor implement Comparator and add a compare() method there. I think that'd be cleaner. http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 93: protected boolean storedInImpaladCatalogCache_ = false; What is the motivation to track this? A quick look at the places it is used, it seems like they are only called when it actually true (for example resetPartitions()). Are there callers when its false too? Like tests? -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 654 insertions(+), 301 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG Commit Message: Line 11: serialization library. > Any benchmark numbers? Just curious. Also I think it is good to add them he Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS1, Line 24: > Trailing white spaces at multiple places in this file. Done Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b) > Can this be moved outside the loop like JAVA_FE_ARGS Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: PS1, Line 20: > extraneous white spaces. Done PS1, Line 54: file_name_prefix_idx > Didn't understand what this prefix means till I read HdfsTable.fileNamePref Done Line 70: root_type FbFileDesc; > What is the significance of this? I'm reading about FlatBuffers for the fir Yeah, it is mostly used if you read from JSON. Removed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 83: storageIdGenerator.put(host, new Short(diskId)); > Drive-by comment: using 'new' here is pessimising memory consumption. If yo Good point thanks. Done http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: PS1, Line 318: REPLICA_HOST_IDX_MASK > Looks like it used for UNMASKing, rename may be? It's a bit mask, so I'd rather keep the name as such :) Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF; > Document these? Done PS1, Line 357: getHostIdx > how about just making hostIdx_ a short rather than typecasting everytime? Done PS1, Line 360: REPLICA_HOST_CACHE_MASK > Shouldn't this be short rather than int? May be it works fine this way, jus For convenience. Java short is signed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 315: Preconditions.checkNotNull(fd); > I don't think this FileDescriptor.create() can ever return null? If yes, wo Done Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' and returns a > I like the idea but I'm wondering if this kind of optimization is a little That's not entirely true. Multiple filenames from a single write do have a common prefix. Line 420: prefixCompressFileName(currentFileName, prevFileName); > Also I think it may not be optimal to construct the prefixes this way by co In principal you're right, this is not the best way to compress strings. However, due to the way these file names are generated (random) this is not meant to be a generic file name compressor. Here we take advantage of the pattern that file names of a single write exhibit to avoid storing the common prefix multiple times. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 96: protected static EnumSet SUPPORTED_TABLE_TYPES = EnumSet.of( > May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by Not sure how this ended up here. Removed. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG Commit Message: Line 11: serialization library. Any benchmark numbers? Just curious. Also I think it is good to add them here. http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS1, Line 24: Trailing white spaces at multiple places in this file. Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b) Can this be moved outside the loop like JAVA_FE_ARGS http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: PS1, Line 20: extraneous white spaces. PS1, Line 54: file_name_prefix_idx Didn't understand what this prefix means till I read HdfsTable.fileNamePrefixes_. I think it should be pointed out here. Line 70: root_type FbFileDesc; What is the significance of this? I'm reading about FlatBuffers for the first time, so pardon my ignorance. From what I understand, this isn't required in strongly typed systems. Please correct me if I'm wrong. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF; Document these? PS1, Line 318: REPLICA_HOST_IDX_MASK Looks like it used for UNMASKing, rename may be? PS1, Line 357: getHostIdx how about just making hostIdx_ a short rather than typecasting everytime? PS1, Line 360: REPLICA_HOST_CACHE_MASK Shouldn't this be short rather than int? May be it works fine this way, just want to be sure. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 315: Preconditions.checkNotNull(fd); I don't think this FileDescriptor.create() can ever return null? If yes, worth removing Preconditions check here and elsewhere. Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' and returns a I like the idea but I'm wondering if this kind of optimization is a little premature. Especially given the filenames generated by Impala writes are typically random UUID style strings and would likely not qualify this optimization. Thoughts? Please correct me if I got this wrong. Line 420: prefixCompressFileName(currentFileName, prevFileName); Also I think it may not be optimal to construct the prefixes this way by comparing the previous name with the current name. May be we should huffman style encoding that minimizes the no. of bits across the entire set of file names? http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 96: protected static EnumSet SUPPORTED_TABLE_TYPES = EnumSet.of( May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by making it public? -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 83: storageIdGenerator.put(host, new Short(diskId)); Drive-by comment: using 'new' here is pessimising memory consumption. If you use Short.valueof() or just let Java automatically box it, it will use a built-in cache for values -128 to 127, e.g. see http://docs.oracle.com/javase/7/docs/api/java/lang/Short.html#valueOf(short) -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6406 Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 653 insertions(+), 298 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis