Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 )
Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats ...................................................................... Patch Set 5: (29 comments) http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@7 PS4, Line 7: Reduce i > nit: Reduce Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@10 PS4, Line 10: HMS parameters map of partition objects. Each of these strings when > nit: ... when stored in catalogd. Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@11 PS4, Line 11: ts th > nit: take Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@17 PS4, Line 17: persist > persistent? Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@26 PS4, Line 26: improves > nit: improves reminds me of my advisor suggesting edits to my thesis draft :D http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift@297 PS4, Line 297: byte[] representation of T > still having trouble understanding this one. how about: "byte[] representat Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@283 PS4, Line 283: should i > nit: should include (for consistency) Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@284 PS4, Line 284: and that > nit: that is Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@302 PS4, Line 302: 'want_partition_stats' was set i > sync this with the field name in that struct. Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@307 PS4, Line 307: 'want_partition_stats' in TTable > sync Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a446 PS4, Line 446: > any reason to get rid of these? Didn't seem like they are any value add to debugging? http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java: http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@337 PS3, Line 337: > Do you recall where the '400' came from in the first place? Looking through Yep, it was an empirical estimate based on some clusters with incremental stats. I'm adding a TODO to fix that. http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@334 PS4, Line 334: .setPartitio > checkNotNull returns its argument, so you can inline this into the paramete Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@120 PS4, Line 120: * are stored as a deflate-compressed byte array to reduce memory footprint. Use > perhaps include the util method to use to turn it into a TPartitionStats? Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@158 PS4, Line 158: > nit: remove Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@527 PS4, Line 527: : private boolean hasIncrementalStats_; > remove Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637 PS4, Line 637: if (hasIncrStats) Preconditions.checkNotNull(partitionStats); > worth a precondition check that, if hasIncrStats is true, then partitionSta Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637 PS4, Line 637: if (hasIncrStats) Preconditions.checkNotNull(partitionStats); > you mentioned on L527 that this should be set once-- perhaps enforce it? Removed. http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@638 PS4, Line 638: partitionStats_ = partitionStats; > is there some sanity check here, for example, hasIncrStats => partitionStat Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@654 PS4, Line 654: setPartitionStatsBytes(partitionStats, hasIncrStats.getRef()); > still think this is a surprising side effect for a method that looks like i yea wanted to rename in the last PS, missed it somehow. Will do http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@61 PS4, Line 61: byte[] compressedStats = part.getPartitionStatsCompressed(); > At a high level, I'm a little confused in this patch about the boolean hasI Your understanding is right. This is a bug. I've spent some time on this and getting to understand the lifecycle of TTableStats in TPartitionStats. It can be simplified but I believe it is easier to just retain earlier behavior. Serialize everything to the partition state. http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@119 PS4, Line 119: CompressionUtil.deflateCompress(serializer.serialize(partStats)); > strange formatting Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@127 PS4, Line 127: ion.getTable().getFullName(), pa > nit: why not put this substring as part of debugString instead of repeating Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@139 PS4, Line 139: rtition partition, Map<Strin > what happens with the non-incremental stats in this case? Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@58 PS4, Line 58: private final byte[] partitionStats_; > this and below can be final Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@60 PS4, Line 60: : private final boolean hasIncrementalStats_ > this clarification refers to the current case where we deserialize for each Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@48 PS4, Line 48: input bytes.", e) > keep it more general here Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@61 PS4, Line 61: input bytes.", e); > keep it general, and consistent with the error message for compressing. Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@35 PS4, Line 35: byte[] compressed = CompressionUtil.deflateCompress(test.getBytes()); > include empty string/bytes. Done -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Thu, 30 Aug 2018 02:05:39 +0000 Gerrit-HasComments: Yes