Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
......................................................................


Patch Set 5:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@7
PS4, Line 7: Reduce i
> nit: Reduce
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@10
PS4, Line 10: HMS parameters map of partition objects. Each of these strings 
when
> nit: ... when stored in catalogd.
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@11
PS4, Line 11: ts th
> nit: take
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@17
PS4, Line 17: persist
> persistent?
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@26
PS4, Line 26: improves
> nit: improves
reminds me of my advisor suggesting edits to my thesis draft :D


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift@297
PS4, Line 297: byte[] representation of T
> still having trouble understanding this one. how about: "byte[] representat
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@283
PS4, Line 283: should i
> nit: should include (for consistency)
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@284
PS4, Line 284: and that
> nit: that is
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@302
PS4, Line 302: 'want_partition_stats' was set i
> sync this with the field name in that struct.
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@307
PS4, Line 307: 'want_partition_stats' in TTable
> sync
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a446
PS4, Line 446:
> any reason to get rid of these?
Didn't seem like they are any value add to debugging?


http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@337
PS3, Line 337:
> Do you recall where the '400' came from in the first place? Looking through
Yep, it was an empirical estimate based on some clusters with incremental 
stats. I'm adding a TODO to fix that.


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@334
PS4, Line 334: .setPartitio
> checkNotNull returns its argument, so you can inline this into the paramete
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@120
PS4, Line 120:    * are stored as a deflate-compressed byte array to reduce 
memory footprint. Use
> perhaps include the util method to use to turn it into a TPartitionStats?
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@158
PS4, Line 158:
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@527
PS4, Line 527:
             :   private boolean hasIncrementalStats_;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637
PS4, Line 637:     if (hasIncrStats) Preconditions.checkNotNull(partitionStats);
> worth a precondition check that, if hasIncrStats is true, then partitionSta
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637
PS4, Line 637:     if (hasIncrStats) Preconditions.checkNotNull(partitionStats);
> you mentioned on L527 that this should be set once-- perhaps enforce it?
Removed.


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@638
PS4, Line 638:     partitionStats_ = partitionStats;
> is there some sanity check here, for example, hasIncrStats => partitionStat
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@654
PS4, Line 654:       setPartitionStatsBytes(partitionStats, 
hasIncrStats.getRef());
> still think this is a surprising side effect for a method that looks like i
yea wanted to rename in the last PS, missed it somehow. Will do


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@61
PS4, Line 61:       byte[] compressedStats =  
part.getPartitionStatsCompressed();
> At a high level, I'm a little confused in this patch about the boolean hasI
Your understanding is right. This is a bug. I've spent some time on this and 
getting to understand the lifecycle of TTableStats in TPartitionStats. It can 
be simplified but I believe it is easier to just retain earlier behavior. 
Serialize everything to the partition state.


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@119
PS4, Line 119:           
CompressionUtil.deflateCompress(serializer.serialize(partStats));
> strange formatting
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@127
PS4, Line 127: ion.getTable().getFullName(), pa
> nit: why not put this substring as part of debugString instead of repeating
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@139
PS4, Line 139: rtition partition, Map<Strin
> what happens with the non-incremental stats in this case?
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@58
PS4, Line 58:   private final byte[] partitionStats_;
> this and below can be final
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@60
PS4, Line 60:
            :   private final boolean hasIncrementalStats_
> this clarification refers to the current case where we deserialize for each
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
File fe/src/main/java/org/apache/impala/util/CompressionUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@48
PS4, Line 48: input bytes.", e)
> keep it more general here
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@61
PS4, Line 61: input bytes.", e);
> keep it general, and consistent with the error message for compressing.
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@35
PS4, Line 35:       byte[] compressed = 
CompressionUtil.deflateCompress(test.getBytes());
> include empty string/bytes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Aug 2018 02:05:39 +0000
Gerrit-HasComments: Yes

Reply via email to