[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 22:11:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..

IMPALA-7424: Reduce in-memory footprint of incremental stats

Currently incremental stats are stored as chunked Base64 strings in the
HMS parameters map of partition objects. Each of these strings when
stored in the catalogd are Java 'String' objects that use UTF-16 encoding
and take up to 2 bytes per character.

This patch converts the string representation into a deflate-compressed byte
array form when the partition is loaded in the Catalogd and this state is
maintained when transmitting them to the coordinators. To maintain
backward compatibility, the persistent HMS representation of stats has not
been modified. So the incremental stats are still written back to the
chunked Base64 representation while serializing the partition state to
HMS.

On a real world catalogserver dominated by incremental stats memory
footprint, this patch showed ~54% end-to-end heapsize reduction and ~79%
reduction in the memory footprint of incremental stats data structures.

This patch also improves the way the callers check if a partition has
incremental stats by computing this information once and reusing it
later. Without the patch, we deserialize the entire incremental stats
structure everytime this information is needed and that triggers a spike
in usage of working memory on catalogds/Impalads.

Testing: Ran core tests on Catalog V1 Implementation. Ran some manual
queries on Catalog V2 implementation.

Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Reviewed-on: http://gerrit.cloudera.org:8080/11341
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Todd Lipcon 
Tested-by: Impala Public Jenkins 
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/CompressionUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
17 files changed, 376 insertions(+), 127 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 10
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/556/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 19:10:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 18:54:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 18:47:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 8: Code-Review+2

thx for the changes. lgtm.


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 18:44:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Bharath Vissapragada (Code Review)
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11341

to look at the new patch set (#8).

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..

IMPALA-7424: Reduce in-memory footprint of incremental stats

Currently incremental stats are stored as chunked Base64 strings in the
HMS parameters map of partition objects. Each of these strings when
stored in the catalogd are Java 'String' objects that use UTF-16 encoding
and take up to 2 bytes per character.

This patch converts the string representation into a deflate-compressed byte
array form when the partition is loaded in the Catalogd and this state is
maintained when transmitting them to the coordinators. To maintain
backward compatibility, the persistent HMS representation of stats has not
been modified. So the incremental stats are still written back to the
chunked Base64 representation while serializing the partition state to
HMS.

On a real world catalogserver dominated by incremental stats memory
footprint, this patch showed ~54% end-to-end heapsize reduction and ~79%
reduction in the memory footprint of incremental stats data structures.

This patch also improves the way the callers check if a partition has
incremental stats by computing this information once and reusing it
later. Without the patch, we deserialize the entire incremental stats
structure everytime this information is needed and that triggers a spike
in usage of working memory on catalogds/Impalads.

Testing: Ran core tests on Catalog V1 Implementation. Ran some manual
queries on Catalog V2 implementation.

Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/CompressionUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
17 files changed, 376 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/8
--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@530
PS7, Line 530:   private boolean hasIncrementalStats_;
> start this off at false. I could not easily see when its set (I see from th
Java defaults it to false but yes setting it explicitly makes it more readable.


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690
PS7, Line 690: Preconditions.checkState(
> agreed. just discussed this offline. when/if it gets written back, we can c
Done


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1712
PS6, Line 1712:
> I think it should be included with metadata, not with the want_stats select
Done


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@65
PS7, Line 65: decompressed
> null on an internal exception in deflateDecompress
Done


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@149
PS7, Line 149: decompressed
> can be null if there's an exception in deflateDecompress.
Done


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
File fe/src/main/java/org/apache/impala/util/CompressionUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@64
PS7, Line 64: return null
> why is this null in this case, but empty in the compress case (L49)? caller
Ah oversight during refactoring.



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 18:31:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690
PS7, Line 690: Preconditions.checkState(
> good catch! Maybe we should clear these fields from the parameters even if
agreed. just discussed this offline. when/if it gets written back, we can call 
it a self-cleaning feature.



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 17:55:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690
PS7, Line 690: Preconditions.checkState(
> what happens if the hms param was corrupted so the extraction on L665 threw
good catch! Maybe we should clear these fields from the parameters even if it 
failed to decompress



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 17:51:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7:

(5 comments)

couple of small comments/questions around edge-cases. otherwise, lgtm.

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@530
PS7, Line 530:   private boolean hasIncrementalStats_;
start this off at false. I could not easily see when its set (I see from 
thrift, but from the constructor, its less clear since the setter is in a try 
block, L667).


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690
PS7, Line 690: Preconditions.checkState(
what happens if the hms param was corrupted so the extraction on L665 threw an 
exception? that's not under our control.


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@65
PS7, Line 65: decompressed
null on an internal exception in deflateDecompress


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@149
PS7, Line 149: decompressed
can be null if there's an exception in deflateDecompress.


http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
File fe/src/main/java/org/apache/impala/util/CompressionUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@64
PS7, Line 64: return null
why is this null in this case, but empty in the compress case (L49)? caller 
would be simplified to remove this.



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 16:18:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/551/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 06:17:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7: Code-Review+1

(1 comment)

Just one remaining item, otherwise lgtm

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1712
PS6, Line 1712:
> Done
I think it should be included with metadata, not with the want_stats selection. 
That way we can do the cheap call first to see which ones have stats.



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 05:59:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11341/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/6/common/thrift/CatalogObjects.thrift@303
PS6, Line 303: optional
> shouldn't this be optional since we don't need to send this to backends as
Done


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@322
PS6, Line 322: thriftHdfsPart.setId(part.getId());
> per comment elsewhere, I dont think you need this in the non-FULL (descript
Yep missed this.


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@32
PS6, Line 32: import org.apache.hadoop.fs.BlockLocation;
> nit: this should sort down with the other guava stuff
Done


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@668
PS6, Line 668:   // Delete the incremental stats entries.
> I think you could simplify this code a bit and write:
neat, done!


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@693
PS6, Line 693: 
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1712
PS6, Line 1712:
> it's somewhat weird that this one is required. I would include it optionall
Done


http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
File fe/src/main/java/org/apache/impala/util/CompressionUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@43
PS6, Line 43:
> curious if you compared the compression ratio and performance of BEST_COMPR
I noticed that BEST_COMPRESSION had a slightly better (not too much)  
compression ratio. I don't have the numbers now. Adding a TODO though.



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 05:31:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Bharath Vissapragada (Code Review)
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11341

to look at the new patch set (#7).

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..

IMPALA-7424: Reduce in-memory footprint of incremental stats

Currently incremental stats are stored as chunked Base64 strings in the
HMS parameters map of partition objects. Each of these strings when
stored in the catalogd are Java 'String' objects that use UTF-16 encoding
and take up to 2 bytes per character.

This patch converts the string representation into a deflate-compressed byte
array form when the partition is loaded in the Catalogd and this state is
maintained when transmitting them to the coordinators. To maintain
backward compatibility, the persistent HMS representation of stats has not
been modified. So the incremental stats are still written back to the
chunked Base64 representation while serializing the partition state to
HMS.

On a real world catalogserver dominated by incremental stats memory
footprint, this patch showed ~54% end-to-end heapsize reduction and ~79%
reduction in the memory footprint of incremental stats data structures.

This patch also improves the way the callers check if a partition has
incremental stats by computing this information once and reusing it
later. Without the patch, we deserialize the entire incremental stats
structure everytime this information is needed and that triggers a spike
in usage of working memory on catalogds/Impalads.

Testing: Ran core tests on Catalog V1 Implementation. Ran some manual
queries on Catalog V2 implementation.

Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/CompressionUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
17 files changed, 360 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/7
--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/549/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 03:58:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG@14
PS5, Line 14: deflate
> gzip is not present in this change, so either state 'deflate-compressed' or
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift@302
PS5, Line 302:   // Set if 'want_partition_stats' was set in 
TTableInfoSelector. Not set if the
> and the partition has associated stats. just to disambiguate the case where
Clarified. Fixed a related bug in FeCatalogUtils


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@160
PS5, Line 160: titionSt
> I think you're using this term since the prev. method was called getFiltere
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524
PS5, Line 524:
> and when its updated (from CatalogOpExecutor, through the Util to serialize
Added a reference to setPartitionStatsBytes(). Didn't want to make it complex 
by adding all the callpaths. Interested ones can track it back using 
setPartitionStatsBytes()'s callers.


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@76
PS5, Line 76: -
> nit: add the hyphen
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@109
PS5, Line 109:   HdfsP
> move this to the method javadoc. also mention the deletion on L122
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@123
PS5, Line 123: String debugString =
> this was somewhat surprising at first... was wondering why the deletion if
Its a no-op. Added more checks for this invariant as discussed.


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@128
PS5, Line 128: 
> this is redundant with the prefix in L126
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@129
PS5, Line 129:
> same here
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@819
PS5, Line 819: pri
> Will remove in next PS
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
File fe/src/main/java/org/apache/impala/util/CompressionUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@40
PS5, Line 40: if (input == null) return null;
> precondition that input is not-null or return null if that's the case. same
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@59
PS5, Line 59: try{
> make these more symmetric. the other one returns in the try-block, this one
Done


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@32
PS5, Line 32:  si
> Will update in next PS
Done



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 31 Aug 2018 03:15:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Bharath Vissapragada (Code Review)
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11341

to look at the new patch set (#6).

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..

IMPALA-7424: Reduce in-memory footprint of incremental stats

Currently incremental stats are stored as chunked Base64 strings in the
HMS parameters map of partition objects. Each of these strings when
stored in the catalogd are Java 'String' objects that use UTF-16 encoding
and take up to 2 bytes per character.

This patch converts the string representation into a deflate-compressed byte
array form when the partition is loaded in the Catalogd and this state is
maintained when transmitting them to the coordinators. To maintain
backward compatibility, the persistent HMS representation of stats has not
been modified. So the incremental stats are still written back to the
chunked Base64 representation while serializing the partition state to
HMS.

On a real world catalogserver dominated by incremental stats memory
footprint, this patch showed ~54% end-to-end heapsize reduction and ~79%
reduction in the memory footprint of incremental stats data structures.

This patch also improves the way the callers check if a partition has
incremental stats by computing this information once and reusing it
later. Without the patch, we deserialize the entire incremental stats
structure everytime this information is needed and that triggers a spike
in usage of working memory on catalogds/Impalads.

Testing: Ran core tests on Catalog V1 Implementation. Ran some manual
queries on Catalog V2 implementation.

Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/CompressionUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
17 files changed, 368 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/6
--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-30 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG@14
PS5, Line 14: gzipped
gzip is not present in this change, so either state 'deflate-compressed' or 
just 'compressed'.


http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift@302
PS5, Line 302:   // Set if 'want_partition_stats' was set in TTableInfoSelector.
and the partition has associated stats. just to disambiguate the case where 
stats are not present-- is this set empty or unset?


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@160
PS5, Line 160: filtered
I think you're using this term since the prev. method was called 
getFilteredHmsParameters. simpler to just say that the keys used to store 
TPartitionStats are not returned.


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524
PS5, Line 524:   // when the partition is loaded.
and when its updated (from CatalogOpExecutor, through the Util to serialize, 
and back to setPartitionStatsBytes)


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@76
PS5, Line 76:
nit: add the hyphen


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@109
PS5, Line 109: // null
move this to the method javadoc. also mention the deletion on L122


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@123
PS5, Line 123: deletePartStats(partition);
this was somewhat surprising at first... was wondering why the deletion if 
TPartStats is coming in from the caller. makes more sense to me when I see that 
its an overwrite and we're making TPartStats the master copy. I think just 
comments will clarify.


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@128
PS5, Line 128: Error saving partition stats:
this is redundant with the prefix in L126


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@129
PS5, Line 129: rror saving partitio
same here


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
File fe/src/main/java/org/apache/impala/util/CompressionUtil.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@40
PS5, Line 40: ByteArrayOutputStream bos = new 
ByteArrayOutputStream(input.length);
precondition that input is not-null or return null if that's the case. same for 
the other method.


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@59
PS5, Line 59:   IOUtils.copy(new InflaterInputStream(new 
ByteArrayInputStream(input)), out);
make these more symmetric. the other one returns in the try-block, this one 
doesn't. can make it easier to see that a null is returned for similar cases.



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 30 Aug 2018 17:50:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/532/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 30 Aug 2018 02:46:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-29 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@819
PS5, Line 819:   p
Will remove in next PS


http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java:

http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@32
PS5, Line 32:  a
Will update in next PS



--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 30 Aug 2018 02:09:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-29 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11341 )

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..


Patch Set 5:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@7
PS4, Line 7: Reduce i
> nit: Reduce
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@10
PS4, Line 10: HMS parameters map of partition objects. Each of these strings 
when
> nit: ... when stored in catalogd.
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@11
PS4, Line 11: ts th
> nit: take
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@17
PS4, Line 17: persist
> persistent?
Done


http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@26
PS4, Line 26: improves
> nit: improves
reminds me of my advisor suggesting edits to my thesis draft :D


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift@297
PS4, Line 297: byte[] representation of T
> still having trouble understanding this one. how about: "byte[] representat
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@283
PS4, Line 283: should i
> nit: should include (for consistency)
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@284
PS4, Line 284: and that
> nit: that is
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@302
PS4, Line 302: 'want_partition_stats' was set i
> sync this with the field name in that struct.
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@307
PS4, Line 307: 'want_partition_stats' in TTable
> sync
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a446
PS4, Line 446:
> any reason to get rid of these?
Didn't seem like they are any value add to debugging?


http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@337
PS3, Line 337:
> Do you recall where the '400' came from in the first place? Looking through
Yep, it was an empirical estimate based on some clusters with incremental 
stats. I'm adding a TODO to fix that.


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@334
PS4, Line 334: .setPartitio
> checkNotNull returns its argument, so you can inline this into the paramete
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@120
PS4, Line 120:* are stored as a deflate-compressed byte array to reduce 
memory footprint. Use
> perhaps include the util method to use to turn it into a TPartitionStats?
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@158
PS4, Line 158:
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@527
PS4, Line 527:
 :   private boolean hasIncrementalStats_;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637
PS4, Line 637: if (hasIncrStats) Preconditions.checkNotNull(partitionStats);
> worth a precondition check that, if hasIncrStats is true, then partitionSta
Done


http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637
PS4, Line 637: if (hasIncrStats) Preconditions.checkNotNull(partitionStats);
> you mentioned on L527 that this should be set once-- perhaps enforce it?
Removed.



[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats

2018-08-29 Thread Bharath Vissapragada (Code Review)
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11341

to look at the new patch set (#5).

Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats
..

IMPALA-7424: Reduce in-memory footprint of incremental stats

Currently incremental stats are stored as chunked Base64 strings in the
HMS parameters map of partition objects. Each of these strings when
stored in the catalogd are Java 'String' objects that use UTF-16 encoding
and take up to 2 bytes per character.

This patch converts the string representation into a gzipped byte array
form when the partition is loaded in the Catalogd and this state is
maintained when transmitting them to the coordinators. To maintain
backward compatibility, the persistent HMS representation of stats has not
been modified. So the incremental stats are still written back to the
chunked Base64 representation while serializing the partition state to
HMS.

On a real world catalogserver dominated by incremental stats memory
footprint, this patch showed ~54% end-to-end heapsize reduction and ~79%
reduction in the memory footprint of incremental stats data structures.

This patch also improves the way the callers check if a partition has
incremental stats by computing this information once and reusing it
later. Without the patch, we deserialize the entire incremental stats
structure everytime this information is needed and that triggers a spike
in usage of working memory on catalogds/Impalads.

Testing: Ran core tests on Catalog V1 Implementation. Ran some manual
queries on Catalog V2 implementation.

Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/CompressionUtil.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java
17 files changed, 333 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/5
--
To view, visit http://gerrit.cloudera.org:8080/11341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Gerrit-Change-Number: 11341
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac