[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 22:11:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. IMPALA-7424: Reduce in-memory footprint of incremental stats Currently incremental stats are stored as chunked Base64 strings in the HMS parameters map of partition objects. Each of these strings when stored in the catalogd are Java 'String' objects that use UTF-16 encoding and take up to 2 bytes per character. This patch converts the string representation into a deflate-compressed byte array form when the partition is loaded in the Catalogd and this state is maintained when transmitting them to the coordinators. To maintain backward compatibility, the persistent HMS representation of stats has not been modified. So the incremental stats are still written back to the chunked Base64 representation while serializing the partition state to HMS. On a real world catalogserver dominated by incremental stats memory footprint, this patch showed ~54% end-to-end heapsize reduction and ~79% reduction in the memory footprint of incremental stats data structures. This patch also improves the way the callers check if a partition has incremental stats by computing this information once and reusing it later. Without the patch, we deserialize the entire incremental stats structure everytime this information is needed and that triggers a spike in usage of working memory on catalogds/Impalads. Testing: Ran core tests on Catalog V1 Implementation. Ran some manual queries on Catalog V2 implementation. Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Reviewed-on: http://gerrit.cloudera.org:8080/11341 Reviewed-by: Impala Public Jenkins Reviewed-by: Todd Lipcon Tested-by: Impala Public Jenkins --- M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/CompressionUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java 17 files changed, 376 insertions(+), 127 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified Todd Lipcon: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 10 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 8 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 19:10:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 18:54:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 18:47:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 8: Code-Review+2 thx for the changes. lgtm. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 8 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 18:44:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11341 to look at the new patch set (#8). Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. IMPALA-7424: Reduce in-memory footprint of incremental stats Currently incremental stats are stored as chunked Base64 strings in the HMS parameters map of partition objects. Each of these strings when stored in the catalogd are Java 'String' objects that use UTF-16 encoding and take up to 2 bytes per character. This patch converts the string representation into a deflate-compressed byte array form when the partition is loaded in the Catalogd and this state is maintained when transmitting them to the coordinators. To maintain backward compatibility, the persistent HMS representation of stats has not been modified. So the incremental stats are still written back to the chunked Base64 representation while serializing the partition state to HMS. On a real world catalogserver dominated by incremental stats memory footprint, this patch showed ~54% end-to-end heapsize reduction and ~79% reduction in the memory footprint of incremental stats data structures. This patch also improves the way the callers check if a partition has incremental stats by computing this information once and reusing it later. Without the patch, we deserialize the entire incremental stats structure everytime this information is needed and that triggers a spike in usage of working memory on catalogds/Impalads. Testing: Ran core tests on Catalog V1 Implementation. Ran some manual queries on Catalog V2 implementation. Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 --- M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/CompressionUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java 17 files changed, 376 insertions(+), 127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/8 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 8 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@530 PS7, Line 530: private boolean hasIncrementalStats_; > start this off at false. I could not easily see when its set (I see from th Java defaults it to false but yes setting it explicitly makes it more readable. http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690 PS7, Line 690: Preconditions.checkState( > agreed. just discussed this offline. when/if it gets written back, we can c Done http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1712 PS6, Line 1712: > I think it should be included with metadata, not with the want_stats select Done http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@65 PS7, Line 65: decompressed > null on an internal exception in deflateDecompress Done http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@149 PS7, Line 149: decompressed > can be null if there's an exception in deflateDecompress. Done http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@64 PS7, Line 64: return null > why is this null in this case, but empty in the compress case (L49)? caller Ah oversight during refactoring. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 18:31:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690 PS7, Line 690: Preconditions.checkState( > good catch! Maybe we should clear these fields from the parameters even if agreed. just discussed this offline. when/if it gets written back, we can call it a self-cleaning feature. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 17:55:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690 PS7, Line 690: Preconditions.checkState( > what happens if the hms param was corrupted so the extraction on L665 threw good catch! Maybe we should clear these fields from the parameters even if it failed to decompress -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 17:51:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: (5 comments) couple of small comments/questions around edge-cases. otherwise, lgtm. http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@530 PS7, Line 530: private boolean hasIncrementalStats_; start this off at false. I could not easily see when its set (I see from thrift, but from the constructor, its less clear since the setter is in a try block, L667). http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@690 PS7, Line 690: Preconditions.checkState( what happens if the hms param was corrupted so the extraction on L665 threw an exception? that's not under our control. http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@65 PS7, Line 65: decompressed null on an internal exception in deflateDecompress http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@149 PS7, Line 149: decompressed can be null if there's an exception in deflateDecompress. http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/7/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@64 PS7, Line 64: return null why is this null in this case, but empty in the compress case (L49)? caller would be simplified to remove this. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 16:18:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/551/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 06:17:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: Code-Review+1 (1 comment) Just one remaining item, otherwise lgtm http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1712 PS6, Line 1712: > Done I think it should be included with metadata, not with the want_stats selection. That way we can do the cheap call first to see which ones have stats. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 05:59:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/11341/6/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/11341/6/common/thrift/CatalogObjects.thrift@303 PS6, Line 303: optional > shouldn't this be optional since we don't need to send this to backends as Done http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java: http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@322 PS6, Line 322: thriftHdfsPart.setId(part.getId()); > per comment elsewhere, I dont think you need this in the non-FULL (descript Yep missed this. http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@32 PS6, Line 32: import org.apache.hadoop.fs.BlockLocation; > nit: this should sort down with the other guava stuff Done http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@668 PS6, Line 668: // Delete the incremental stats entries. > I think you could simplify this code a bit and write: neat, done! http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@693 PS6, Line 693: > nit: typo Done http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1712 PS6, Line 1712: > it's somewhat weird that this one is required. I would include it optionall Done http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/6/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@43 PS6, Line 43: > curious if you compared the compression ratio and performance of BEST_COMPR I noticed that BEST_COMPRESSION had a slightly better (not too much) compression ratio. I don't have the numbers now. Adding a TODO though. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 05:31:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11341 to look at the new patch set (#7). Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. IMPALA-7424: Reduce in-memory footprint of incremental stats Currently incremental stats are stored as chunked Base64 strings in the HMS parameters map of partition objects. Each of these strings when stored in the catalogd are Java 'String' objects that use UTF-16 encoding and take up to 2 bytes per character. This patch converts the string representation into a deflate-compressed byte array form when the partition is loaded in the Catalogd and this state is maintained when transmitting them to the coordinators. To maintain backward compatibility, the persistent HMS representation of stats has not been modified. So the incremental stats are still written back to the chunked Base64 representation while serializing the partition state to HMS. On a real world catalogserver dominated by incremental stats memory footprint, this patch showed ~54% end-to-end heapsize reduction and ~79% reduction in the memory footprint of incremental stats data structures. This patch also improves the way the callers check if a partition has incremental stats by computing this information once and reusing it later. Without the patch, we deserialize the entire incremental stats structure everytime this information is needed and that triggers a spike in usage of working memory on catalogds/Impalads. Testing: Ran core tests on Catalog V1 Implementation. Ran some manual queries on Catalog V2 implementation. Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 --- M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/CompressionUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java 17 files changed, 360 insertions(+), 128 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/7 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/549/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 03:58:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG@14 PS5, Line 14: deflate > gzip is not present in this change, so either state 'deflate-compressed' or Done http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift@302 PS5, Line 302: // Set if 'want_partition_stats' was set in TTableInfoSelector. Not set if the > and the partition has associated stats. just to disambiguate the case where Clarified. Fixed a related bug in FeCatalogUtils http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@160 PS5, Line 160: titionSt > I think you're using this term since the prev. method was called getFiltere Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524 PS5, Line 524: > and when its updated (from CatalogOpExecutor, through the Util to serialize Added a reference to setPartitionStatsBytes(). Didn't want to make it complex by adding all the callpaths. Interested ones can track it back using setPartitionStatsBytes()'s callers. http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@76 PS5, Line 76: - > nit: add the hyphen Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@109 PS5, Line 109: HdfsP > move this to the method javadoc. also mention the deletion on L122 Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@123 PS5, Line 123: String debugString = > this was somewhat surprising at first... was wondering why the deletion if Its a no-op. Added more checks for this invariant as discussed. http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@128 PS5, Line 128: > this is redundant with the prefix in L126 Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@129 PS5, Line 129: > same here Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@819 PS5, Line 819: pri > Will remove in next PS Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@40 PS5, Line 40: if (input == null) return null; > precondition that input is not-null or return null if that's the case. same Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@59 PS5, Line 59: try{ > make these more symmetric. the other one returns in the try-block, this one Done http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@32 PS5, Line 32: si > Will update in next PS Done -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 31 Aug 2018 03:15:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11341 to look at the new patch set (#6). Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. IMPALA-7424: Reduce in-memory footprint of incremental stats Currently incremental stats are stored as chunked Base64 strings in the HMS parameters map of partition objects. Each of these strings when stored in the catalogd are Java 'String' objects that use UTF-16 encoding and take up to 2 bytes per character. This patch converts the string representation into a deflate-compressed byte array form when the partition is loaded in the Catalogd and this state is maintained when transmitting them to the coordinators. To maintain backward compatibility, the persistent HMS representation of stats has not been modified. So the incremental stats are still written back to the chunked Base64 representation while serializing the partition state to HMS. On a real world catalogserver dominated by incremental stats memory footprint, this patch showed ~54% end-to-end heapsize reduction and ~79% reduction in the memory footprint of incremental stats data structures. This patch also improves the way the callers check if a partition has incremental stats by computing this information once and reusing it later. Without the patch, we deserialize the entire incremental stats structure everytime this information is needed and that triggers a spike in usage of working memory on catalogds/Impalads. Testing: Ran core tests on Catalog V1 Implementation. Ran some manual queries on Catalog V2 implementation. Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 --- M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/CompressionUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java 17 files changed, 368 insertions(+), 131 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/6 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 6 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 5: (11 comments) http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11341/5//COMMIT_MSG@14 PS5, Line 14: gzipped gzip is not present in this change, so either state 'deflate-compressed' or just 'compressed'. http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11341/5/common/thrift/CatalogService.thrift@302 PS5, Line 302: // Set if 'want_partition_stats' was set in TTableInfoSelector. and the partition has associated stats. just to disambiguate the case where stats are not present-- is this set empty or unset? http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@160 PS5, Line 160: filtered I think you're using this term since the prev. method was called getFilteredHmsParameters. simpler to just say that the keys used to store TPartitionStats are not returned. http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@524 PS5, Line 524: // when the partition is loaded. and when its updated (from CatalogOpExecutor, through the Util to serialize, and back to setPartitionStatsBytes) http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@76 PS5, Line 76: nit: add the hyphen http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@109 PS5, Line 109: // null move this to the method javadoc. also mention the deletion on L122 http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@123 PS5, Line 123: deletePartStats(partition); this was somewhat surprising at first... was wondering why the deletion if TPartStats is coming in from the caller. makes more sense to me when I see that its an overwrite and we're making TPartStats the master copy. I think just comments will clarify. http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@128 PS5, Line 128: Error saving partition stats: this is redundant with the prefix in L126 http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@129 PS5, Line 129: rror saving partitio same here http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@40 PS5, Line 40: ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length); precondition that input is not-null or return null if that's the case. same for the other method. http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@59 PS5, Line 59: IOUtils.copy(new InflaterInputStream(new ByteArrayInputStream(input)), out); make these more symmetric. the other one returns in the try-block, this one doesn't. can make it easier to see that a null is returned for similar cases. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 30 Aug 2018 17:50:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/532/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 30 Aug 2018 02:46:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@819 PS5, Line 819: p Will remove in next PS http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java: http://gerrit.cloudera.org:8080/#/c/11341/5/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@32 PS5, Line 32: a Will update in next PS -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 30 Aug 2018 02:09:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 ) Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. Patch Set 5: (29 comments) http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@7 PS4, Line 7: Reduce i > nit: Reduce Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@10 PS4, Line 10: HMS parameters map of partition objects. Each of these strings when > nit: ... when stored in catalogd. Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@11 PS4, Line 11: ts th > nit: take Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@17 PS4, Line 17: persist > persistent? Done http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@26 PS4, Line 26: improves > nit: improves reminds me of my advisor suggesting edits to my thesis draft :D http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift@297 PS4, Line 297: byte[] representation of T > still having trouble understanding this one. how about: "byte[] representat Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@283 PS4, Line 283: should i > nit: should include (for consistency) Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@284 PS4, Line 284: and that > nit: that is Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@302 PS4, Line 302: 'want_partition_stats' was set i > sync this with the field name in that struct. Done http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@307 PS4, Line 307: 'want_partition_stats' in TTable > sync Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a446 PS4, Line 446: > any reason to get rid of these? Didn't seem like they are any value add to debugging? http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java: http://gerrit.cloudera.org:8080/#/c/11341/3/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@337 PS3, Line 337: > Do you recall where the '400' came from in the first place? Looking through Yep, it was an empirical estimate based on some clusters with incremental stats. I'm adding a TODO to fix that. http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@334 PS4, Line 334: .setPartitio > checkNotNull returns its argument, so you can inline this into the paramete Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@120 PS4, Line 120:* are stored as a deflate-compressed byte array to reduce memory footprint. Use > perhaps include the util method to use to turn it into a TPartitionStats? Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@158 PS4, Line 158: > nit: remove Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@527 PS4, Line 527: : private boolean hasIncrementalStats_; > remove Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637 PS4, Line 637: if (hasIncrStats) Preconditions.checkNotNull(partitionStats); > worth a precondition check that, if hasIncrStats is true, then partitionSta Done http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637 PS4, Line 637: if (hasIncrStats) Preconditions.checkNotNull(partitionStats); > you mentioned on L527 that this should be set once-- perhaps enforce it? Removed.
[Impala-ASF-CR] IMPALA-7424: Reduce in-memory footprint of incremental stats
Hello Todd Lipcon, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11341 to look at the new patch set (#5). Change subject: IMPALA-7424: Reduce in-memory footprint of incremental stats .. IMPALA-7424: Reduce in-memory footprint of incremental stats Currently incremental stats are stored as chunked Base64 strings in the HMS parameters map of partition objects. Each of these strings when stored in the catalogd are Java 'String' objects that use UTF-16 encoding and take up to 2 bytes per character. This patch converts the string representation into a gzipped byte array form when the partition is loaded in the Catalogd and this state is maintained when transmitting them to the coordinators. To maintain backward compatibility, the persistent HMS representation of stats has not been modified. So the incremental stats are still written back to the chunked Base64 representation while serializing the partition state to HMS. On a real world catalogserver dominated by incremental stats memory footprint, this patch showed ~54% end-to-end heapsize reduction and ~79% reduction in the memory footprint of incremental stats data structures. This patch also improves the way the callers check if a partition has incremental stats by computing this information once and reusing it later. Without the patch, we deserialize the entire incremental stats structure everytime this information is needed and that triggers a spike in usage of working memory on catalogds/Impalads. Testing: Ran core tests on Catalog V1 Implementation. Ran some manual queries on Catalog V2 implementation. Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 --- M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/CompressionUtil.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java A fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java 17 files changed, 333 insertions(+), 117 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/11341/5 -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac