[ https://issues.apache.org/jira/browse/KYLIN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565404#comment-15565404 ]
Dayue Gao commented on KYLIN-2083:
----------------------------------

Played with it for a while and refactored AggregationCacheMemSizeTest:
* use [jamm|https://github.com/jbellis/jamm] to replace the previous way of obtaining an object's actual heap usage
* move the estimation tests for individual aggregators to AggregatorMemEstimateTest
* test different setups for the aggregation cache
* test different setups for the bitmap aggregator
* test both +UseCompressedOops and -UseCompressedOops

Below is how to run the test and what I've found.

Group 1: CompressedOops Enabled
--------------------------------------------------
{noformat}
$ mvn test -Dtest=AggregationCacheMemSizeTest#testEstimateMemSize -pl 'core-cube' -DargLine='-Xms2g -Xmx2g' -Dscale=10
{noformat}

1) WITHOUT_MEM_HUNGRY: contains three basic aggregators: longSum, doubleSum and bigdecimalSum
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
  100,000       32,400,000       31,200,080             1       1,174
  200,000       64,800,000       62,400,080             1       2,899
  300,000       97,200,000       93,600,080             1       5,779
  400,000      129,600,000      124,800,080             1       9,338
  500,000      162,000,000      156,000,080             1      13,547
  600,000      194,400,000      187,200,080             1      19,555
  700,000      226,800,000      218,400,080             1      26,240
  800,000      259,200,000      249,600,080             1      33,895
  900,000      291,600,000      280,800,080             1      42,416
1,000,000      324,000,000      312,000,080             1      50,853
{noformat}

2) WITH_HLLC: contains three basic aggregators and one HyperLogLog(14) aggregator
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
    5,000       83,840,000       83,840,096             0          51
   10,000      167,680,000      167,680,096             0         148
   15,000      251,520,000      251,520,096             0         303
   20,000      335,360,000      335,360,096             0         486
   25,000      419,200,000      419,200,096             0         717
   30,000      503,040,000      503,040,096             0       1,008
   35,000      586,880,000      586,880,096             0       1,334
   40,000      670,720,000      670,720,096             0       1,711
   45,000      754,560,000      754,560,096             0       2,120
   50,000      838,400,000      838,400,096             0       2,648
{noformat}

3) WITH_LOW_CARD_BITMAP: contains three basic aggregators and one sparse bitmap aggregator (1 million bits but only 100 bits on).
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
   10,000        5,920,000       23,200,080             1         452
   20,000       11,840,000       46,400,080             1       1,330
   30,000       17,760,000       69,600,080             1       2,716
   40,000       23,680,000       92,800,080             1       4,531
   50,000       29,600,000      116,000,080             1       6,973
   60,000       35,520,000      139,200,080             1       9,915
   70,000       41,440,000      162,400,080             1      13,289
   80,000       47,360,000      185,600,080             1      17,037
   90,000       53,280,000      208,800,080             1      21,923
  100,000       59,200,000      232,000,080             1      28,140
{noformat}

4) WITH_HIGH_CARD_BITMAP: contains three basic aggregators and one dense bitmap aggregator (1 million bits, 99.99% on)
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
    1,000      131,464,000      133,096,080             0          49
    2,000      262,928,000      266,192,080             0         138
    3,000      394,392,000      399,288,080             0         319
    4,000      525,856,000      532,384,080             0         503
    5,000      657,320,000      665,480,080             0         739
    6,000      788,784,000      798,576,080             0       1,101
    7,000      920,248,000      931,672,080             0       1,473
    8,000    1,051,712,000    1,064,768,080             0       1,895
    9,000    1,183,176,000    1,197,864,080             0       2,311
   10,000    1,314,640,000    1,330,960,080             0       2,969
{noformat}

Group 2: CompressedOops Disabled
--------------------------------------------------
{noformat}
$ mvn test -Dtest=AggregationCacheMemSizeTest#testEstimateMemSize -pl 'core-cube' -DargLine='-Xms4g -Xmx4g -XX:-UseCompressedOops' -Dscale=10
{noformat}

1) WITHOUT_MEM_HUNGRY: contains three basic aggregators: longSum, doubleSum and bigdecimalSum
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
  100,000       32,400,000       40,800,120             1       1,568
  200,000       64,800,000       81,600,120             1       3,667
  300,000       97,200,000      122,400,120             1       6,940
  400,000      129,600,000      163,200,120             1      11,375
  500,000      162,000,000      204,000,120             1      16,953
  600,000      194,400,000      244,800,120             1      24,452
  700,000      226,800,000      285,600,120             1      32,738
  800,000      259,200,000      326,400,120             1      41,785
  900,000      291,600,000      367,200,120             1      54,307
1,000,000      324,000,000      408,000,120             1      64,795
{noformat}

2) WITH_HLLC: contains three basic aggregators and one HyperLogLog(14) aggregator
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
    5,000       83,840,000       84,520,144             0          55
   10,000      167,680,000      169,040,144             0         170
   15,000      251,520,000      253,560,144             0         346
   20,000      335,360,000      338,080,144             0         602
   25,000      419,200,000      422,600,144             0         939
   30,000      503,040,000      507,120,144             0       1,341
   35,000      586,880,000      591,640,144             0       1,765
   40,000      670,720,000      676,160,144             0       2,214
   45,000      754,560,000      760,680,144             0       2,688
   50,000      838,400,000      845,200,144             0       3,334
{noformat}

3) WITH_LOW_CARD_BITMAP: contains three basic aggregators and one sparse bitmap aggregator (1 million bits but only 100 bits on).
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
   10,000        5,880,000       28,240,120             0         515
   20,000       11,760,000       56,480,120             0       1,599
   30,000       17,640,000       84,720,120             0       3,286
   40,000       23,520,000      112,960,120             0       5,528
   50,000       29,400,000      141,200,120             0       8,530
   60,000       35,280,000      169,440,120             0      12,170
   70,000       41,160,000      197,680,120             0      16,241
   80,000       47,040,000      225,920,120             0      20,841
   90,000       52,920,000      254,160,120             0      25,741
  100,000       58,800,000      282,400,120             0      32,102
{noformat}

4) WITH_HIGH_CARD_BITMAP: contains three basic aggregators and one dense bitmap aggregator (1 million bits, 99.99% on)
{noformat}
     Size  Estimate(bytes)    Actual(bytes)  Estimate(ms)  Actual(ms)
    1,000      131,464,000      133,760,120             0          54
    2,000      262,928,000      267,520,120             0         178
    3,000      394,392,000      401,280,120             0         449
    4,000      525,856,000      535,040,120             0         696
    5,000      657,320,000      668,800,120             0         997
    6,000      788,784,000      802,560,120             0       1,377
    7,000      920,248,000      936,320,120             0       1,835
    8,000    1,051,712,000    1,070,080,120             0       2,317
    9,000    1,183,176,000    1,203,840,120             0       2,845
   10,000    1,314,640,000    1,337,600,120             0       3,444
{noformat}

Conclusions:
# when CompressedOops is on, the current estimation is pretty close to the actual usage
# when CompressedOops is off, the RAM of basic aggregators is underestimated by about 25%, because the reference size increases from 4 to 8 bytes. Memory-hungry aggregators are not affected much
# the sparse bitmap is underestimated by roughly 4x (closer to 5x with CompressedOops off) and needs improvement
# using jamm to count the heap usage of a big object graph is accurate but slow; use with caution

> more RAM estimation test for MeasureAggregator and GTAggregateScanner
> ---------------------------------------------------------------------
>
>                 Key: KYLIN-2083
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2083
>             Project: Kylin
>          Issue Type: Sub-task
>          Components: Tools, Build and Test
>    Affects Versions: v1.5.4.1
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>             Fix For: v1.6.0
>
>
> Current RAM estimations for MeasureAggregator and GTAggregateScanner are
> based on test results from AggregationCacheMemSizeTest. I'd like to see if
> there is room for improvement, and if there is, how much we can improve.
> Points I'm interested in are:
> # *CompressedOops ON vs. OFF*: when CompressedOops is off on a large heap, each
> reference takes 8 bytes. I was wondering how much this affects the RAM of
> AggregationCache.
> # *Variable Length Aggregator*: does the current estimation work well on
> varlen aggregators like BitmapAggregator?
> # *Real Heap Usage Count via Instrumentation*: the current approach to obtaining
> the actual heap usage of objects looks fine; however, I was wondering if
> using a Java instrumentation agent would give us a more precise number.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
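
The gaps quoted in the conclusions can be re-derived from the last row of each table. Below is a minimal sketch in Java; the class `EstimateGap` and its methods are hypothetical helpers written for this note and are not part of Kylin. The byte counts are copied from the tables in the comment.

```java
// Hypothetical helper, not part of Kylin: re-derives the estimate/actual gap
// from the byte counts reported in the comment's tables.
class EstimateGap {

    /** Ratio of actual to estimated bytes; a value above 1 means the estimate is too low. */
    static double actualOverEstimate(long estimateBytes, long actualBytes) {
        return (double) actualBytes / estimateBytes;
    }

    /** Average number of bytes per aggregation-cache entry. */
    static double bytesPerEntry(long totalBytes, long entries) {
        return (double) totalBytes / entries;
    }

    public static void main(String[] args) {
        // WITHOUT_MEM_HUNGRY, 1,000,000 entries (last row of the tables)
        System.out.printf("basic, oops on : %.3f%n",
                actualOverEstimate(324_000_000L, 312_000_080L));
        System.out.printf("basic, oops off: %.3f%n",
                actualOverEstimate(324_000_000L, 408_000_120L));

        // WITH_LOW_CARD_BITMAP, 100,000 entries (last row of the tables)
        System.out.printf("sparse bitmap, oops on : %.3f%n",
                actualOverEstimate(59_200_000L, 232_000_080L));
        System.out.printf("sparse bitmap, oops off: %.3f%n",
                actualOverEstimate(58_800_000L, 282_400_120L));

        // Per-entry cost of the basic aggregators with CompressedOops off
        System.out.printf("estimated bytes/entry: %.1f%n",
                bytesPerEntry(324_000_000L, 1_000_000L));
        System.out.printf("actual bytes/entry   : %.1f%n",
                bytesPerEntry(408_000_120L, 1_000_000L));
    }
}
```

With these inputs the ratios come out at roughly 0.96 and 1.26 for the basic aggregators, and roughly 3.9 and 4.8 for the sparse bitmap, which matches the "about 25%" and "4x" figures in the conclusions.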