[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687940#comment-16687940 ] Shaofeng SHI commented on KYLIN-3656: - Patch merged, thank you Chang!
> Improve HLLCounter performance
> ------------------------------
>
>                 Key: KYLIN-3656
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3656
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: all
>            Reporter: Chang chen
>            Assignee: Chang chen
>            Priority: Major
>             Fix For: v2.6.0
>
>         Attachments: 0001-KYLIN-3656-Improve-HLLCounter-performance.patch, image-2018-11-05-18-15-36-463.png
>
>
> The current HLLCounter implementation has room for performance improvement, as we found in our production environment.
> # When creating an HLLCounter from another HLLCounter, we can copy the registers instead of merging them.
> # To compute the harmonic mean in HLLCSnapshot, we could:
> ## use a table to cache all 1/2^r values instead of computing them on the fly;
> ## remove floating-point additions from the bigger loop by using integer additions there;
> ## remove a branch, e.g. there is no need to check whether registers[i] is zero, although this is a minor improvement.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
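The second idea in the issue description (cache 1/2^r in a table and keep only integer work in the big loop over m registers) can be illustrated with a small standalone sketch. The class `HarmonicSumSketch` and its methods are hypothetical names chosen for illustration, not Kylin's API. The sketch assumes every register value lies in [0, 63], and it fills its table with `Math.scalb(1.0, -r)`, whereas the patch uses `1.0 / (1L << i)`.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not Kylin code): two ways to compute the HyperLogLog
 * harmonic-mean denominator, sum over j of 2^(-M[j]), for a dense register array.
 * Assumes every register value is in [0, 63].
 */
public class HarmonicSumSketch {

    // Lookup table: INV_POW2[r] == 2^(-r), exact because powers of two are representable.
    private static final double[] INV_POW2 = new double[64];

    static {
        for (int r = 0; r < INV_POW2.length; r++) {
            INV_POW2[r] = Math.scalb(1.0, -r);
        }
    }

    /** Branchy style: one branch and one floating-point addition per register. */
    static double sumBranchy(byte[] registers) {
        double sum = 0;
        for (byte r : registers) {
            if (r == 0) {
                sum++; // 2^0 == 1
            } else {
                sum += 1.0 / (1L << r);
            }
        }
        return sum;
    }

    /** Histogram style: integer increments in the big loop, then 64 multiply-adds. */
    static double sumHistogram(byte[] registers) {
        int[] counts = new int[INV_POW2.length];
        for (byte r : registers) {
            counts[r]++; // no branch, no floating-point work per register
        }
        double sum = 0;
        for (int r = 0; r < counts.length; r++) {
            sum += counts[r] * INV_POW2[r];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] registers = new byte[1 << 14]; // hypothetical p = 14, so m = 16384
        Random rnd = new Random(42);
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) rnd.nextInt(20); // made-up register values 0..19
        }
        System.out.println(sumBranchy(registers) + " " + sumHistogram(registers));
    }
}
```

The two methods should agree up to floating-point rounding (their summation order differs). The histogram version trades m floating-point additions and branches for m integer increments plus a fixed 64-entry pass, which is the shape of the change described in the issue.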
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687938#comment-16687938 ] ASF subversion and git services commented on KYLIN-3656: Commit 889dc503c6b538e60d180055b9b39b84d15d9ba0 in kylin's branch refs/heads/master from chenchang [ https://gitbox.apache.org/repos/asf?p=kylin.git;h=889dc50 ] KYLIN-3656 Improve HLLCounter performance
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687937#comment-16687937 ] ASF GitHub Bot commented on KYLIN-3656: --- shaofengshi closed pull request #345: KYLIN-3656 Improve HLLCounter performance URL: https://github.com/apache/kylin/pull/345 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java
index d34fef63ab..5c192cc7fd 100644
--- a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java
+++ b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java
@@ -35,6 +35,11 @@ public DenseRegister(int p) {
         this.register = new byte[m];
     }

+    public void copyFrom(DenseRegister r){
+        assert m == r.m;
+        System.arraycopy(r.register, 0, register, 0 , register.length);
+    }
+
     public void set(int pos, byte value) {
         register[pos] = value;
     }
diff --git a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java
index 80bbb2a9c1..875c7eb09e 100644
--- a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java
+++ b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java
@@ -32,6 +32,14 @@
 @SuppressWarnings("serial")
 public class HLLCounter implements Serializable, Comparable {

+    static double[] harmonicMean;
+
+    static {
+        harmonicMean = new double[256];
+        for (int i = 1; i < 256; i++)
+            harmonicMean[i] = 1.0 / (1L << i);
+    }
+
     // not final for test purpose
     static double OVERFLOW_FACTOR = 0.01;

@@ -57,7 +65,11 @@ public HLLCounter(int p, HashFunction hashFunc) {

     public HLLCounter(HLLCounter another) {
         this(another.p, another.getRegisterType(), another.hashFunc);
-        merge(another);
+        if(another.getRegisterType() == RegisterType.DENSE){
+            ((DenseRegister)register).copyFrom((DenseRegister)another.register);
+        }else {
+            merge(another);
+        }
     }

     public HLLCounter(int p, RegisterType type) {
@@ -202,6 +214,8 @@ public String toString() {
         int zeroBuckets;

         public HLLCSnapshot(HLLCounter hllc) {
+            int[] registerNums = new int[256];
+
             p = (byte) hllc.p;
             registerSum = 0;
             zeroBuckets = 0;
@@ -215,14 +229,14 @@ public HLLCSnapshot(HLLCounter hllc) {
                 dr = (DenseRegister) register;
             }
             byte[] registers = dr.getRawRegister();
-            for (int i = 0; i < hllc.m; i++) {
-                if (registers[i] == 0) {
-                    registerSum++;
-                    zeroBuckets++;
-                } else {
-                    registerSum += 1.0 / (1L << registers[i]);
-                }
+            for (int i = 0; i < hllc.m; i ++) {
+                registerNums[registers[i]] ++;
             }
+            zeroBuckets = registerNums[0];
+            for (int i= 1; i < 256; i ++)
+                registerSum += registerNums[i] * harmonicMean[i];
+
+            registerSum += zeroBuckets;
         }

         public long getCountEstimate() {
diff --git a/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java b/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java
index 92f2aab270..789addfb8d 100644
--- a/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java
+++ b/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java
@@ -22,6 +22,7 @@

 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.ArrayList;
 import java.util.HashSet;
 import java.util.Random;
 import java.util.Set;
@@ -68,6 +69,88 @@ public void tesSparseEstimate() throws IOException {
         assertTrue(hllc.getCountEstimate() > 10 * 0.9);
     }

+    /**
+     * evaluation getCountEstimate of HLLCounter
+     * cost time : 1341[old] -> 206[new]
+     */
+    @Test
+    public void countPerformanceWithLargeCardinality(){
+        int cardinality = 10_000_000;
+        HLLCounter hllc = generateTestCounter(2009, cardinality);
+        final int testCount = 5000;
+        countEstimatePerformance(hllc, cardinality, testCount);
+    }
+
+    /**
+     * evaluation getCountEstimate of HLLCounter
+     * cost time : 1396[old] -> 274[new]
+     */
+    @Test
+    public void countPerformanceSmallCardinality(){
+        int cardinality = 300_000;
+        HLLCounter hllc =
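The constructor change in the diff above relies on a basic property of HyperLogLog: a union (merge) keeps the element-wise maximum of two register arrays, and a freshly constructed counter has every register at zero. Since register values are non-negative, max(0, x) = x, so merging into an empty counter produces an exact copy, and a single `System.arraycopy` can replace the per-register loop. The standalone sketch below checks that equivalence. `DenseCopySketch` is a hypothetical class, not Kylin's `DenseRegister`; it throws on a size mismatch instead of using `assert`, because Java assertions are disabled unless the JVM is started with `-ea`.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Kylin code): copying registers vs merging into an empty counter. */
public class DenseCopySketch {

    // m = 2^p one-byte registers; a new instance starts with all registers at zero.
    final byte[] register;

    DenseCopySketch(int p) {
        this.register = new byte[1 << p];
    }

    // Standard HyperLogLog union: keep the larger value in each register.
    void merge(DenseCopySketch other) {
        for (int i = 0; i < register.length; i++) {
            if (other.register[i] > register[i]) {
                register[i] = other.register[i];
            }
        }
    }

    // Bulk copy; equivalent to merge() only while this instance is still all zeros.
    void copyFrom(DenseCopySketch other) {
        if (other.register.length != register.length) {
            throw new IllegalArgumentException("register arrays differ in size");
        }
        System.arraycopy(other.register, 0, register, 0, register.length);
    }

    public static void main(String[] args) {
        DenseCopySketch source = new DenseCopySketch(10);
        source.register[3] = 7;
        source.register[500] = 2;

        DenseCopySketch merged = new DenseCopySketch(10);
        merged.merge(source);    // old constructor path: merge into an empty counter
        DenseCopySketch copied = new DenseCopySketch(10);
        copied.copyFrom(source); // new constructor path: copy the registers

        System.out.println(Arrays.equals(merged.register, copied.register)); // prints "true"
    }
}
```

Note that the copy writes into the new counter's own array, so later updates to the copy do not affect the source counter.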
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687750#comment-16687750 ] ASF GitHub Bot commented on KYLIN-3656: --- coveralls commented on issue #345: KYLIN-3656 Improve HLLCounter performance URL: https://github.com/apache/kylin/pull/345#issuecomment-438988366

## Pull Request Test Coverage Report for [Build 3850](https://coveralls.io/builds/20119299)

* **16** of **16** **(100.0%)** changed or added relevant lines in **2** files are covered.
* No unchanged relevant lines lost coverage.
* Overall coverage increased (+**0.02%**) to **23.643%**

---

| Totals | [![Coverage Status](https://coveralls.io/builds/20119299/badge)](https://coveralls.io/builds/20119299) |
| :-- | --: |
| Change from base [Build 3845](https://coveralls.io/builds/20040647): | 0.02% |
| Covered Lines: | 16835 |
| Relevant Lines: | 71206 |

---
- [Coveralls](https://coveralls.io)

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687725#comment-16687725 ] ASF GitHub Bot commented on KYLIN-3656: --- hit-lacus commented on issue #345: KYLIN-3656 Improve HLLCounter performance URL: https://github.com/apache/kylin/pull/345#issuecomment-438982705 Local CI passed.
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687719#comment-16687719 ] ASF GitHub Bot commented on KYLIN-3656: --- asfgit commented on issue #345: KYLIN-3656 Improve HLLCounter performance URL: https://github.com/apache/kylin/pull/345#issuecomment-438981954 Can one of the admins verify this patch?
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687718#comment-16687718 ] ASF GitHub Bot commented on KYLIN-3656: --- hit-lacus opened a new pull request #345: KYLIN-3656 Improve HLLCounter performance URL: https://github.com/apache/kylin/pull/345
The current HLLCounter implementation has room for performance improvement, as we found in our production environment. The improvements concern getCountEstimate and the constructor of HLLCounter:
- When creating an HLLCounter from another HLLCounter with dense registers, copy the registers (using System.arraycopy) instead of merging them. (Constructor of HLLCounter)
- Precompute the 1/2^r terms of the harmonic mean in HLLCSnapshot instead of computing them on the fly. (getCountEstimate of HLLCounter)
The unit tests now include a timing comparison of the old and new implementations.
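The patch fills its lookup table with `1.0 / (1L << i)` for i up to 255. One caveat worth checking: when the left operand is a `long`, Java uses only the six lowest-order bits of the shift distance (JLS §15.19), so `1L << 63` is negative and `1L << 64` equals 1. The hypothetical sketch below (`InvPow2TableSketch` is an illustrative name, not part of Kylin) compares that expression with `Math.scalb(1.0, -i)`, which yields 2^-i exactly over this range. It is an assumption, not something stated in this thread, that register values stay well below 63 in practice, in which case the affected table entries are never read.

```java
/** Hypothetical check of the table expression 1.0 / (1L << i) against an exact 2^(-i). */
public class InvPow2TableSketch {

    // Same expression the patch uses to fill its table.
    // Only the low 6 bits of i are used as the shift distance for a long operand.
    static double shiftBased(int i) {
        return 1.0 / (1L << i);
    }

    // 2^(-i) computed by adjusting the exponent; exact for every i in 0..255.
    static double scalbBased(int i) {
        return Math.scalb(1.0, -i);
    }

    public static void main(String[] args) {
        for (int i : new int[] {1, 62, 63, 64, 65}) {
            System.out.println(i + ": " + shiftBased(i) + " vs " + scalbBased(i));
        }
    }
}
```

Running `main` should show the two columns agreeing at i = 1 and 62, a negative shift-based value at 63, and shift-based values of 1.0 and 0.5 at 64 and 65. Building the table with `Math.scalb` (or sizing it to the maximum possible register value) would avoid the question entirely.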
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674917#comment-16674917 ] Chang chen commented on KYLIN-3656: --- [~yimingliu] you can run the unit test to see the result. The following is my test result: !image-2018-11-05-18-15-36-463.png! Removing reflection isn't included in this patch.
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674040#comment-16674040 ] Billy Liu commented on KYLIN-3656: -- Hi [~baibaichen], could you share some performance improvement data from your environment?
[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance
[ https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671478#comment-16671478 ] Chang chen commented on KYLIN-3656: --- To get better results, we disabled compression with the following setting: kylin.storage.hbase.endpoint-compress-result=false