[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687940#comment-16687940
 ] 

Shaofeng SHI commented on KYLIN-3656:
-

Patch merged, thank you Chang!

> Improve HLLCounter performance
> --
>
> Key: KYLIN-3656
> URL: https://issues.apache.org/jira/browse/KYLIN-3656
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: all
>Reporter: Chang chen
>Assignee: Chang chen
>Priority: Major
> Fix For: v2.6.0
>
> Attachments: 0001-KYLIN-3656-Improve-HLLCounter-performance.patch, 
> image-2018-11-05-18-15-36-463.png
>
>
> The current HLLCounter implementation leaves some room for performance 
> improvement, as we found in our production environment.
>  # When creating an HLLCounter from another HLLCounter, we can copy the 
> registers instead of merging them.
>  # To compute the harmonic mean in HLLCSnapshot, we could:
>  ## use a table to cache all values of 1/2^r instead of computing them on the fly
>  ## replace floating-point addition with integer addition in the bigger loop
>  ## remove the branch, e.g. there is no need to check whether registers[i] is 
> zero, although this is a minor improvement.
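
The table-based harmonic sum in item 2 of the description can be sketched as below. This is a minimal illustration, not the code from the patch: the class name HarmonicSumSketch, the table name INVERSE_POW2 and both method names are hypothetical, and the sketch assumes register values stay in the range 0..62 so that the baseline shift is well defined.

```java
import java.util.Random;

final class HarmonicSumSketch {
    // Hypothetical table: INVERSE_POW2[r] holds 1 / 2^r for r in 0..63.
    static final double[] INVERSE_POW2 = new double[64];

    static {
        for (int r = 0; r < INVERSE_POW2.length; r++) {
            INVERSE_POW2[r] = Math.scalb(1.0, -r); // exact power of two
        }
    }

    // Baseline: a branch, a shift, a division and a floating-point add per register.
    static double sumPerRegister(byte[] registers) {
        double sum = 0;
        for (byte r : registers) {
            if (r == 0) {
                sum++;
            } else {
                sum += 1.0 / (1L << r);
            }
        }
        return sum;
    }

    // Histogram variant: only integer increments in the long loop, followed by
    // a short loop of 64 multiply-adds against the cached table.
    static double sumViaHistogram(byte[] registers) {
        int[] counts = new int[INVERSE_POW2.length];
        for (byte r : registers) {
            counts[r]++;
        }
        double sum = 0;
        for (int r = 0; r < counts.length; r++) {
            sum += counts[r] * INVERSE_POW2[r];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed test data: 2^14 registers with values in 0..39.
        byte[] registers = new byte[1 << 14];
        Random random = new Random(2009);
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) random.nextInt(40);
        }
        System.out.println(sumPerRegister(registers));
        System.out.println(sumViaHistogram(registers));
    }
}
```

Because INVERSE_POW2[0] is 1, empty registers contribute 1 each without a separate branch. Java uses only the low six bits of a long shift count, so 1L << r is meaningful only for r below 63; the sketch therefore fills its table with Math.scalb rather than a shift.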



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687938#comment-16687938
 ] 

ASF subversion and git services commented on KYLIN-3656:


Commit 889dc503c6b538e60d180055b9b39b84d15d9ba0 in kylin's branch 
refs/heads/master from chenchang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=889dc50 ]

KYLIN-3656  Improve HLLCounter performance




[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687937#comment-16687937
 ] 

ASF GitHub Bot commented on KYLIN-3656:
---

shaofengshi closed pull request #345: KYLIN-3656  Improve HLLCounter performance
URL: https://github.com/apache/kylin/pull/345
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java
index d34fef63ab..5c192cc7fd 100644
--- a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java
+++ b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/DenseRegister.java
@@ -35,6 +35,11 @@ public DenseRegister(int p) {
         this.register = new byte[m];
     }
 
+    public void  copyFrom(DenseRegister r){
+        assert m == r.m;
+        System.arraycopy(r.register, 0, register, 0 , register.length);
+    }
+
     public void set(int pos, byte value) {
         register[pos] = value;
     }
diff --git a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java
index 80bbb2a9c1..875c7eb09e 100644
--- a/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java
+++ b/core-metadata/src/main/java/org/apache/kylin/measure/hllc/HLLCounter.java
@@ -32,6 +32,14 @@
 @SuppressWarnings("serial")
 public class HLLCounter implements Serializable, Comparable {
 
+    static double[] harmonicMean;
+
+    static {
+        harmonicMean = new double[256];
+        for (int i = 1; i < 256; i++)
+            harmonicMean[i] = 1.0 / (1L << i);
+    }
+
     // not final for test purpose
     static double OVERFLOW_FACTOR = 0.01;
 
@@ -57,7 +65,11 @@ public HLLCounter(int p, HashFunction hashFunc) {
 
     public HLLCounter(HLLCounter another) {
         this(another.p, another.getRegisterType(), another.hashFunc);
-        merge(another);
+        if(another.getRegisterType() == RegisterType.DENSE){
+            ((DenseRegister)register).copyFrom((DenseRegister)another.register);
+        }else {
+            merge(another);
+        }
     }
 
     public HLLCounter(int p, RegisterType type) {
@@ -202,6 +214,8 @@ public String toString() {
         int zeroBuckets;
 
         public HLLCSnapshot(HLLCounter hllc) {
+            int[] registerNums = new int[256];
+
             p = (byte) hllc.p;
             registerSum = 0;
             zeroBuckets = 0;
@@ -215,14 +229,14 @@ public HLLCSnapshot(HLLCounter hllc) {
                 dr = (DenseRegister) register;
             }
             byte[] registers = dr.getRawRegister();
-            for (int i = 0; i < hllc.m; i++) {
-                if (registers[i] == 0) {
-                    registerSum++;
-                    zeroBuckets++;
-                } else {
-                    registerSum += 1.0 / (1L << registers[i]);
-                }
+            for (int i = 0; i < hllc.m; i ++) {
+                registerNums[registers[i]] ++;
             }
+            zeroBuckets = registerNums[0];
+            for (int i= 1; i < 256; i ++)
+                registerSum += registerNums[i] * harmonicMean[i];
+
+            registerSum += zeroBuckets;
         }
 
         public long getCountEstimate() {
diff --git a/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java b/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java
index 92f2aab270..789addfb8d 100644
--- a/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java
+++ b/core-metadata/src/test/java/org/apache/kylin/measure/hllc/HLLCounterTest.java
@@ -22,6 +22,7 @@
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.ArrayList;
 import java.util.HashSet;
 import java.util.Random;
 import java.util.Set;
@@ -68,6 +69,88 @@ public void tesSparseEstimate() throws IOException {
         assertTrue(hllc.getCountEstimate() > 10 * 0.9);
     }
 
+    /**
+     * evaluation getCountEstimate of HLLCounter
+     * cost time : 1341[old] -> 206[new]
+     */
+    @Test
+    public void countPerformanceWithLargeCardinality(){
+        int cardinality = 10_000_000;
+        HLLCounter hllc = generateTestCounter(2009, cardinality);
+        final int testCount = 5000;
+        countEstimatePerformance(hllc, cardinality, testCount);
+    }
+
+    /**
+     * evaluation getCountEstimate of HLLCounter
+     * cost time : 1396[old] -> 274[new]
+     */
+    @Test
+    public void countPerformanceSmallCardinality(){
+        int cardinality = 300_000;
+        HLLCounter hllc = 

[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687750#comment-16687750
 ] 

ASF GitHub Bot commented on KYLIN-3656:
---

coveralls commented on issue #345: KYLIN-3656  Improve HLLCounter performance
URL: https://github.com/apache/kylin/pull/345#issuecomment-438988366
 
 
   ## Pull Request Test Coverage Report for [Build 3850](https://coveralls.io/builds/20119299)
   
   * **16** of **16**   **(100.0%)**  changed or added relevant lines in **2** files are covered.
   * No unchanged relevant lines lost coverage.
   * Overall coverage increased (+**0.02%**) to **23.643%**
   
   ---
   
   |  Totals | [![Coverage Status](https://coveralls.io/builds/20119299/badge)](https://coveralls.io/builds/20119299) |
   | :-- | --: |
   | Change from base [Build 3845](https://coveralls.io/builds/20040647): |  0.02% |
   | Covered Lines: | 16835 |
   | Relevant Lines: | 71206 |
   
   ---
   [Coveralls](https://coveralls.io)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687725#comment-16687725
 ] 

ASF GitHub Bot commented on KYLIN-3656:
---

hit-lacus commented on issue #345: KYLIN-3656  Improve HLLCounter performance
URL: https://github.com/apache/kylin/pull/345#issuecomment-438982705
 
 
   Local CI pass.




[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687719#comment-16687719
 ] 

ASF GitHub Bot commented on KYLIN-3656:
---

asfgit commented on issue #345: KYLIN-3656  Improve HLLCounter performance
URL: https://github.com/apache/kylin/pull/345#issuecomment-438981954
 
 
   Can one of the admins verify this patch?




[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687718#comment-16687718
 ] 

ASF GitHub Bot commented on KYLIN-3656:
---

hit-lacus opened a new pull request #345: KYLIN-3656  Improve HLLCounter 
performance
URL: https://github.com/apache/kylin/pull/345
 
 
   The current HLLCounter implementation leaves some room for performance 
improvement, as we found in our production environment. The improvements concern 
getCountEstimate of HLLCounter and the constructor of HLLCounter.
   
   
   - When creating an HLLCounter from another HLLCounter, we can copy the 
registers (using System.arraycopy) instead of merging. (Constructor of HLLCounter)
   - Precompute the terms of the harmonic mean in HLLCSnapshot to avoid computing 
them on the fly. (getCountEstimate of HLLCounter)
   
   The unit tests now include a comparison of the time cost before and after the change.
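
Why a plain copy can stand in for a merge in the constructor can be sketched as follows. This is an illustration under stated assumptions, not Kylin's code: it assumes a dense register set is a byte array of non-negative values and that merging takes the per-register maximum, as is standard for HyperLogLog. The class RegisterCopySketch and its methods are hypothetical names.

```java
import java.util.Arrays;

final class RegisterCopySketch {
    // Assumed merge semantics: keep the larger value for each register.
    static void mergeInto(byte[] target, byte[] source) {
        for (int i = 0; i < target.length; i++) {
            if (source[i] > target[i]) {
                target[i] = source[i];
            }
        }
    }

    // Copy variant: one bulk copy, with no per-register comparison.
    static byte[] copyOf(byte[] source) {
        byte[] target = new byte[source.length];
        System.arraycopy(source, 0, target, 0, source.length);
        return target;
    }

    public static void main(String[] args) {
        byte[] source = {0, 3, 1, 7, 0, 2};

        // Merging into freshly allocated (all-zero) registers...
        byte[] merged = new byte[source.length];
        mergeInto(merged, source);

        // ...yields the same registers as copying them directly.
        byte[] copied = copyOf(source);
        System.out.println(Arrays.equals(merged, copied));
    }
}
```

A newly constructed counter starts with all registers at zero, and the maximum of zero and any non-negative value is that value, so the two paths produce identical registers while the copy avoids the element-wise loop.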




[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-05 Thread Chang chen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674917#comment-16674917
 ] 

Chang chen commented on KYLIN-3656:
---

[~yimingliu] you can run the unit tests to see the result. The following is my test:

!image-2018-11-05-18-15-36-463.png!

Removing reflection isn't included in this patch. 



[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-03 Thread Billy Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674040#comment-16674040
 ] 

Billy Liu commented on KYLIN-3656:
--

Hi [~baibaichen], could you share some performance improvement data from your 
environment? 



[jira] [Commented] (KYLIN-3656) Improve HLLCounter performance

2018-11-01 Thread Chang chen (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671478#comment-16671478
 ] 

Chang chen commented on KYLIN-3656:
---

To get better results, we disabled result compression with the following setting:

  kylin.storage.hbase.endpoint-compress-result=false
