[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-11 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683124#comment-16683124
 ] 

Shaofeng SHI commented on KYLIN-3672:
-

Zongwei, excellent!

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 
>   
>  {{"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002"}} {{#}}{{4002}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-11 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683099#comment-16683099
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] Thank you for help on this patch!

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 
>   
>  {{"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002"}} {{#}}{{4002}} {{daemon 
> prio=}}{{5}} 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682849#comment-16682849
 ] 

ASF subversion and git services commented on KYLIN-3672:


Commit 023864a3454216fb9a0c4a7dc21c97098dc253c9 in kylin's branch 
refs/heads/2.5.x from zonli
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=023864a ]

KYLIN-3672 Performance is poor when multiple queries occur in short period


> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-10 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682426#comment-16682426
 ] 

ASF subversion and git services commented on KYLIN-3672:


Commit 91a1b31080767e9590ef5af3d9d9861390d9c097 in kylin's branch 
refs/heads/master from zonli
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=91a1b31 ]

KYLIN-3672 Performance is poor when multiple queries occur in short period


> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-10 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682424#comment-16682424
 ] 

Shaofeng SHI commented on KYLIN-3672:
-

The new patch looks good, and have passed the integration test, I will merge it 
to master. Thank you Zongwei!

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 
>   
>  {{"Query 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681641#comment-16681641
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] Already upload new patch for this bug, fixed pervious impact to 
CoProcessor, integration test passed in local, please help review it.Thanks.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zhong Yanghong (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680763#comment-16680763
 ] 

Zhong Yanghong commented on KYLIN-3672:
---

awesome analysis

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.001.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 
>   
>  {{"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002"}} {{#}}{{4002}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680666#comment-16680666
 ] 

Zongwei Li commented on KYLIN-3672:
---

Let me check the integration test.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.001.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 
>   
>  {{"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002"}} {{#}}{{4002}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680643#comment-16680643
 ] 

Shaofeng SHI commented on KYLIN-3672:
-

Integration test is failed:

 
{code:java}
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoClassDefFoundError: 
Could not initialize class org.apache.kylin.common.KylinConfig
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2153)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.kylin.common.KylinConfig
at 
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:244)
at 
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:)
at 
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7553)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1876)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1858)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
... 4 more

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: 
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoClassDefFoundError: 
Could not initialize class org.apache.kylin.common.KylinConfig
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2153)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.kylin.common.KylinConfig
at 
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:244)
at 
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:)
at 
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7553)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1876)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1858)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
... 4 more

{code}

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.001.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Kaige Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679588#comment-16679588
 ] 

 Kaige Liu commented on KYLIN-3672:
---

Impressive work and cool analysis!

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png, 
> codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#ff}*0x00048007a180*{color}
>  
>  {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}\{{0x7f272e40d000}}{{]}}
>   
>  {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>   
>  {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>   
>  {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>   
>  {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>   
>  {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>   
>  {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>   
>  {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>   
>  {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>   
>  43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 
>   
>  {{"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002"}} {{#}}{{4002}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679494#comment-16679494
 ] 

Shaofeng SHI commented on KYLIN-3672:
-

hi Zongwei, I manually reviewed the change, it does make sense to avoid 
repeatedly parse the configurations. 

I will run a round of integration test to make sure it doesn't break other 
functions. I will update the status once done.

Thank you!

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png, 
> jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analyze+
> We use Jstack to capture thread info during performance testing. Already 
> attach one of them 'jstackBeforeBugFix.log'.
> From the log, we can found that 
> One thread locked at sun.misc.URLClassPath.getNextLoader. TID is 
> {color:#FF}*0x00048007a180*{color}
>  
> {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425"}} {{#}}{{3425}} {{daemon 
> prio=}}{{5}} {{os_prio=}}{{0}} {{tid=}}{{0x0472b000}} 
> {{nid=}}{{0x1433}} {{waiting }}{{for}} {{monitor entry 
> [}}{{0x7f272e40d000}}{{]}}
>  
> {{   }}{{java.lang.Thread.State: BLOCKED (on object monitor)}}
>  
> {{}}{{at 
> sun.misc.URLClassPath.getNextLoader(URLClassPath.java:}}{{469}}{{)}}
>  
> {{}}{{- locked <}}{{0x00048007a180}}{{> (a sun.misc.URLClassPath)}}
>  
> {{}}{{at 
> sun.misc.URLClassPath.findResource(URLClassPath.java:}}{{214}}{{)}}
>  
> {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{569}}{{)}}
>  
> {{}}{{at 
> java.net.URLClassLoader$}}{{2}}{{.run(URLClassLoader.java:}}{{567}}{{)}}
>  
> {{}}{{at java.security.AccessController.doPrivileged(Native Method)}}
>  
> {{}}{{at 
> java.net.URLClassLoader.findResource(URLClassLoader.java:}}{{566}}{{)}}
>  
> {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1096}}{{)}}
>  
> {{}}{{at 
> java.lang.ClassLoader.getResource(ClassLoader.java:}}{{1091}}{{)}}
>  
> {{}}{{at 
> org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:}}{{1666}}{{)}}
>  
> {{}}{{at 
> org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:}}{{338}}{{)}}
>  
> 43 threads waiting to lock <{color:#FF}*0x00048007a180*{color}> 
>  
> {{"Query 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679449#comment-16679449
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] Already merged code with latest code in master and generated 
patch file in JIRA, who can help review the file or what else needed to do. 
It's my first time to commit patch for Kylin. 

[~yimingliu] Let me add the detail analyze from code in this bug

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
> (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
>  
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4G)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> SQL with PreparedStatement cache enabled. (New feature in Kylin 2.5.0. If no 
> PreparedStatement cache, performance will be more worse). Filter will hit all 
> 6 segments.
>  
> +Cube Info+
> Segment Number: 6 Total Size: 47 MB
>  
> Segment: 2018010100_2018101100
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 2018101100_2018101200
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
>  
> Segment: 2018101200_2018101300
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101300_2018101400
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101400_2018101500
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101500_2018101600
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>  
> +HBase Region Server+ 
> Count: 6
> hbase.regionserver.handler.count: 

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679370#comment-16679370
 ] 

Zongwei Li commented on KYLIN-3672:
---

Please assign this bug to me, will contribute a patch to fix this issue.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
> (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
>  
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4G)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> SQL with PreparedStatement cache enabled. (New feature in Kylin 2.5.0. If no 
> PreparedStatement cache, performance will be more worse). Filter will hit all 
> 6 segments.
>  
> +Cube Info+
> Segment Number: 6 Total Size: 47 MB
>  
> Segment: 2018010100_2018101100
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 2018101100_2018101200
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
>  
> Segment: 2018101200_2018101300
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101300_2018101400
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101400_2018101500
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101500_2018101600
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>  
> +HBase Region Server+ 
> Count: 6
> hbase.regionserver.handler.count: 120.
> Not blocked in CoProcessor RPC call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679380#comment-16679380
 ] 

Shaofeng SHI commented on KYLIN-3672:
-

[~zonli] patch is welcomed. thank you!

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
> (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
>  
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4G)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> SQL with PreparedStatement cache enabled. (New feature in Kylin 2.5.0. If no 
> PreparedStatement cache, performance will be more worse). Filter will hit all 
> 6 segments.
>  
> +Cube Info+
> Segment Number: 6 Total Size: 47 MB
>  
> Segment: 2018010100_2018101100
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 2018101100_2018101200
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
>  
> Segment: 2018101200_2018101300
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101300_2018101400
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101400_2018101500
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101500_2018101600
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>  
> +HBase Region Server+ 
> Count: 6
> hbase.regionserver.handler.count: 120.
> Not blocked in CoProcessor RPC call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Billy Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679387#comment-16679387
 ] 

Billy Liu commented on KYLIN-3672:
--

Impressive. [~zonli] Could you share more info about the root cause? 

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
> (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
>  
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4G)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> SQL with PreparedStatement cache enabled. (New feature in Kylin 2.5.0. If no 
> PreparedStatement cache, performance will be more worse). Filter will hit all 
> 6 segments.
>  
> +Cube Info+
> Segment Number: 6 Total Size: 47 MB
>  
> Segment: 2018010100_2018101100
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 2018101100_2018101200
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
>  
> Segment: 2018101200_2018101300
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101300_2018101400
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101400_2018101500
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101500_2018101600
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>  
> +HBase Region Server+ 
> Count: 6
> hbase.regionserver.handler.count: 120.
> Not blocked in CoProcessor RPC call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)