[jira] [Resolved] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-11 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li resolved KYLIN-3672.
---
Resolution: Fixed

Because kylin-defaults.properties is built into kylin-core-common-.jar, there 
is no need to call getResource for it every time.

The loading has been moved into the method getInstanceFromEnv(), because the 
file only needs to be loaded once.

This also avoids any impact on the CoProcessor logic. The CoProcessor class 
CubeVisitService uses KylinConfig as a utility class to generate a config 
object from a String. It is dangerous to involve any properties-file loading 
logic in the CoProcessor, because there is no Kylin.properties in its package, 
so it does not call KylinConfig.getInstanceFromEnv().
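
The load-once idea described above can be sketched roughly as follows. This is 
an illustration only, not the code from the attached patch: the class 
CachedDefaults and its methods are hypothetical names, and only 
java.util.Properties and ClassLoader.getResourceAsStream are real JDK APIs. 
The loader runs a single time and every later caller gets a copy of the cached 
result, so no classpath lookup happens on the query path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: caches a Properties object after the first load. */
final class CachedDefaults {
    private final Supplier<Properties> loader;
    private final AtomicInteger loadCount = new AtomicInteger();
    private volatile Properties cached;

    CachedDefaults(Supplier<Properties> loader) {
        this.loader = loader;
    }

    /** Returns a private copy of the defaults; the loader runs at most once. */
    Properties get() {
        Properties local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    local = loader.get();
                    loadCount.incrementAndGet();
                    cached = local;
                }
            }
        }
        // Hand out a copy so callers cannot modify the shared instance.
        Properties copy = new Properties();
        copy.putAll(local);
        return copy;
    }

    /** Number of times the underlying loader was actually invoked. */
    int loadCount() {
        return loadCount.get();
    }

    /** Loader that reads a classpath resource: the expensive call to avoid repeating. */
    static Supplier<Properties> fromClasspath(String resourceName) {
        return () -> {
            Properties p = new Properties();
            ClassLoader cl = CachedDefaults.class.getClassLoader();
            try (InputStream in = cl.getResourceAsStream(resourceName)) {
                if (in != null) {
                    p.load(in);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return p;
        };
    }
}
```

A caller would create one shared instance, for example 
new CachedDefaults(CachedDefaults.fromClasspath("kylin-defaults.properties")), 
and call get() per request. Code that must not touch the classpath, such as the 
CoProcessor path mentioned above, simply never calls it.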

The patch has been merged into master and the 2.5.x branch.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Assignee: Zongwei Li
> Priority: Critical
> Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found a Kylin performance bug while tuning the performance of our BI
> report, which is integrated with Kylin.
>  
> +Background+
> Our BI report shows customer usage data to enterprise customers and provides
> 15 usage charts on the report page.
> Each chart sends an API request to Kylin with a different SQL statement, so
> one user triggers 15 API calls (via JDBC) to Kylin.
> At our product scale, each Kylin query node needs to support at least 20
> users viewing the report at the same time.
> This means each Kylin node should be able to handle 15 * 20 = 300 queries
> per second.
>  
> +Performance Report+
> To reduce the impact of the network, we set up the Kylin cluster and the
> test machine in the same network as the Hadoop system.
> We ran several rounds of tests with Gatling and JMeter. The results are as
> follows.
>  
> |Threads|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> The trend chart is as follows:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reaches 75, the number of handled
> queries per second peaks at about 90 and cannot be improved by increasing
> the thread count.
> Each Kylin query engine can handle about 90 queries per second, which means
> it supports only 90/15 = 6 users viewing the report page at the same time.
> Even if we set up 3 query nodes, which extends this to 18 concurrent users,
> this capacity cannot meet our business requirement.
>  
> +Analysis+
> From the test results, the response for a single thread is fast, but as the
> thread count increases, Kylin's throughput does not increase as we expected.
> We did a full code review of the Kylin query engine and used jstack and
> JProfiler to analyze it, and found the root cause of this performance
> bottleneck.
> This is a regression introduced by a new feature added about one year ago.
> With the fix, one Kylin node can handle 350+ queries per second. We are
> submitting this issue to contribute the patch to Kylin.
> +Jstack Log Analysis+
> We used jstack to capture thread information during performance testing. One
> of the dumps is attached as 'jstackBeforeBugFix.log'.
> From the log, we can see that one thread holds the lock in
> sun.misc.URLClassPath.getNextLoader. The locked monitor is
> {color:#ff}*0x00048007a180*{color}
>  
> "Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
>         - locked <0x00048007a180> (a sun.misc.URLClassPath)
>         at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
>         at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
>         at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findResource(URLClassLoader.java:566)

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-11 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683099#comment-16683099
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] Thank you for your help with this patch!


[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681641#comment-16681641
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] I have uploaded a new patch for this bug that fixes the 
previous impact on the CoProcessor. The integration tests pass locally. Please 
help review it. Thanks.


[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Attachment: (was: KYLIN-3672.master.001.patch)


[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Attachment: KYLIN-3672.master.002.patch


[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Description: 
Hi, Kylin Team

We found a Kylin performance bug while tuning the performance of our BI 
report, which is integrated with Kylin.

+Background+

Our BI report shows customer usage data to enterprise customers and provides 
15 usage charts on the report page.

Each chart sends an API request to Kylin with a different SQL statement, so 
one user triggers 15 API calls (via JDBC) to Kylin.

At our product scale, each Kylin query node needs to support at least 20 
users viewing the report at the same time.

This means each Kylin node should be able to handle 15 * 20 = 300 queries per 
second.

 

+Performance Report+

To reduce the impact of the network, we set up the Kylin cluster and the test 
machine in the same network as the Hadoop system.

We ran several rounds of tests with Gatling and JMeter. The results are as 
follows.

|Threads|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|
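
As a cross-check of the table, the per-second column is just the 60-second 
total divided by 60 and rounded to the nearest integer. The small sketch below 
recomputes it from the figures in the table; the class and method names are 
made up for this illustration.

```java
/** Recomputes the per-second column of the table from the 60-second totals. */
final class ThroughputTable {
    static final int WINDOW_SECONDS = 60;

    /** Queries per second, rounded to the nearest integer as in the table. */
    static long perSecond(int handledInWindow) {
        return Math.round(handledInWindow / (double) WINDOW_SECONDS);
    }

    public static void main(String[] args) {
        // Thread counts and 60-second totals copied from the table.
        int[] threads = {1, 15, 25, 50, 75, 100, 150};
        int[] handled = {773, 3245, 3844, 4912, 5405, 5436, 5434};
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d threads: %d queries/s%n",
                    threads[i], perSecond(handled[i]));
        }
    }
}
```

The computed values (13, 54, 64, 82, 90, 91, 91) match the third column, which 
makes the plateau at around 90 queries per second easy to see.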

 

The trend chart is as follows:

!TrendChartBeforeFix.png!

 

+Conclusion+

From the trend, when the thread count reaches 75, the number of handled 
queries per second peaks at about 90 and cannot be improved by increasing the 
thread count.

Each Kylin query engine can handle about 90 queries per second, which means it 
supports only 90/15 = 6 users viewing the report page at the same time.

Even if we set up 3 query nodes, which extends this to 18 concurrent users, 
this capacity cannot meet our business requirement.
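
The capacity arithmetic used here can be written out as a short sketch. It 
assumes, as the report does, that every user issues all 15 chart queries within 
the same second; the class and method names are hypothetical.

```java
/** Back-of-the-envelope capacity figures from the report. */
final class CapacityEstimate {
    /** Queries per second a node must sustain for the given load. */
    static int requiredQps(int chartsPerUser, int concurrentUsers) {
        return chartsPerUser * concurrentUsers;
    }

    /** Whole number of concurrent users a given throughput can serve. */
    static int supportedUsers(int qps, int chartsPerUser) {
        return qps / chartsPerUser;
    }

    public static void main(String[] args) {
        int charts = 15;      // charts (queries) per report page
        int measuredQps = 90; // measured peak per node before the fix
        System.out.println("required per node: " + requiredQps(charts, 20));
        System.out.println("users per node:    " + supportedUsers(measuredQps, charts));
        System.out.println("users on 3 nodes:  " + 3 * supportedUsers(measuredQps, charts));
    }
}
```

With the numbers above this gives a requirement of 300 queries per second per 
node against a measured capacity of 6 users per node, or 18 users on 3 nodes.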

 

+Analysis+

From the test results, the response for a single thread is fast, but as the 
thread count increases, Kylin's throughput does not increase as we expected.

We did a full code review of the Kylin query engine and used jstack and 
JProfiler to analyze it, and found the root cause of this performance 
bottleneck.

This is a regression introduced by a new feature added about one year ago.

With the fix, one Kylin node can handle 350+ queries per second. We are 
submitting this issue to contribute the patch to Kylin.

+Jstack Log Analysis+

We used jstack to capture thread information during performance testing. One 
of the dumps is attached as 'jstackBeforeBugFix.log'.

From the log, we can see that one thread holds the lock in 
sun.misc.URLClassPath.getNextLoader. The locked monitor is 
{color:#ff}*0x00048007a180*{color}

 
"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
        - locked <0x00048007a180> (a sun.misc.URLClassPath)
        at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
        at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
        at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
        at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
        at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)

43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}>

"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
        - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)
        at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
        at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
        at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
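
A count such as "43 threads waiting to lock" one monitor can be obtained 
mechanically from a thread dump. The sketch below is one possible way to do it 
and is not part of the attached patch or of Kylin: the class MonitorContention 
is a hypothetical name, and it relies only on the "- waiting to lock <0x...>" 
line format that appears in the trace above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts, per monitor address, how many threads in a jstack dump are waiting for it. */
final class MonitorContention {
    private static final Pattern WAITING =
            Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    /** Maps each monitor address to the number of "waiting to lock" lines that name it. */
    static Map<String, Integer> countWaiters(List<String> dumpLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dumpLines) {
            Matcher m = WAITING.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java MonitorContention <jstack-dump-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countWaiters(lines).forEach((monitor, n) ->
                System.out.println(monitor + " <- " + n + " waiting thread(s)"));
    }
}
```

Running it against a dump file such as the attached jstackBeforeBugFix.log 
would list each contended monitor with its waiter count, which makes a single 
hot lock stand out quickly.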

[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680666#comment-16680666
 ] 

Zongwei Li commented on KYLIN-3672:
---

Let me check the integration test.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.001.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
>  (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> And draw the trend chart as follow:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> From the trend, when the thread count reach 75, the handled queries per 
> second reaches peak data 90, and cannot improved by increase the thread count.
> Each Kylin query engine can handle 90 queries per second, it means only 
> support 90/15 = 6 users to review report page at same time.
> Even we setup 3 query nodes, can extend to 18 users at same time, this 
> performance capacity cannot meet our business requirement.
>  
> +Analyze+
> From test result, response for one thread is fast, but as the thread 
> increase, throughput of Kylin not increased as we expected.
> We have full code review for Kylin query engine, and use Jstack and JProfile 
> to do analyze, found the root cause for this performance bottleneck.
> This is one regression bug introduced by new feature involved one year before.
> With bug fixing, one Kylin node can handle 350+ queries per second. Submit 
> this bug for contribute patch to Kylin.
> +Jstack Log Analysis+
> We used jstack to capture thread information during performance testing. One 
> of the dumps is attached as 'jstackBeforeBugFix.log'.
> From the log, we found that one thread holds the lock in 
> sun.misc.URLClassPath.getNextLoader. The address of the locked monitor is 
> *0x00048007a180*.
>  
> "Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
>         - locked <0x00048007a180> (a sun.misc.URLClassPath)
>         at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
>         at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
>         at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
>         at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
>         at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
>         at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
>         at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)
>  
> 43 threads are waiting to lock <0x00048007a180>:
>  
> "Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 
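
The stack above shows a query thread reaching `ClassLoader.getResource` through `KylinConfig.buildSiteOrderedProps`, with `URLClassPath` serializing callers on a single monitor. The resolution note on this issue describes loading kylin-defaults.properties only once instead of on every call. The sketch below illustrates that general idea, resolving a classpath resource once and reusing the result. It is not the actual Kylin patch; the class name `CachedDefaults` and the resource name are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

final class CachedDefaults {
    // Counts classpath lookups so the effect of caching is observable.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Hypothetical resource name, used only for this illustration.
    static final String RESOURCE = "example-defaults.properties";

    // Uncached variant: every call goes through the class loader's resource lookup.
    static Properties loadEveryTime() {
        LOOKUPS.incrementAndGet();
        Properties props = new Properties();
        ClassLoader cl = CachedDefaults.class.getClassLoader();
        try (InputStream in = cl.getResourceAsStream(RESOURCE)) {
            if (in != null) {   // a missing resource yields empty properties
                props.load(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read " + RESOURCE, e);
        }
        return props;
    }

    // Initialization-on-demand holder: the JVM runs this initializer exactly once,
    // so concurrent callers never repeat the lookup.
    private static final class Holder {
        static final Properties DEFAULTS = loadEveryTime();
    }

    // Cached variant: callers share one instance and should treat it as read-only.
    static Properties cachedDefaults() {
        return Holder.DEFAULTS;
    }
}
```

With the cached variant, only the first caller pays for the lookup; later callers read a static field and never touch the class loader's lock.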

[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
 Attachment: TrendChartAfterFix.png
Description: 
43 threads waiting to lock <0x00048007a180>:

"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
        - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)
        at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
        at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
        at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
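
To get a count like the "43 threads" above without reading the dump by hand, the waiting threads can be tallied per monitor address. A minimal sketch, assuming the usual HotSpot jstack line format `- waiting to lock <0x...>`; the class name `LockWaiters` is hypothetical and not part of Kylin or the JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LockWaiters {
    // Matches the jstack line emitted for a thread blocked on a monitor.
    private static final Pattern WAITING =
            Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    // Returns monitor address -> number of threads waiting to lock it.
    static Map<String, Integer> countWaiters(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = WAITING.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

It could be fed with `java.nio.file.Files.readAllLines` on a saved dump such as the attached jstackBeforeBugFix.log. A single address with a large count points at the contended monitor.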
 

[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Description: 

[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Attachment: jstackBeforeBugFix.log

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png, 
> jstackBeforeBugFix.log
>
>
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4 GHz)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> SQL with the PreparedStatement cache enabled (a new feature in Kylin 2.5.0; 
> without the PreparedStatement cache, performance is even worse). The filter 
> hits all 6 segments.
>  
> +Cube Info+
> Segment Number: 6
> Total Size: 47 MB
>  
> Segment: 2018010100_2018101100
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 2018101100_2018101200
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
>  
> Segment: 2018101200_2018101300
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101300_2018101400
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101400_2018101500
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101500_2018101600
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>  
> +HBase Region Server+ 
> Count: 6
> hbase.regionserver.handler.count: 120
> Queries were not blocked in the CoProcessor RPC call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679449#comment-16679449
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] I have merged my change with the latest code in master and 
attached the generated patch file to this JIRA. Who can help review it, and is 
there anything else I need to do? This is my first time contributing a patch 
to Kylin. 

[~yimingliu] I will add a detailed code-level analysis to this issue.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png
>
>

[jira] [Created] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)
Zongwei Li created KYLIN-3672:
-

 Summary: Performance is poor when multiple queries occur in short 
period
 Key: KYLIN-3672
 URL: https://issues.apache.org/jira/browse/KYLIN-3672
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.5.0
 Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
Reporter: Zongwei Li
 Attachments: TrendChartBeforeFix.png






[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679370#comment-16679370
 ] 

Zongwei Li commented on KYLIN-3672:
---

Please assign this bug to me; I will contribute a patch to fix this issue.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: TrendChartBeforeFix.png
>
>





[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Attachment: KYLIN-3672.master.001.patch

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found one Kylin performance bug during performance tuning for our BI 
> report integrate with Kylin.
>  
> +Background+
> Our BI report show customer usage report to enterprise customers, provide 15 
> usage charts in report page.
> Each chart need send API request to Kylin with different SQLs. So it means 
> for one user, it will trigger 15 API calls(by JDBC) to Kylin.
> For our product scale, we need support at least 20 users to review the report 
> at same time for each Kylin query node.
> So it means each Kylin node should be able to handle 15 * 20 = 300 queries  
> per second.
>  
> +Performance Report+
> To reduce the network impact. We built up Kylin cluster and testing machine 
> in the same network with Hadoop system.
> We use gatling and Jmeter tools to do several round testing, result as follow.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean 
> Response Time
> (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> The trend chart is as follows:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> The trend shows that once the thread count reaches 75, throughput peaks at 
> about 90 queries per second and does not improve with more threads.
> Since each Kylin query engine can handle 90 queries per second, it supports 
> only 90/15 = 6 users viewing the report page at the same time.
> Even with 3 query nodes we could serve only 18 concurrent users, so this 
> capacity cannot meet our business requirement.
>  
> +Analysis+
> The test results show that a single thread gets fast responses, but as the 
> thread count increases, Kylin's throughput does not scale as we expected.
> We did a full code review of the Kylin query engine and used jstack and 
> JProfiler for analysis, and found the root cause of this bottleneck.
> It is a regression introduced by a new feature added about a year ago.
> With the fix, one Kylin node can handle 350+ queries per second. We are 
> filing this issue to contribute the patch to Kylin.
>  
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4G)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> SQL with the PreparedStatement cache enabled (a new feature in Kylin 2.5.0; 
> without the PreparedStatement cache, performance is even worse). The filter 
> hits all 6 segments.
>  
> +Cube Info+
> Segment Number: 6 Total Size: 47 MB
>  
> Segment: 2018010100_2018101100
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 2018101100_2018101200
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
>  
> Segment: 2018101200_2018101300
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101300_2018101400
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101400_2018101500
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 2018101500_2018101600
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>  
> +HBase Region Server+ 
> Count: 6
> hbase.regionserver.handler.count: 120.
> Not blocked in CoProcessor RPC call.





[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-08 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
 Attachment: codeChangedCausedThisBug.png
Description: 
Hi, Kylin Team,

We found a Kylin performance bug while tuning the performance of our BI report, 
which is integrated with Kylin.

 

+Background+

Our BI report shows usage data to enterprise customers and provides 15 usage 
charts on the report page.

Each chart sends an API request to Kylin with a different SQL statement, so one 
user triggers 15 API calls (via JDBC) to Kylin.

At our product scale, each Kylin query node needs to support at least 20 users 
viewing the report at the same time.

So each Kylin node should be able to handle 15 * 20 = 300 queries per second.

 

+Performance Report+

To reduce network impact, we set up the Kylin cluster and the test machine in 
the same network as the Hadoop system.

We used Gatling and JMeter to run several rounds of tests. Results follow.

 
|Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|

 

The trend chart is as follows:

!TrendChartBeforeFix.png!

 

+Conclusion+

The trend shows that once the thread count reaches 75, throughput peaks at 
about 90 queries per second and does not improve with more threads.

Since each Kylin query engine can handle 90 queries per second, it supports only 
90/15 = 6 users viewing the report page at the same time.

Even with 3 query nodes we could serve only 18 concurrent users, so this 
capacity cannot meet our business requirement.
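
The capacity arithmetic above can be written out as a short Java sketch. It only restates the figures from this report (15 queries per user, 20 target users per node, a measured peak of 90 queries per second); the class and method names are hypothetical and not part of Kylin.

```java
// Illustrative only: restates the capacity arithmetic from the report.
final class CapacityEstimate {

    /** Queries per second a node must sustain for the given number of concurrent users. */
    static int requiredQps(int users, int queriesPerUser) {
        return users * queriesPerUser;
    }

    /** Concurrent users one node can serve; integer division rounds down. */
    static int usersPerNode(int queriesPerSecond, int queriesPerUser) {
        return queriesPerSecond / queriesPerUser;
    }

    public static void main(String[] args) {
        System.out.println(requiredQps(20, 15));       // target: 300 queries per second
        System.out.println(usersPerNode(90, 15));      // measured peak: 6 users per node
        System.out.println(3 * usersPerNode(90, 15));  // three nodes: 18 users
    }
}
```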

 

+Analysis+

The test results show that a single thread gets fast responses, but as the 
thread count increases, Kylin's throughput does not scale as we expected.

We did a full code review of the Kylin query engine and used jstack and 
JProfiler for analysis, and found the root cause of this bottleneck.

It is a regression introduced by a new feature added about a year ago.

With the fix, one Kylin node can handle 350+ queries per second. We are filing 
this issue to contribute the patch to Kylin.

+Jstack Log Analysis+

We used jstack to capture thread dumps during performance testing. One of them 
is attached as 'jstackBeforeBugFix.log'.

From the log, we found the following.

One thread holds the lock in sun.misc.URLClassPath.getNextLoader. The locked 
monitor is 0x00048007a180:

"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - locked <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)

43 threads are waiting to lock <0x00048007a180>:

"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at
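
The trace shows query threads contending on a single class-path lock inside ClassLoader.getResource, reached from KylinConfig.buildSiteOrderedProps. A common way to take such a lookup off the hot path is to load the resource once and hand out copies. The Java sketch below illustrates that general pattern only; it is not the Kylin patch, and the CachedDefaults class, its loader, and the property key are hypothetical.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: runs an expensive loader (for example a classpath
// resource lookup) at most once and serves copies of the cached result.
final class CachedDefaults {
    private final Supplier<Properties> loader;
    private volatile Properties cached;

    CachedDefaults(Supplier<Properties> loader) {
        this.loader = loader;
    }

    /** Returns a private copy of the defaults; the loader runs at most once. */
    Properties get() {
        Properties local = cached;
        if (local == null) {
            synchronized (this) {          // only contended until the first load completes
                local = cached;
                if (local == null) {
                    local = loader.get();
                    cached = local;
                }
            }
        }
        Properties copy = new Properties(); // callers may mutate their copy safely
        copy.putAll(local);
        return copy;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CachedDefaults defaults = new CachedDefaults(() -> {
            loads.incrementAndGet();        // stands in for the getResource + load step
            Properties p = new Properties();
            p.setProperty("example.key", "value"); // hypothetical key
            return p;
        });
        for (int i = 0; i < 1000; i++) {
            defaults.get();
        }
        System.out.println("loader invocations: " + loads.get());
    }
}
```

After the first call, readers only perform a volatile read and a copy, so they no longer queue on the class loader's monitor.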

[jira] [Updated] (KYLIN-3601) The max connection number generated by the PreparedContextPool is inconsistent with the configuration.

2018-09-28 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3601:
--
Attachment: image.png

> The max connection number generated by the PreparedContextPool is 
> inconsistent with the configuration.
> --
>
> Key: KYLIN-3601
> URL: https://issues.apache.org/jira/browse/KYLIN-3601
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
>Reporter: huaicui
>Priority: Major
> Attachments: FirstResponseDistribute.jpg, 
> SixthResponseDistribute.jpg, image-2018-09-28-15-14-00-288.png, image.png
>
>
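> Because concurrent performance was insufficient, we tested with the 
> PrepareStatement approach provided by magang. Performance improved somewhat, 
> but as the number of test runs grew, throughput kept dropping and timeouts 
> became more and more frequent. We modified the code to print a log line just 
> before the final return in queryAndUpdateCache:
> logger.debug("BorrowedCount:" + preparedContextPool.getBorrowedCount()
>  + ",DestroyedCount:" + preparedContextPool.getDestroyedCount()
>  + ",CreatedCount:" + preparedContextPool.getCreatedCount()
>  + ",ReturnedCount:" + preparedContextPool.getReturnedCount());
> We also added this setting to the configuration file:
> kylin.query.statement-cache-max-num-per-key=200
>  
>  
> The log shows that after the same SQL has run concurrently for a while, 
> PreparedContextPool creates more and more PrepareStatements and does not 
> block subsequent requests.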
> !image-2018-09-28-15-14-00-288.png!
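
For reference, the behaviour the reporter expected (borrowers wait once the configured maximum is reached, instead of new objects being created) can be sketched with a plain semaphore. This is a minimal illustration using only java.util.concurrent; it is not Kylin's PreparedContextPool, and the class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded pool: at most maxTotal slots are lent out at once.
final class BoundedPoolSketch {
    private final Semaphore permits;

    BoundedPoolSketch(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    /** Waits up to timeoutMs for a free slot; returns false if the pool stays exhausted. */
    boolean tryBorrow(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns a previously borrowed slot to the pool. */
    void giveBack() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(2);
        System.out.println(pool.tryBorrow(10)); // first slot
        System.out.println(pool.tryBorrow(10)); // second slot
        System.out.println(pool.tryBorrow(10)); // pool exhausted, times out
        pool.giveBack();
        System.out.println(pool.tryBorrow(10)); // a slot is free again
    }
}
```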





[jira] [Commented] (KYLIN-3569) Server with query mode still can submit/build job

2018-09-24 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626633#comment-16626633
 ] 

Zongwei Li commented on KYLIN-3569:
---

Sure, let me try it.

> Server with query mode still can submit/build job
> -
>
> Key: KYLIN-3569
> URL: https://issues.apache.org/jira/browse/KYLIN-3569
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine, REST Service, Security
>Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Priority: Major
>  Labels: build, documentation, security
> Attachments: kylinCode.png
>
>
> From the docs on the Kylin site, 
> [http://kylin.apache.org/docs24/install/kylin_cluster.html]:
>  * *query* : run query engine only; Kylin query engine accepts and answers 
> your SQL queries
> This suggests that a server configured with 'kylin.server.mode=query' should 
> not be able to submit or build jobs. But in our tests, a server in query mode 
> can still submit and build jobs from the UI or the RESTful API. 
> We analyzed the source code and found no protective logic in the service 
> layer that checks whether the server is in 'job' or 'build' mode before 
> submitting or building a job. The source code is attached to this issue.
> This really confused us, because from the Kylin docs and many Kylin books we 
> understood that a query server cannot build jobs. A query server is also 
> exposed to third-party BI tools for querying data; if we forget to configure 
> suitable ACLs for the cubes, a third-party BI tool can trigger a build job at 
> any time.





[jira] [Commented] (KYLIN-3569) Server with query mode still can submit/build job

2018-09-19 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620215#comment-16620215
 ] 

Zongwei Li commented on KYLIN-3569:
---

After reviewing the code again, there is a hardcoded string in 
DistributedScheduler.class, line 192:

if (!("job".equals(serverMode.toLowerCase()) || 
"all".equals(serverMode.toLowerCase()))) {
   logger.info("server mode: " + serverMode + ", no need to run job scheduler");
   return;
}

The code review has clarified which responsibilities the job engine takes. I 
suggest refactoring this code to replace the "job" literal with the constant 
that already exists in Constant.class:

public final static String SERVER_MODE_JOB = "job";
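
A minimal Java sketch of that refactoring, combined with the kind of service-layer guard the issue description asks for. Only SERVER_MODE_JOB = "job" is confirmed by this thread; the other two constants, the class name, and both method names are assumptions made for illustration, and this is not Kylin's actual code.

```java
import java.util.Locale;

// Illustrative only: constants replace the string literals, and a guard
// method shows where a service-layer check could live.
final class ServerModeSketch {
    static final String SERVER_MODE_QUERY = "query"; // assumed constant
    static final String SERVER_MODE_JOB = "job";     // named in the comment above
    static final String SERVER_MODE_ALL = "all";     // assumed constant

    /** True when the given server mode is allowed to run the job scheduler. */
    static boolean runsJobScheduler(String serverMode) {
        if (serverMode == null) {
            return false;
        }
        String mode = serverMode.toLowerCase(Locale.ROOT);
        return SERVER_MODE_JOB.equals(mode) || SERVER_MODE_ALL.equals(mode);
    }

    /** Guard a service method could call before submitting or building a job. */
    static void checkCanSubmitJob(String serverMode) {
        if (!runsJobScheduler(serverMode)) {
            throw new IllegalStateException(
                    "server mode '" + serverMode + "' cannot submit or build jobs");
        }
    }

    public static void main(String[] args) {
        System.out.println(runsJobScheduler("job"));
        System.out.println(runsJobScheduler("ALL"));
        System.out.println(runsJobScheduler(SERVER_MODE_QUERY));
    }
}
```

Whether a query-mode server should reject job submission at all is the open question of this issue; the guard only shows what such a check might look like.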



[jira] [Commented] (KYLIN-3569) Server with query mode still can submit/build job

2018-09-18 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620023#comment-16620023
 ] 

Zongwei Li commented on KYLIN-3569:
---

Regarding 'job server is responsible to schedule jobs': which cases does job 
scheduling cover? Could you give more detail? We want to know which functions 
only the job server can perform. That will help us design the deployment 
architecture. Thanks.



[jira] [Updated] (KYLIN-3569) Server with query mode still can submit/build job

2018-09-18 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3569:
--
Description: 
From the docs on the Kylin site, 
[http://kylin.apache.org/docs24/install/kylin_cluster.html]:
 * *query* : run query engine only; Kylin query engine accepts and answers your 
SQL queries

This suggests that a server configured with 'kylin.server.mode=query' should not 
be able to submit or build jobs. But in our tests, a server in query mode can 
still submit and build jobs from the UI or the RESTful API. 

We analyzed the source code and found no protective logic in the service layer 
that checks whether the server is in 'job' or 'build' mode before submitting or 
building a job. The source code is attached to this issue.

This really confused us, because from the Kylin docs and many Kylin books we 
understood that a query server cannot build jobs. A query server is also exposed 
to third-party BI tools for querying data; if we forget to configure suitable 
ACLs for the cubes, a third-party BI tool can trigger a build job at any time.

  was:
From the docs on the Kylin site, 
[http://kylin.apache.org/docs24/install/kylin_cluster.html]:
 * *query* : run query engine only; Kylin query engine accepts and answers your 
SQL queries

This suggests that a server configured with 'kylin.server.mode=query' should not 
be able to submit or build jobs. But in our tests, a server in query mode can 
still submit and build jobs from the UI or the RESTful API. 

We analyzed the source code and found no protective logic in the service layer 
that checks whether the server is in 'job' or 'build' mode before submitting or 
building a job. We will attach the source code to this issue.

This really confused us, because from the Kylin docs and many Kylin books we 
understood that a query server cannot build jobs. A query server is also exposed 
to third-party BI tools for querying data; if we forget to configure suitable 
ACLs for the cubes, a third-party BI tool can trigger a build job at any time.




[jira] [Updated] (KYLIN-3569) Server with query mode still can submit/build job

2018-09-18 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3569:
--
Attachment: kylinCode.png

> Server with query mode still can submit/build job
> -
>
> Key: KYLIN-3569
> URL: https://issues.apache.org/jira/browse/KYLIN-3569
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine, REST Service, Security
>Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Priority: Major
>  Labels: build, documentation, security
> Attachments: kylinCode.png
>
>
> From the docs on the Kylin site, 
> [http://kylin.apache.org/docs24/install/kylin_cluster.html]:
>  * *query* : run query engine only; Kylin query engine accepts and answers 
> your SQL queries
> This suggests that a server configured with 'kylin.server.mode=query' should 
> not be able to submit or build jobs. But in our tests, a server in query mode 
> can still submit and build jobs from the UI or the RESTful API. 
> We analyzed the source code and found no protective logic in the service 
> layer that checks whether the server is in 'job' or 'build' mode before 
> submitting or building a job. We will attach the source code to this issue.
> This really confused us, because from the Kylin docs and many Kylin books we 
> understood that a query server cannot build jobs. A query server is also 
> exposed to third-party BI tools for querying data; if we forget to configure 
> suitable ACLs for the cubes, a third-party BI tool can trigger a build job at 
> any time.





[jira] [Created] (KYLIN-3569) Server with query mode still can submit/build job

2018-09-18 Thread Zongwei Li (JIRA)
Zongwei Li created KYLIN-3569:
-

 Summary: Server with query mode still can submit/build job
 Key: KYLIN-3569
 URL: https://issues.apache.org/jira/browse/KYLIN-3569
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine, REST Service, Security
Affects Versions: v2.4.1
 Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
Reporter: Zongwei Li


From the docs on the Kylin site, 
[http://kylin.apache.org/docs24/install/kylin_cluster.html]:
 * *query* : run query engine only; Kylin query engine accepts and answers your 
SQL queries

This suggests that a server configured with 'kylin.server.mode=query' should not 
be able to submit or build jobs. But in our tests, a server in query mode can 
still submit and build jobs from the UI or the RESTful API. 

We analyzed the source code and found no protective logic in the service layer 
that checks whether the server is in 'job' or 'build' mode before submitting or 
building a job. We will attach the source code to this issue.

This really confused us, because from the Kylin docs and many Kylin books we 
understood that a query server cannot build jobs. A query server is also exposed 
to third-party BI tools for querying data; if we forget to configure suitable 
ACLs for the cubes, a third-party BI tool can trigger a build job at any time.





[jira] [Closed] (KYLIN-3568) User login error message is inaccurate

2018-09-18 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li closed KYLIN-3568.
-
   Resolution: Duplicate
Fix Version/s: (was: v2.6.0)

Sorry for the duplicate bug. Will update it.

> User login error message is inaccurate
> --
>
> Key: KYLIN-3568
> URL: https://issues.apache.org/jira/browse/KYLIN-3568
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service, Web 
>Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: XiaoXiang Yu
>Priority: Minor
>  Labels: usability
>
> Hi Kylin team,
>  
> We found an issue when logging in to Kylin: the error message misleads the 
> user.
>  
> I couldn't log in to Kylin even though I entered the correct username and 
> password (LDAP enabled).
> So I checked the server log, which showed HBase connection issues.
> The root cause was that the HBase server Kylin uses as its metadata store 
> was down, but the login message told me to check my username or password. 
> That is really confusing.
> I then read some of the login module's source code and found that Kylin 
> shares the same error message for different failure cases.
>  
> We suggest two options:
>  # Redirect to a global error page when the HBase connection fails after 
> login, showing an error message (e.g. System Error, please contact system 
> administrator).
>  # Enhance the error codes in the login logic to make the error message more 
> specific.
>  
> Below are the login error message and the log. I logged in successfully 
> after recovering the HBase servers.
> !image-2018-09-17-20-25-54-896.png!!image-2018-09-17-20-22-20-294.png!





[jira] [Created] (KYLIN-3568) User login error message is inaccurate

2018-09-18 Thread Zongwei Li (JIRA)
Zongwei Li created KYLIN-3568:
-

 Summary: User login error message is inaccurate
 Key: KYLIN-3568
 URL: https://issues.apache.org/jira/browse/KYLIN-3568
 Project: Kylin
  Issue Type: Bug
  Components: REST Service, Web 
Affects Versions: v2.4.1
 Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
Reporter: Zongwei Li
Assignee: XiaoXiang Yu
 Fix For: v2.6.0


Hi Kylin team,

 

We found an issue when logging in to Kylin: the error message misleads the user.

 

I couldn't log in to Kylin even though I entered the correct username and 
password (LDAP enabled).

So I checked the server log, which showed HBase connection issues.

The root cause was that the HBase server Kylin uses as its metadata store was 
down, but the login message told me to check my username or password. That is 
really confusing.

I then read some of the login module's source code and found that Kylin shares 
the same error message for different failure cases.

 

We suggest two options:
 # Redirect to a global error page when the HBase connection fails after login, 
showing an error message (e.g. System Error, please contact system administrator).
 # Enhance the error codes in the login logic to make the error message more 
specific.

 

Below are the login error message and the log. I logged in successfully after 
recovering the HBase servers.

!image-2018-09-17-20-25-54-896.png!!image-2018-09-17-20-22-20-294.png!
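
The second option could look roughly like the Java sketch below, which maps distinct failure causes to distinct messages. It is a hypothetical illustration: the exception types, class name, and message texts are assumptions and do not come from Kylin's login module.

```java
// Illustrative only: separates "bad credentials" from "backend unavailable"
// so the login page does not blame the user for a metadata store outage.
final class LoginErrorSketch {

    /** Hypothetical: thrown when the username or password is wrong. */
    static class BadCredentialsException extends RuntimeException { }

    /** Hypothetical: thrown when the metadata store (e.g. HBase) is unreachable. */
    static class MetadataStoreUnavailableException extends RuntimeException { }

    /** Chooses a user-facing message based on the failure cause. */
    static String messageFor(RuntimeException cause) {
        if (cause instanceof BadCredentialsException) {
            return "Invalid username or password.";
        }
        if (cause instanceof MetadataStoreUnavailableException) {
            return "System error, please contact the system administrator.";
        }
        return "Login failed due to an unexpected error.";
    }

    public static void main(String[] args) {
        System.out.println(messageFor(new BadCredentialsException()));
        System.out.println(messageFor(new MetadataStoreUnavailableException()));
    }
}
```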


