[jira] [Resolved] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li resolved KYLIN-3672.
---
Resolution: Fixed

Because kylin-defaults.properties is built into kylin-core-common-.jar, there is no need to call getResource for it every time. The loading is moved into the method getInstanceFromEnv(), so that it happens only once and does not impact the CoProcessor logic.

In the CoProcessor class CubeVisitService, KylinConfig is used as a utility class to generate a config object from a String. It is dangerous to involve any properties-file loading logic in the CoProcessor, because there is no Kylin.properties in its package, so it will not call KylinConfig.getInstanceFromEnv().

The patch is already merged into the master and 2.5.x branches.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Assignee: Zongwei Li
> Priority: Critical
> Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team
> We found a Kylin performance bug during performance tuning for our BI report, which integrates with Kylin.
>
> +Background+
> Our BI report shows customer usage to enterprise customers and provides 15 usage charts on the report page.
> Each chart needs to send an API request to Kylin with a different SQL statement, so one user triggers 15 API calls (by JDBC) to Kylin.
> For our product scale, we need to support at least 20 users reviewing the report at the same time on each Kylin query node.
> This means each Kylin node should be able to handle 15 * 20 = 300 queries per second.
>
> +Performance Report+
> To reduce the network impact, we built the Kylin cluster and the testing machine in the same network as the Hadoop system.
> We used Gatling and JMeter to run several rounds of testing, with the results as follows.
>
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>
> The trend chart is as follows:
> !TrendChartBeforeFix.png!
>
> +Conclusion+
> From the trend, when the thread count reaches 75, the handled queries per second reach a peak of 90 and cannot be improved by increasing the thread count.
> Each Kylin query engine can handle 90 queries per second, which means it supports only 90/15 = 6 users reviewing the report page at the same time.
> Even if we set up 3 query nodes, which extends this to 18 users at the same time, this capacity cannot meet our business requirement.
>
> +Analyze+
> From the test results, the response for one thread is fast, but as the thread count increases, the throughput of Kylin does not increase as we expected.
> We did a full code review of the Kylin query engine, used jstack and JProfiler for analysis, and found the root cause of this performance bottleneck.
> This is a regression bug introduced by a new feature added one year ago.
> With the bug fixed, one Kylin node can handle 350+ queries per second. We are submitting this issue in order to contribute the patch to Kylin.
>
> +Jstack Log Analyze+
> We used jstack to capture thread info during performance testing. One of the dumps is attached as 'jstackBeforeBugFix.log'.
> From the log, we found the following.
> One thread holds the lock at sun.misc.URLClassPath.getNextLoader. The TID is *0x00048007a180*
>
> "Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
>     - locked <0x00048007a180> (a sun.misc.URLClassPath)
>     at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
>     at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
>     at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
>     at
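The resolution note at the top of this message describes reading kylin-defaults.properties once instead of calling getResource on every query. As a rough illustration only (this is not the merged patch), the load-once idea could look like the sketch below. The class name `DefaultsCacheSketch`, the `LOADS` counter, and the helper methods are hypothetical and exist only for this example; the sketch relies on JVM class initialization to guarantee a single classpath lookup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

public class DefaultsCacheSketch {
    // Resource name taken from the resolution note; everything else is hypothetical.
    static final String RESOURCE = "kylin-defaults.properties";

    // Counts how many times the classpath is actually searched (for illustration).
    static final AtomicInteger LOADS = new AtomicInteger();

    // Holder idiom: the JVM initializes this nested class at most once,
    // on first access, so loadOnce() runs a single time even under concurrency.
    private static final class Holder {
        static final Properties DEFAULTS = loadOnce();
    }

    private static Properties loadOnce() {
        LOADS.incrementAndGet();
        Properties props = new Properties();
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = DefaultsCacheSketch.class.getClassLoader();
        }
        // getResourceAsStream returns null when the resource is absent;
        // try-with-resources tolerates a null resource.
        try (InputStream in = loader.getResourceAsStream(RESOURCE)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read " + RESOURCE, e);
        }
        return props;
    }

    // Returns a private copy so callers cannot mutate the shared defaults.
    static Properties defaults() {
        Properties copy = new Properties();
        copy.putAll(Holder.DEFAULTS);
        return copy;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    defaults();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("classpath lookups: " + LOADS.get());
    }
}
```

With this shape, query threads never reach the class loader's resource lookup after the first call, so they no longer queue on the monitor shown in the stack trace.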
[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683099#comment-16683099 ]

Zongwei Li commented on KYLIN-3672:
---
[~Shaofengshi] Thank you for help on this patch!
[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681641#comment-16681641 ]

Zongwei Li commented on KYLIN-3672:
---
[~Shaofengshi] I have uploaded a new patch for this bug. It fixes the previous impact on the CoProcessor, and the integration tests pass locally. Please help review it. Thanks.
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3672:
--
Attachment: (was: KYLIN-3672.master.001.patch)
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3672:
--
Attachment: KYLIN-3672.master.002.patch
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3672:
--
Description:
Hi, Kylin Team
We found a Kylin performance bug during performance tuning for our BI report, which integrates with Kylin.

+Background+
Our BI report shows customer usage to enterprise customers and provides 15 usage charts on the report page.
Each chart needs to send an API request to Kylin with a different SQL statement, so one user triggers 15 API calls (by JDBC) to Kylin.
For our product scale, we need to support at least 20 users reviewing the report at the same time on each Kylin query node.
This means each Kylin node should be able to handle 15 * 20 = 300 queries per second.

+Performance Report+
To reduce the network impact, we built the Kylin cluster and the testing machine in the same network as the Hadoop system.
We used Gatling and JMeter to run several rounds of testing, with the results as follows.

|Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|

The trend chart is as follows:
!TrendChartBeforeFix.png!

+Conclusion+
From the trend, when the thread count reaches 75, the handled queries per second reach a peak of 90 and cannot be improved by increasing the thread count.
Each Kylin query engine can handle 90 queries per second, which means it supports only 90/15 = 6 users reviewing the report page at the same time.
Even if we set up 3 query nodes, which extends this to 18 users at the same time, this capacity cannot meet our business requirement.

+Analyze+
From the test results, the response for one thread is fast, but as the thread count increases, the throughput of Kylin does not increase as we expected.
We did a full code review of the Kylin query engine, used jstack and JProfiler for analysis, and found the root cause of this performance bottleneck.
This is a regression bug introduced by a new feature added one year ago.
With the bug fixed, one Kylin node can handle 350+ queries per second. We are submitting this issue in order to contribute the patch to Kylin.

+Jstack Log Analyze+
We used jstack to capture thread info during performance testing. One of the dumps is attached as 'jstackBeforeBugFix.log'.
From the log, we found the following.
One thread holds the lock at sun.misc.URLClassPath.getNextLoader. The TID is *0x00048007a180*

"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - locked <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)

43 threads waiting to lock <*0x00048007a180*>

"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
    at
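The "43 threads waiting to lock" figure in the description above is the kind of number that can be tallied mechanically from a jstack dump. The following is a minimal sketch of such a tally, assuming the standard HotSpot dump line format "- waiting to lock <0x...>". The class name `MonitorWaiters` and its method are hypothetical, and the sample lines are copied from the trace quoted in this thread rather than from the attached log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonitorWaiters {
    // Matches the jstack line emitted for a thread blocked on a monitor.
    private static final Pattern WAITING =
            Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    // Returns monitor address -> number of threads waiting for it,
    // in order of first appearance in the dump.
    static Map<String, Integer> countWaiters(List<String> dumpLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dumpLines) {
            Matcher m = WAITING.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a dump file path as the first argument, or fall back to a tiny sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                    "    - locked <0x00048007a180> (a sun.misc.URLClassPath)",
                    "    - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)");
        countWaiters(lines).forEach((monitor, waiters) ->
                System.out.println(monitor + " waiters=" + waiters));
    }
}
```

A monitor with a large waiter count and a single "- locked" owner, as in the trace above, is a likely serialization point worth inspecting first.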
[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680666#comment-16680666 ]

Zongwei Li commented on KYLIN-3672:
---
Let me check the integration test.
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3672:
--
    Attachment: TrendChartAfterFix.png
    Description:
Hi, Kylin Team

We found a Kylin performance bug while tuning the performance of our BI report, which is integrated with Kylin.

+Background+
Our BI report shows usage data to enterprise customers and provides 15 usage charts on the report page.
Each chart sends an API request to Kylin with a different SQL statement, so one user triggers 15 API calls (via JDBC) to Kylin.
At our product scale, each Kylin query node needs to support at least 20 users viewing the report at the same time.
This means each Kylin node should be able to handle 15 * 20 = 300 queries per second.

+Performance Report+
To reduce the network impact, we set up the Kylin cluster and the test machine in the same network as the Hadoop system.
We used Gatling and JMeter for several rounds of testing, with the following results.

|Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|

The trend chart is as follows:
!TrendChartBeforeFix.png!

+Conclusion+
The trend shows that throughput peaks at about 90 queries per second once the thread count reaches 75, and adding more threads does not improve it.
At 90 queries per second, each Kylin query engine can support only 90/15 = 6 users viewing the report page at the same time.
Even with 3 query nodes, which would extend this to 18 concurrent users, the capacity cannot meet our business requirement.

+Analyze+
The test results show that the response for a single thread is fast, but Kylin's throughput does not increase as expected when the thread count grows.
We did a full code review of the Kylin query engine and analyzed it with jstack and JProfiler, and found the root cause of this performance bottleneck.
It is a regression introduced by a new feature added about one year ago.
With the bug fixed, one Kylin node can handle 350+ queries per second. We are submitting this bug in order to contribute a patch to Kylin.

+Jstack Log Analyze+
We used jstack to capture thread info during performance testing. One of the dumps is attached as 'jstackBeforeBugFix.log'.
From the log, we found the following.
One thread has locked the monitor in sun.misc.URLClassPath.getNextLoader. The lock address is *0x00048007a180*:

"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - locked <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)

43 threads are waiting to lock <*0x00048007a180*>:

"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
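The stack traces above show every query thread reaching `ClassLoader.getResource` from `KylinConfig.buildSiteOrderedProps` and then queuing on a single `URLClassPath` monitor. The resolution note on this issue says the defaults file only needs to be loaded once. The sketch below illustrates that general load-once idea in isolation. It is not the merged Kylin patch: `CachedDefaults`, `Loader`, and `fromClasspath` are hypothetical names introduced here for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of "load the defaults once, then reuse them".
final class CachedDefaults {
    /** Hypothetical hook so the sketch can run without a real classpath resource. */
    interface Loader {
        Properties load() throws IOException;
    }

    private final Loader loader;
    private volatile Properties cached; // published once, then treated as read-only

    CachedDefaults(Loader loader) {
        this.loader = loader;
    }

    /** Returns a private copy of the defaults; the expensive load runs at most once. */
    Properties get() {
        Properties local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    try {
                        local = loader.load();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    cached = local;
                }
            }
        }
        // Hand out a copy so callers cannot mutate the shared instance.
        Properties copy = new Properties();
        copy.putAll(local);
        return copy;
    }

    /** Loader that performs the classpath lookup seen in the stack trace. */
    static Loader fromClasspath(String resourceName) {
        return () -> {
            Properties p = new Properties();
            try (InputStream in =
                         CachedDefaults.class.getClassLoader().getResourceAsStream(resourceName)) {
                if (in == null) {
                    throw new IOException("resource not found: " + resourceName);
                }
                p.load(in);
            }
            return p;
        };
    }
}
```

After the first call, `get()` takes only a volatile read and a map copy, so concurrent query threads no longer contend on the class loader lock for this lookup.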
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zongwei Li updated KYLIN-3672: -- Attachment: jstackBeforeBugFix.log -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679449#comment-16679449 ] Zongwei Li commented on KYLIN-3672: --- [~Shaofengshi] Already merged code with latest code in master and generated patch file in JIRA, who can help review the file or what else needed to do. It's my first time to commit patch for Kylin. [~yimingliu] Let me add the detail analyze from code in this bug
[jira] [Created] (KYLIN-3672) Performance is poor when multiple queries occur in short period
Zongwei Li created KYLIN-3672:
-
Summary: Performance is poor when multiple queries occur in short period
Key: KYLIN-3672
URL: https://issues.apache.org/jira/browse/KYLIN-3672
Project: Kylin
Issue Type: Bug
Components: Query Engine
Affects Versions: v2.5.0
Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
Reporter: Zongwei Li
Attachments: TrendChartBeforeFix.png

Hi, Kylin Team

We found a Kylin performance bug while tuning the performance of our BI report, which is integrated with Kylin.

+Background+
Our BI report shows usage data to enterprise customers and provides 15 usage charts on the report page.
Each chart sends an API request to Kylin with a different SQL statement, so one user triggers 15 API calls (via JDBC) to Kylin.
At our product scale, each Kylin query node needs to support at least 20 users viewing the report at the same time.
This means each Kylin node should be able to handle 15 * 20 = 300 queries per second.

+Performance Report+
To reduce the network impact, we set up the Kylin cluster and the test machine in the same network as the Hadoop system.
We used Gatling and JMeter for several rounds of testing, with the following results.

|Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|

The trend chart is as follows:
!TrendChartBeforeFix.png!

+Conclusion+
The trend shows that throughput peaks at about 90 queries per second once the thread count reaches 75, and adding more threads does not improve it.
At 90 queries per second, each Kylin query engine can support only 90/15 = 6 users viewing the report page at the same time.
Even with 3 query nodes, which would extend this to 18 concurrent users, the capacity cannot meet our business requirement.

+Analyze+
The test results show that the response for a single thread is fast, but Kylin's throughput does not increase as expected when the thread count grows.
We did a full code review of the Kylin query engine and analyzed it with jstack and JProfiler, and found the root cause of this performance bottleneck.
It is a regression introduced by a new feature added about one year ago.
With the bug fixed, one Kylin node can handle 350+ queries per second. We are submitting this bug in order to contribute a patch to Kylin.

+Kylin Server Info+
|*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
|Query Engine|16 (2.4G)|128|1024|

+Kylin Package+
apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)

+Query SQL+
SQL with the PreparedStatement cache enabled (a new feature in Kylin 2.5.0; without the PreparedStatement cache, performance is even worse). The filter hits all 6 segments.

+Cube Info+
Segment Number: 6
Total Size: 47 MB

Segment: 2018010100_2018101100
Start Time: 2018-01-01 00:00:00
End Time: 2018-10-11 00:00:00
Source Count: 351934019
HBase Table: KYLIN_69Q9A850DZ
Region Count: 1
Size: 47 MB

Segment: 2018101100_2018101200
Start Time: 2018-10-11 00:00:00
End Time: 2018-10-12 00:00:00
Source Count: 7085485
HBase Table: KYLIN_ZCT39S8FUA
Region Count: 1
Size: less than 1 MB

Segment: 2018101200_2018101300
Start Time: 2018-10-12 00:00:00
End Time: 2018-10-13 00:00:00
Source Count: 5534968
HBase Table: KYLIN_RKRRLA958T
Region Count: 1
Size: less than 1 MB

Segment: 2018101300_2018101400
Start Time: 2018-10-13 00:00:00
End Time: 2018-10-14 00:00:00
Source Count: 242856
HBase Table: KYLIN_Q6DKCONN81
Region Count: 1
Size: less than 1 MB

Segment: 2018101400_2018101500
Start Time: 2018-10-14 00:00:00
End Time: 2018-10-15 00:00:00
Source Count: 236122
HBase Table: KYLIN_JY4WQD2MJH
Region Count: 1
Size: less than 1 MB

Segment: 2018101500_2018101600
Start Time: 2018-10-15 00:00:00
End Time: 2018-10-16 00:00:00
Source Count: 6172353
HBase Table: KYLIN_E2ELLINV22
Region Count: 1
Size: less than 1 MB

+HBase Region Server+
Count: 6
hbase.regionserver.handler.count: 120
Not blocked in the CoProcessor RPC call.
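The capacity arithmetic in the report can be checked mechanically. The sketch below recomputes the per-second rate from the 60-second counts, the number of concurrent users a node can serve at 15 queries per user, and a Little's law estimate (throughput is roughly threads in flight divided by mean response time). The class and method names are hypothetical. The Little's law comparison is an added sanity check, not part of the original report, and it assumes each test thread issues queries back to back with no think time.

```java
// Hypothetical helper that re-derives the figures quoted in the performance report.
final class ThroughputCheck {
    /** Queries per second, given a count measured over a fixed window. */
    static double perSecond(int handledQueries, int windowSeconds) {
        return (double) handledQueries / windowSeconds;
    }

    /** Load a node must sustain when each user's page fires queriesPerUser queries. */
    static int requiredPerSecond(int users, int queriesPerUser) {
        return users * queriesPerUser;
    }

    /** Whole users a node can serve at once at the measured rate. */
    static int concurrentUsers(double queriesPerSecond, int queriesPerUser) {
        return (int) Math.floor(queriesPerSecond / queriesPerUser);
    }

    /** Little's law estimate of throughput from concurrency and mean latency. */
    static double littlesLawThroughput(int threads, double meanResponseMs) {
        return threads * 1000.0 / meanResponseMs;
    }

    public static void main(String[] args) {
        // Rows from the report: threads, handled queries in 60 s, mean response time in ms.
        int[][] rows = {
            {1, 773, 77}, {15, 3245, 279}, {25, 3844, 390}, {50, 4912, 612},
            {75, 5405, 841}, {100, 5436, 1108}, {150, 5434, 1688}
        };
        System.out.println("required qps: " + requiredPerSecond(20, 15));
        for (int[] r : rows) {
            double measured = perSecond(r[1], 60);
            double estimated = littlesLawThroughput(r[0], r[2]);
            System.out.println(String.format(
                    "threads=%d measured=%.1f littles-law=%.1f users=%d",
                    r[0], measured, estimated, concurrentUsers(measured, 15)));
        }
    }
}
```

For the 75-thread row the estimate is about 89 queries per second against a measured rate of about 90, which is consistent with added threads only increasing queueing time rather than throughput.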
[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679370#comment-16679370 ] Zongwei Li commented on KYLIN-3672: --- Please assign this bug to me, will contribute a patch to fix this issue.
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zongwei Li updated KYLIN-3672: -- Attachment: KYLIN-3672.master.001.patch
[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period
[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3672:
    Attachment: codeChangedCausedThisBug.png
    Description:
Hi, Kylin Team

We found a Kylin performance bug while performance tuning our BI report, which integrates with Kylin.

+Background+
Our BI report shows customer usage to enterprise customers and provides 15 usage charts on the report page.
Each chart sends an API request to Kylin with a different SQL statement, so one user triggers 15 API calls (by JDBC) to Kylin.
For our product scale, each Kylin query node needs to support at least 20 users viewing the report at the same time.
That means each Kylin node should be able to handle 15 * 20 = 300 queries per second.

+Performance Report+
To reduce network impact, we built the Kylin cluster and the testing machine in the same network as the Hadoop system.
We used the Gatling and JMeter tools to run several rounds of testing, with the following results.

|Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|

The trend chart is as follows:
!TrendChartBeforeFix.png!

+Conclusion+
From the trend, when the thread count reaches 75, the handled queries per second peak at about 90 and cannot be improved by increasing the thread count.
Each Kylin query engine can handle 90 queries per second, so it supports only 90/15 = 6 users viewing the report page at the same time.
Even if we set up 3 query nodes, capacity extends to only 18 concurrent users, which does not meet our business requirement.

+Analysis+
From the test results, the response for a single thread is fast, but as the thread count increases, Kylin's throughput does not increase as we expected.
We did a full code review of the Kylin query engine and used jstack and JProfiler to analyze it, and found the root cause of this performance bottleneck.
It is a regression introduced by a new feature added one year ago. With the bug fixed, one Kylin node can handle 350+ queries per second. We are submitting this bug in order to contribute the patch to Kylin.

+Jstack Log Analysis+
We used jstack to capture thread information during performance testing. One of the dumps is attached as 'jstackBeforeBugFix.log'.
From the log, we found that one thread holds the lock at sun.misc.URLClassPath.getNextLoader. The locked monitor is <0x00048007a180>.

"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - locked <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)
    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)
    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)

43 threads are waiting to lock <0x00048007a180>:

"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)
    - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)
    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
    at
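The dump above shows query threads queuing on a single class-path lock because each one calls ClassLoader.getResource from KylinConfig.buildSiteOrderedProps. One general way to avoid that kind of contention is to read a classpath resource once and hand out copies afterwards. The sketch below illustrates that idea only; it is not the merged patch, and the class name DefaultsCache, its methods, and the lookup counter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of "load once, copy many times"; not Kylin code.
final class DefaultsCache {
    static final String RESOURCE = "kylin-defaults.properties";

    // Counts classpath lookups so the effect of caching can be observed.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs load() exactly once,
    // on first access, without any explicit locking in this class.
    private static final class Holder {
        static final Properties DEFAULTS = load();
    }

    private static Properties load() {
        LOOKUPS.incrementAndGet();
        Properties props = new Properties();
        ClassLoader loader = DefaultsCache.class.getClassLoader();
        // getResourceAsStream returns null when the resource is absent;
        // try-with-resources skips closing a null resource.
        try (InputStream in = loader.getResourceAsStream(RESOURCE)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read " + RESOURCE, e);
        }
        return props;
    }

    /** Returns a fresh, caller-owned copy of the cached defaults. */
    static Properties newProps() {
        Properties copy = new Properties();
        copy.putAll(Holder.DEFAULTS);
        return copy;
    }
}
```

Each caller gets its own copy, so per-request overrides cannot leak into the shared defaults, and the class loader is consulted once no matter how many query threads ask.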
[jira] [Updated] (KYLIN-3601) The max connection number generated by the PreparedContextPool is inconsistent with the configuration.
[ https://issues.apache.org/jira/browse/KYLIN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3601:
    Attachment: image.png

> The max connection number generated by the PreparedContextPool is inconsistent with the configuration.
>
> Key: KYLIN-3601
> URL: https://issues.apache.org/jira/browse/KYLIN-3601
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.5.0
> Reporter: huaicui
> Priority: Major
> Attachments: FirstResponseDistribute.jpg, SixthResponseDistribute.jpg, image-2018-09-28-15-14-00-288.png, image.png
>
> Because concurrent performance was insufficient, we tested with the PrepareStatement approach provided by magang. Performance improved somewhat, but as the number of test runs increased, throughput dropped lower and lower and more and more requests timed out. We modified the code to add a log statement just before the final return of queryAndUpdateCache:
> logger.debug("BorrowedCount:"+preparedContextPool.getBorrowedCount()
> +",DestroyedCount:"+preparedContextPool.getDestroyedCount()
> +",CreatedCount:"+preparedContextPool.getCreatedCount()
> +",ReturnedCount:"+preparedContextPool.getReturnedCount());
> We also added this setting to the configuration file:
> kylin.query.statement-cache-max-num-per-key=200
>
> The log shows that after the same SQL has been run concurrently for a while, PreparedContextPool creates more and more PrepareStatement instances and does not block subsequent requests.
> !image-2018-09-28-15-14-00-288.png!
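The report above expects a pool with a configured maximum to make later callers wait rather than keep creating instances. As a minimal sketch of that expected behavior, here is a bounded pool built only on java.util.concurrent. It is not the pool Kylin uses; the class BoundedPool and all of its methods are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical bounded pool: at most maxTotal instances are ever created.
final class BoundedPool<T> {
    private final Semaphore permits;
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;
    private final AtomicInteger created = new AtomicInteger();

    BoundedPool(int maxTotal, Supplier<T> factory) {
        this.permits = new Semaphore(maxTotal, true); // fair: waiters served in order
        this.factory = factory;
    }

    /** Waits up to timeoutMillis for a free slot; returns null if the pool stays exhausted. */
    T borrow(long timeoutMillis) {
        try {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
        T obj = idle.poll();
        if (obj == null) {
            try {
                obj = factory.get(); // only reached while holding a permit
            } catch (RuntimeException e) {
                permits.release(); // do not leak the slot if creation fails
                throw e;
            }
            created.incrementAndGet();
        }
        return obj;
    }

    /** Returns an instance to the pool and frees its slot. */
    void giveBack(T obj) {
        idle.offer(obj);   // make the instance visible before freeing the slot
        permits.release();
    }

    int createdCount() {
        return created.get();
    }
}
```

Because an instance is created only while a permit is held and no idle instance exists, the created count cannot exceed the configured maximum, which is the invariant the logged CreatedCount in the report appears to violate.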
[jira] [Commented] (KYLIN-3569) Server with query mode still can submit/build job
[ https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626633#comment-16626633 ]

Zongwei Li commented on KYLIN-3569:

Sure, let me try it.

> Server with query mode still can submit/build job
>
> Key: KYLIN-3569
> URL: https://issues.apache.org/jira/browse/KYLIN-3569
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine, REST Service, Security
> Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Priority: Major
> Labels: build, documentation, security
> Attachments: kylinCode.png
>
> From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
> * *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
> It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
> We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. The source code is already attached to this issue.
> This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.
[jira] [Commented] (KYLIN-3569) Server with query mode still can submit/build job
[ https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620215#comment-16620215 ]

Zongwei Li commented on KYLIN-3569:

After reviewing the code again, there is a hard-coded string in DistributedScheduler.class, line 192:

    if (!("job".equals(serverMode.toLowerCase()) || "all".equals(serverMode.toLowerCase()))) {
        logger.info("server mode: " + serverMode + ", no need to run job scheduler");
        return;
    }

The code review has clarified which responsibilities the job engine takes. I suggest refactoring this code to replace the "job" literal with
public final static String SERVER_MODE_JOB = "job";
in Constant.class, which already exists.

> Server with query mode still can submit/build job
>
> Key: KYLIN-3569
> URL: https://issues.apache.org/jira/browse/KYLIN-3569
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine, REST Service, Security
> Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Priority: Major
> Labels: build, documentation, security
> Attachments: kylinCode.png
>
> From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
> * *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
> It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
> We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. The source code is already attached to this issue.
> This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.
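The refactoring suggested in the comment above could look roughly like the following. This is a sketch under assumptions: only SERVER_MODE_JOB and the literals "job" and "all" appear in the comment, so the class name ServerModes, the other two constant names, and the method shouldRunJobScheduler are hypothetical.

```java
import java.util.Locale;

// Hypothetical stand-in for the constants class mentioned in the comment.
final class ServerModes {
    static final String SERVER_MODE_QUERY = "query"; // assumed name
    static final String SERVER_MODE_JOB = "job";     // named in the comment
    static final String SERVER_MODE_ALL = "all";     // assumed name

    /** True when the configured mode should start the job scheduler. */
    static boolean shouldRunJobScheduler(String serverMode) {
        if (serverMode == null) {
            return false;
        }
        // Lower-case once, with a fixed locale, instead of once per comparison.
        String mode = serverMode.toLowerCase(Locale.ROOT);
        return SERVER_MODE_JOB.equals(mode) || SERVER_MODE_ALL.equals(mode);
    }
}
```

The scheduler check then reads as `if (!ServerModes.shouldRunJobScheduler(serverMode)) { return; }`, with the mode strings defined in one place.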
[jira] [Commented] (KYLIN-3569) Server with query mode still can submit/build job
[ https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620023#comment-16620023 ]

Zongwei Li commented on KYLIN-3569:

Regarding 'job server is responsible to schedule jobs': what are the cases for scheduled jobs? Could you give detailed information about this? We want to know which functions only the job server can perform. It will help us design the deployment architecture. Thanks

> Server with query mode still can submit/build job
>
> Key: KYLIN-3569
> URL: https://issues.apache.org/jira/browse/KYLIN-3569
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine, REST Service, Security
> Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Priority: Major
> Labels: build, documentation, security
> Attachments: kylinCode.png
>
> From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
> * *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
> It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
> We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. The source code is already attached to this issue.
> This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.
[jira] [Updated] (KYLIN-3569) Server with query mode still can submit/build job
[ https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3569:
    Description:
From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
* *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. The source code is already attached to this issue.
This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.

  was:
From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
* *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. We will attach the source code to this issue.
This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.
[jira] [Updated] (KYLIN-3569) Server with query mode still can submit/build job
[ https://issues.apache.org/jira/browse/KYLIN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li updated KYLIN-3569:
    Attachment: kylinCode.png

> Server with query mode still can submit/build job
>
> Key: KYLIN-3569
> URL: https://issues.apache.org/jira/browse/KYLIN-3569
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine, REST Service, Security
> Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Priority: Major
> Labels: build, documentation, security
> Attachments: kylinCode.png
>
> From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
> * *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
> It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
> We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. We will attach the source code to this issue.
> This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.
[jira] [Created] (KYLIN-3569) Server with query mode still can submit/build job
Zongwei Li created KYLIN-3569:

    Summary: Server with query mode still can submit/build job
    Key: KYLIN-3569
    URL: https://issues.apache.org/jira/browse/KYLIN-3569
    Project: Kylin
    Issue Type: Bug
    Components: Job Engine, REST Service, Security
    Affects Versions: v2.4.1
    Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
    Reporter: Zongwei Li

From the docs on the Kylin site, [http://kylin.apache.org/docs24/install/kylin_cluster.html]
* *query* : run query engine only; Kylin query engine accepts and answers your SQL queries
It seems that a server set with 'kylin.server.mode=query' should not be able to submit or build jobs. But in our tests, a server in query mode can still submit/build jobs from the UI or the RESTful API.
We analyzed the source code and found no protection logic in the service layer that checks whether the server is in 'job' or 'build' mode before a job is submitted or built. We will attach the source code to this issue.
This issue really confused us, because from the Kylin docs and many Kylin books we understood that a query server cannot build jobs. Moreover, the query server is exposed to third-party BI tools for querying data; if we forget to configure suitable ACLs for cubes, a third-party BI tool can trigger a build job at any time.
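The issue above says the service layer has no check on the server mode before a job is submitted. If a project chose to enforce such a rule, a fail-fast guard is one simple shape for it. The sketch below is an assumption-laden illustration, not Kylin code: the class JobSubmissionGuard and its method are hypothetical, and the accepted modes "job" and "all" are taken from the scheduler check quoted elsewhere in this thread.

```java
import java.util.Locale;

// Hypothetical service-layer guard; rejects job submission outside job-capable modes.
final class JobSubmissionGuard {

    /** Throws if the configured server mode should not submit or build jobs. */
    static void requireJobCapableMode(String serverMode) {
        String mode = serverMode == null
                ? ""
                : serverMode.trim().toLowerCase(Locale.ROOT);
        if (!"job".equals(mode) && !"all".equals(mode)) {
            throw new IllegalStateException(
                    "Server mode '" + serverMode + "' does not allow submitting or building jobs");
        }
    }
}
```

A service method would call the guard as its first statement, so that both the UI and the REST API paths hit the same check.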
[jira] [Closed] (KYLIN-3568) User login error message is inaccurate
[ https://issues.apache.org/jira/browse/KYLIN-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zongwei Li closed KYLIN-3568.
    Resolution: Duplicate
    Fix Version/s: (was: v2.6.0)

Sorry for the duplicate bug. Will update it.

> User login error message is inaccurate
>
> Key: KYLIN-3568
> URL: https://issues.apache.org/jira/browse/KYLIN-3568
> Project: Kylin
> Issue Type: Bug
> Components: REST Service, Web
> Affects Versions: v2.4.1
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Assignee: XiaoXiang Yu
> Priority: Minor
> Labels: usability
>
> Hi Kylin team,
>
> We found an issue when logging in to Kylin: the error message misleads the user.
>
> I could not log in to Kylin even though I entered the correct username and password (LDAP enabled).
> So I checked the server log, which showed HBase connection issues.
> The root cause was that the HBase server Kylin uses as its metadata store was down, but the login message told me to check my username or password. It's really confusing.
> Then I read some of the source code of the login module and found that Kylin shares the same error message across different cases.
>
> We suggest two options:
> # Redirect to a global error page when the HBase connection fails after login, showing an error message (e.g. System Error, please contact system administrator).
> # Enhance the error codes in the login logic so that the error message is more specific.
>
> Here are the login error message and the log. I logged in successfully after the HBase servers recovered.
> !image-2018-09-17-20-25-54-896.png!!image-2018-09-17-20-22-20-294.png!
[jira] [Created] (KYLIN-3568) User login error message is inaccurate
Zongwei Li created KYLIN-3568:

    Summary: User login error message is inaccurate
    Key: KYLIN-3568
    URL: https://issues.apache.org/jira/browse/KYLIN-3568
    Project: Kylin
    Issue Type: Bug
    Components: REST Service, Web
    Affects Versions: v2.4.1
    Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
    Reporter: Zongwei Li
    Assignee: XiaoXiang Yu
    Fix For: v2.6.0

Hi Kylin team,

We found an issue when logging in to Kylin: the error message misleads the user.

I could not log in to Kylin even though I entered the correct username and password (LDAP enabled).
So I checked the server log, which showed HBase connection issues.
The root cause was that the HBase server Kylin uses as its metadata store was down, but the login message told me to check my username or password. It's really confusing.
Then I read some of the source code of the login module and found that Kylin shares the same error message across different cases.

We suggest two options:
# Redirect to a global error page when the HBase connection fails after login, showing an error message (e.g. System Error, please contact system administrator).
# Enhance the error codes in the login logic so that the error message is more specific.

Here are the login error message and the log. I logged in successfully after the HBase servers recovered.
!image-2018-09-17-20-25-54-896.png!!image-2018-09-17-20-22-20-294.png!
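The second option suggested above, a more specific error message, amounts to classifying the login failure before choosing the text. A minimal sketch follows. It assumes that a metadata-store outage surfaces as a java.io.IOException somewhere in the exception's cause chain, which is an assumption about the deployment rather than something stated in the report; the class LoginErrorMessages and both message strings are hypothetical.

```java
import java.io.IOException;

// Hypothetical mapping from a login failure to a user-facing message.
final class LoginErrorMessages {
    static final String BAD_CREDENTIALS = "Invalid username or password.";
    static final String SYSTEM_ERROR = "System error, please contact the system administrator.";

    /** Picks a message by scanning the cause chain for a backend I/O failure. */
    static String messageFor(Throwable failure) {
        Throwable current = failure;
        // Bounded walk so a malformed cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (current instanceof IOException) {
                return SYSTEM_ERROR; // assumed signal of a backend outage
            }
            current = current.getCause();
        }
        return BAD_CREDENTIALS;
    }
}
```

With this split, a user whose credentials are correct is no longer told to re-check them when the backing store is down.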