[jira] [Commented] (KYLIN-3678) CacheStateChecker may remove a cache file that under building

2018-11-09 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682143#comment-16682143
 ] 

ASF subversion and git services commented on KYLIN-3678:


Commit 8e1fd97321d8c64adb328511bbd53e6d529571ab in kylin's branch 
refs/heads/master from shaofengshi
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=8e1fd97 ]

KYLIN-3678 CacheStateChecker may remove a cache file that under building


> CacheStateChecker may remove a cache file that under building
> -
>
> Key: KYLIN-3678
> URL: https://issues.apache.org/jira/browse/KYLIN-3678
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.4.0, v2.4.1, v2.5.0, v2.5.1
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.6.0
>
>
> A Kylin test failed with the following error:
> {code:java}
> 2018-11-09 02:15:24,379 DEBUG [main] cachesync.CachedCrudAssist:127 : Loaded 1 ExternalFilterDesc(s) out of 1 resource
> 2018-11-09 02:15:24,380 WARN  [main] common.KylinConfigBase:77 : KYLIN_HOME was not set
> 2018-11-09 02:15:24,380 INFO  [main] cache.RocksDBLookupBuilder:66 : create new rocksdb folder:lookup_cache/rocksdb/DEFAULT.TEST_COUNTRY/f19bc17c-d41d-a4be-b561-f6bd275f4c90/db for table cache:DEFAULT.TEST_COUNTRY
> 2018-11-09 02:15:24,380 INFO  [main] cache.RocksDBLookupBuilder:69 : start to build lookup table:DEFAULT.TEST_COUNTRY to rocks db:lookup_cache/rocksdb/DEFAULT.TEST_COUNTRY/f19bc17c-d41d-a4be-b561-f6bd275f4c90/db
> 2018-11-09 02:15:26,814 WARN  [lookup-cache-state-checker-1] common.KylinConfigBase:77 : KYLIN_HOME was not set
> 2018-11-09 02:15:26,814 INFO  [lookup-cache-state-checker-1] cache.RocksDBLookupTableCache:334 : check snapshot local cache state, local path:lookup_cache/rocksdb
> 2018-11-09 02:15:26,814 INFO  [lookup-cache-state-checker-1] cache.RocksDBLookupTableCache:361 : removed cache file:/var/jenkins/workspace/kylin-manual-ci/core-dictionary/lookup_cache/rocksdb/DEFAULT.TEST_COUNTRY/f19bc17c-d41d-a4be-b561-f6bd275f4c90, it is not referred by any cube
> 2018-11-09 02:15:28,474 ERROR [main] cache.RocksDBLookupBuilder:77 : error when put data to rocksDB
> org.rocksdb.RocksDBException: While open a file for random read: lookup_cache/rocksdb/DEFAULT.TEST_COUNTRY/f19bc17c-d41d-a4be-b561-f6bd275f4c90/db/18.sst: No such file or directory
>   at org.rocksdb.RocksDB.put(Native Method)
>   at org.rocksdb.RocksDB.put(RocksDB.java:453)
>   at org.apache.kylin.dict.lookup.cache.RocksDBLookupBuilder.build(RocksDBLookupBuilder.java:74)
>   at org.apache.kylin.dict.lookup.cache.RocksDBLookupTableCacheTest.testRestoreCacheFromFiles(RocksDBLookupTableCacheTest.java:115)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
> 
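
The log above shows the race: the builder creates the RocksDB folder at 02:15:24, the `lookup-cache-state-checker-1` thread inspects `lookup_cache/rocksdb` about 2.4 seconds later, finds that no cube references the new folder yet, and deletes it while `RocksDBLookupBuilder` is still writing, so the next RocksDB read of `18.sst` fails. One way to close that window is for the builder to register a path as "in building" before creating the folder, and for the checker to skip registered paths. The sketch below illustrates that idea under stated assumptions; the class and method names (`LookupCacheStateChecker`, `markBuilding`, `markBuilt`, `findRemovable`) are hypothetical and this is not the code from commit 8e1fd97.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Kylin's RocksDBLookupTableCache implementation.
class LookupCacheStateChecker {
    // Local cache paths whose build has started but is not yet registered.
    private final Set<String> inBuilding = ConcurrentHashMap.newKeySet();

    // The builder calls this BEFORE creating the RocksDB folder.
    void markBuilding(String cachePath) {
        inBuilding.add(cachePath);
    }

    // The builder calls this once the cache is built and referenced (or the build failed).
    void markBuilt(String cachePath) {
        inBuilding.remove(cachePath);
    }

    /**
     * Returns the local cache paths that are safe to delete: paths that no cube
     * references and that are not currently being built. Sorted for stable output.
     */
    List<String> findRemovable(Set<String> localPaths, Set<String> referencedPaths) {
        List<String> removable = new ArrayList<>();
        for (String path : localPaths) {
            if (referencedPaths.contains(path)) {
                continue; // still used by a cube
            }
            if (inBuilding.contains(path)) {
                continue; // build in progress: deleting it causes the error in the log
            }
            removable.add(path);
        }
        Collections.sort(removable);
        return removable;
    }
}
```

The ordering matters in this design: if the builder marked the path only after creating the folder, the checker could still see the unmarked folder in between.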

[jira] [Assigned] (KYLIN-3678) CacheStateChecker may remove a cache file that under building

2018-11-09 Thread Shaofeng SHI (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3678:
---

Assignee: Shaofeng SHI


[jira] [Updated] (KYLIN-3678) CacheStateChecker may remove a cache file that under building

2018-11-09 Thread Shaofeng SHI (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3678:

Fix Version/s: v2.6.0


[jira] [Commented] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681641#comment-16681641
 ] 

Zongwei Li commented on KYLIN-3672:
---

[~Shaofengshi] I have uploaded a new patch for this bug that fixes the previous 
impact on the CoProcessor, and the integration tests pass locally. Please help 
review it. Thanks.

> Performance is poor when multiple queries occur in short period
> ---
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>Reporter: Zongwei Li
>Assignee: Zongwei Li
>Priority: Critical
>  Labels: patch, performance
> Fix For: v2.6.0
>
> Attachments: KYLIN-3672.master.002.patch, TrendChartAfterFix.png, 
> TrendChartBeforeFix.png, codeChangedCausedThisBug.png, jstackBeforeBugFix.log
>
>
> Hi, Kylin Team,
> We found a Kylin performance bug while tuning the performance of our BI 
> report, which is integrated with Kylin.
>  
> +Background+
> Our BI report shows usage reports to enterprise customers, with 15 usage 
> charts on the report page.
> Each chart sends an API request (via JDBC) to Kylin with a different SQL 
> query, so a single user viewing the page triggers 15 API calls to Kylin.
> At our product's scale, each Kylin query node needs to support at least 20 
> users viewing the report at the same time.
> This means each Kylin node should be able to handle 15 * 20 = 300 queries 
> per second.
>  
> +Performance Report+
> To minimize network impact, we set up the Kylin cluster and the test machine 
> in the same network as the Hadoop system.
> We used Gatling and JMeter to run several rounds of testing; the results are 
> as follows.
>  
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>  
> The trend chart is shown below:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> The trend shows that once the thread count reaches 75, throughput peaks at 
> about 90 queries per second and does not improve with more threads.
> Since each Kylin query engine handles at most about 90 queries per second, a 
> node can support only 90/15 = 6 users viewing the report page at the same time.
> Even with 3 query nodes, capacity extends only to 18 concurrent users, which 
> does not meet our business requirements.
>  
> +Analysis+
> The test results show that a single thread gets fast responses, but as the 
> thread count increases, Kylin's throughput does not increase as expected.
> We did a full code review of the Kylin query engine, analyzed it with jstack 
> and JProfiler, and found the root cause of this performance bottleneck.
> It is a regression introduced by a new feature about one year earlier.
> With the fix, one Kylin node can handle 350+ queries per second. We are filing 
> this bug to contribute the patch to Kylin.
> +Jstack Log Analysis+
> We used jstack to capture thread dumps during performance testing and attached 
> one of them as 'jstackBeforeBugFix.log'.
> The log shows that one thread holds the lock on a sun.misc.URLClassPath object 
> inside sun.misc.URLClassPath.getNextLoader; the locked object is 
> {color:#ff}*0x00048007a180*{color}
>  
> {{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]}}
> {{   java.lang.Thread.State: BLOCKED (on object monitor)}}
> {{    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)}}
> {{    - locked <0x00048007a180> (a sun.misc.URLClassPath)}}
> {{    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)}}
> {{    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)}}
> {{    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)}}
> {{    at java.security.AccessController.doPrivileged(Native Method)}}
> {{    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)}}
> {{    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)}}
> {{    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)}}
> {{    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)}}
> {{    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)}}
>  
> 43 threads waiting to lock 

[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Attachment: (was: KYLIN-3672.master.001.patch)


[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Attachment: KYLIN-3672.master.002.patch


[jira] [Updated] (KYLIN-3672) Performance is poor when multiple queries occur in short period

2018-11-09 Thread Zongwei Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongwei Li updated KYLIN-3672:
--
Description: 
Hi, Kylin Team,

We found a Kylin performance bug while tuning the performance of our BI report, 
which is integrated with Kylin.

+Background+

Our BI report shows usage reports to enterprise customers, with 15 usage charts 
on the report page.

Each chart sends an API request (via JDBC) to Kylin with a different SQL query, 
so a single user viewing the page triggers 15 API calls to Kylin.

At our product's scale, each Kylin query node needs to support at least 20 users 
viewing the report at the same time.

This means each Kylin node should be able to handle 15 * 20 = 300 queries per 
second.

 

+Performance Report+

To minimize network impact, we set up the Kylin cluster and the test machine in 
the same network as the Hadoop system.

We used Gatling and JMeter to run several rounds of testing; the results are as 
follows.

|Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
|1|773|13|77|
|15|3245|54|279|
|25|3844|64|390|
|50|4912|82|612|
|75|5405|90|841|
|100|5436|91|1108|
|150|5434|91|1688|
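
As a sanity check on the table, a closed-loop load test should roughly obey Little's law: concurrency = throughput * mean response time, so predicted throughput = threads / mean response time. The hypothetical helper below (`LittlesLawCheck` is an illustrative name, not part of Kylin) compares that prediction with the measured "queries in 60 seconds / 60". For the 150-thread row, 150 / 1.688 s is about 88.9 queries/s, close to the measured 5434 / 60, about 90.6. Every row agrees within a few percent, which is consistent with the reading that, above 75 threads, extra threads only queue behind a fixed ~90 queries/s limit.

```java
// Hypothetical helper for cross-checking load-test results with Little's law.
final class LittlesLawCheck {
    private LittlesLawCheck() {}

    /** Throughput predicted by Little's law: threads / mean response time in seconds. */
    static double predictedQps(int threads, double meanResponseMs) {
        return threads / (meanResponseMs / 1000.0);
    }

    /** Throughput actually measured: queries handled in the window / window length. */
    static double measuredQps(int handledQueries, int windowSeconds) {
        return (double) handledQueries / windowSeconds;
    }
}
```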

 

The trend chart is shown below:

!TrendChartBeforeFix.png!

 

+Conclusion+

The trend shows that once the thread count reaches 75, throughput peaks at about 
90 queries per second and does not improve with more threads.

Since each Kylin query engine handles at most about 90 queries per second, a node 
can support only 90/15 = 6 users viewing the report page at the same time.

Even with 3 query nodes, capacity extends only to 18 concurrent users, which does 
not meet our business requirements.
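
The capacity arithmetic above can be written as two small helpers, assuming, as the report does, that each user viewing the page issues 15 queries and that a node's capacity is its peak queries per second. `ReportCapacity` is a hypothetical name used only for illustration. With the measured 90 queries/s, one node serves 90 / 15 = 6 users and 20 users would need ceil(300 / 90) = 4 nodes; at the 350+ queries/s reported after the fix, one node covers floor(350 / 15) = 23 users.

```java
// Hypothetical capacity helper based on the report's assumptions.
final class ReportCapacity {
    private ReportCapacity() {}

    /** Users one node can serve at the same time, rounded down. */
    static int usersPerNode(int nodeQps, int queriesPerUser) {
        return nodeQps / queriesPerUser;
    }

    /** Nodes needed to serve the target number of users, rounded up. */
    static int nodesNeeded(int targetUsers, int queriesPerUser, int nodeQps) {
        int requiredQps = targetUsers * queriesPerUser;
        return (requiredQps + nodeQps - 1) / nodeQps; // integer ceiling division
    }
}
```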

 

+Analysis+

The test results show that a single thread gets fast responses, but as the thread 
count increases, Kylin's throughput does not increase as expected.

We did a full code review of the Kylin query engine, analyzed it with jstack and 
JProfiler, and found the root cause of this performance bottleneck.

It is a regression introduced by a new feature about one year earlier.

With the fix, one Kylin node can handle 350+ queries per second. We are filing 
this bug to contribute the patch to Kylin.

+Jstack Log Analysis+

We used jstack to capture thread dumps during performance testing and attached 
one of them as 'jstackBeforeBugFix.log'.

The log shows that one thread holds the lock on a sun.misc.URLClassPath object 
inside sun.misc.URLClassPath.getNextLoader; the locked object is 
{color:#ff}*0x00048007a180*{color}

 
{{"Query e9c44a2d-6226-ff3b-f984-ce8489107d79-3425" #3425 daemon prio=5 os_prio=0 tid=0x0472b000 nid=0x1433 waiting for monitor entry [0x7f272e40d000]}}
{{   java.lang.Thread.State: BLOCKED (on object monitor)}}
{{    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)}}
{{    - locked <0x00048007a180> (a sun.misc.URLClassPath)}}
{{    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)}}
{{    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)}}
{{    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)}}
{{    at java.security.AccessController.doPrivileged(Native Method)}}
{{    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)}}
{{    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)}}
{{    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)}}
{{    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1666)}}
{{    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:338)}}

43 threads waiting to lock <{color:#ff}*0x00048007a180*{color}> 

{{"Query f1f0bbec-a3f7-04b2-1ac6-fd3e03a0232d-4002" #4002 daemon prio=5 os_prio=0 tid=0x7f27e71e7800 nid=0x1676 waiting for monitor entry [0x7f279f503000]}}
{{   java.lang.Thread.State: BLOCKED (on object monitor)}}
{{    at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:469)}}
{{    - waiting to lock <0x00048007a180> (a sun.misc.URLClassPath)}}
{{    at sun.misc.URLClassPath.findResource(URLClassPath.java:214)}}
{{    at java.net.URLClassLoader$2.run(URLClassLoader.java:569)}}
{{    at java.net.URLClassLoader$2.run(URLClassLoader.java:567)}}
{{    at java.security.AccessController.doPrivileged(Native Method)}}
{{    at java.net.URLClassLoader.findResource(URLClassLoader.java:566)}}
{{    at java.lang.ClassLoader.getResource(ClassLoader.java:1096)}}
{{    at java.lang.ClassLoader.getResource(ClassLoader.java:1091)}}
 {{}}{{at 
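
The thread dump shows query threads serializing on a single `sun.misc.URLClassPath` monitor while `KylinConfig.buildSiteOrderedProps` resolves a resource through the class loader on the query path. One common way to take such a lookup off the per-query path is to resolve it once and reuse the result. The sketch below illustrates that general technique (memoization with double-checked locking on a `volatile` field); `CachedSiteProps` and its loader are hypothetical names, and this is not necessarily what KYLIN-3672.master.002.patch does.

```java
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical sketch: memoize an expensive, lock-contended properties lookup
// so that concurrent queries do not each go through ClassLoader.getResource.
final class CachedSiteProps {
    private final Supplier<Properties> loader; // the slow lookup, e.g. reading a properties file
    private volatile Properties cached;        // volatile makes double-checked locking safe

    CachedSiteProps(Supplier<Properties> loader) {
        this.loader = loader;
    }

    /** Returns a private copy so callers cannot mutate the shared cached instance. */
    Properties get() {
        Properties p = cached;
        if (p == null) {
            synchronized (this) {
                p = cached;
                if (p == null) {
                    p = loader.get(); // runs once, not once per query
                    cached = p;
                }
            }
        }
        Properties copy = new Properties();
        copy.putAll(p);
        return copy;
    }

    /** Forces the next get() to reload, e.g. after the configuration changes. */
    synchronized void invalidate() {
        cached = null;
    }
}
```

Returning a copy costs a small map copy per call, which is far cheaper than a class-path resource scan under a global lock, and it keeps one query's edits from leaking into another's configuration.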

[GitHub] DingqianZhou commented on a change in pull request #342: typos and format fixed

2018-11-09 Thread GitBox
DingqianZhou commented on a change in pull request #342: typos and format fixed
URL: https://github.com/apache/kylin/pull/342#discussion_r232166771
 
 

 ##
 File path: website/_docs/install/configuration.md
 ##
 @@ -258,14 +258,14 @@ This section introduces Kylin data modeling and build 
related configuration.
 - `kylin.cube.aggrgroup.max-combination`: specifies the max combination number 
of aggregation groups. The default value is 32768.
 - `kylin.cube.aggrgroup.is-mandatory-only-valid`: whether to allow Cube 
contains only Base Cuboid. The default value is *FALSE*, set to *TRUE* when 
using Spark Cubing
 - `kylin.cube.rowkey.max-size`: specifies the maximum number of columns that 
can be set to Rowkeys. The default value is 63.
-- `kylin.cube.allow-appear-in-multiple-projects`: whether to allow a cube to 
appear in multiple projects
+- `kylin.cube.allow-appear-in-mulNotele-projects`: whether to allow a cube to 
appear in mulNotele projects
 - `kylin.cube.gtscanrequest-serialization-level`: the default value is 1
 
 
 
 ### Cube Size Estimation {#cube-estimate}
 
-Both Kylin and HBase use compression when writing to disk, so Kylin will 
multiply its original size by the ratio to estimate the size of the cube.
+Both Kylin and HBase use compression when writing to disk, so Kylin will 
mulNotely its original size by the ratio to estimate the size of the cube.
 
 Review comment:
   My mistake, sorry for the carelessness!
   
   I have already corrected this in a new commit in this PR.
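
Judging from the diff, `mulNotele` and `mulNotely` look like the result of a blanket replacement of the substring `tip` with `Note`, which also hits `multiple` and `multiply`; this is an inference from the diff, not something the commenter states. Anchoring the pattern on word boundaries (`\b`) avoids that. The hypothetical helper below (`DocWordReplace` is an illustrative name) contrasts `String.replace` with a `java.util.regex.Pattern` whole-word replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper contrasting a naive substring replace with a
// word-boundary-aware one when fixing wording in documentation.
final class DocWordReplace {
    private DocWordReplace() {}

    /** Replaces every occurrence, including inside longer words ("multiple" becomes "mulNotele"). */
    static String naive(String text, String from, String to) {
        return text.replace(from, to);
    }

    /** Replaces only whole-word occurrences of {@code from}. */
    static String wholeWord(String text, String from, String to) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(from) + "\\b");
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(to));
    }
}
```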


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services