The coprocessor thread stopped itself due to scan timeout or scan threshold(check region server log)
Hi, my Kylin version is 1.5.4.1 for CDH 5.7/5.8. When I query in Kylin, it returns an error like: IOException: org.apache.commons.httpclient.methods.PostMethod@4d54f318 failed, error code 500 and response: {"url":"http://zeus001.jp:7070/kylin/api/query","exception":"Error while executing SQL \"select \"DIM_DATE\".\"DATE_ID\" as \"c0\", \"V_DIM_CATE_LEVEL2\".\"CATE_LEVEL1_ID\" as \"c1\", \"V_DIM_BRAND\".\"BRAND_RANK\" as \"c2\", \"DIM_GOODS_TYPE\".\"GOODS_TYPE_ID\" as \"c3\", count(distinct \"FCT_ORDR_PATH_OLAP\".\"GU_ID\") as \"m0\", count(distinct \"FCT_ORDR_PATH_OLAP\".\"PAY_USER_ID\") as \"m1\" from \"FCT_ORDR_PATH_OLAP\" as \"FCT_ORDR_PATH_OLAP\" join \"DIM_DATE\" as \"DIM_DATE\" on \"FCT_ORDR_PATH_OLAP\".\"DATE_ID\" = \"DIM_DATE\".\"DATE_ID\" join \"V_DIM_CATE_LEVEL2\" as \"V_DIM_CATE_LEVEL2\" on \"FCT_ORDR_PATH_OLAP\".\"CATE_LEVEL2_ID\" = \"V_DIM_CATE_LEVEL2\".\"CATE_LEVEL2_ID\" join \"V_DIM_BRAND\" as \"V_DIM_BRAND\" on \"FCT_ORDR_PATH_OLAP\".\"BRAND_ID\" = \"V_DIM_BRAND\".\"BRAND_ID\" join \"DIM_GOODS_TYPE\" as \"DIM_GOODS_TYPE\" on \"FCT_ORDR_PATH_OLAP\".\"GOODS_TYPE_ID\" = \"DIM_GOODS_TYPE\".\"GOODS_TYPE_ID\" where \"DIM_DATE\".\"DATE_ID\" in ('2016-10-28', '2016-10-29', '2016-10-30', '2016-10-31') group by \"DIM_DATE\".\"DATE_ID\", \"V_DIM_CATE_LEVEL2\".\"CATE_LEVEL1_ID\", \"V_DIM_BRAND\".\"BRAND_RANK\", \"DIM_GOODS_TYPE\".\"GOODS_TYPE_ID\"\": The coprocessor thread stopped itself due to scan timeout or scan threshold (check region server log), failing current query..."}. The HTTP request takes only 11 s, and I have set hbase.rpc.timeout = 180 s, but it's the same! I want to know the reason for this error. Thanks!
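A few settings influence this error. As far as I understand, the coprocessor deliberately stops itself shortly before the HBase RPC deadline so the query fails gracefully, which means hbase.rpc.timeout must be raised in hbase-site.xml on the region servers as well (and the servers restarted), not only on the Kylin client. The row-count abort is governed by Kylin-side properties; a hedged sketch (verify these property names exist in your Kylin 1.5.x build before relying on them):

```properties
# kylin.properties -- illustrative values; check the property names against
# your own installation, they are assumptions based on 1.5.x deployments.

# Abort queries that scan more rows than this (the "scan threshold" in the error)
kylin.query.scan.threshold=10000000

# Memory budget in bytes for query processing (3 GB here); mentioned
# elsewhere in this digest as the knob for extra-large queries
kylin.query.mem.budget=3221225472
```

After changing hbase.rpc.timeout you would typically set it in milliseconds (e.g. 180000) and deploy it cluster-wide, since the coprocessor reads its deadline on the server side.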
Re: 13 Step Name: Load HFile to HBase Table
This folder is empty: /kylin/kylin_metadata/kylin-1ee8b8f5-6708-43db-8285-e70b9181ca79/fact_smpl_big_cube/hfile/F1/. What could cause that? -- View this message in context: http://apache-kylin.74782.x6.nabble.com/13-Step-Name-Load-HFile-to-HBase-Table-tp6144p6148.html Sent from the Apache Kylin mailing list archive at Nabble.com.
[jira] [Created] (KYLIN-2148) ActiveMq server not responding after some time
Vidya Sagar D created KYLIN-2148: Summary: ActiveMq server not responding after some time Key: KYLIN-2148 URL: https://issues.apache.org/jira/browse/KYLIN-2148 Project: Kylin Issue Type: Bug Components: Web Reporter: Vidya Sagar D Assignee: Zhong,Jason -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2147) Move the creation of HTable after cube be built
Shaofeng SHI created KYLIN-2147: --- Summary: Move the creation of HTable after cube be built Key: KYLIN-2147 URL: https://issues.apache.org/jira/browse/KYLIN-2147 Project: Kylin Issue Type: Improvement Components: Job Engine Reporter: Shaofeng SHI Assignee: Shaofeng SHI Priority: Minor Fix For: v1.6.0 In 1.5.x, the HBase table is created just after the dictionaries are built and before cube building; that is unnecessary, as cube building doesn't operate on the table. It can be deferred until after the cube is built and before conversion to HFile.
Re: 13 Step Name: Load HFile to HBase Table
I am using Kylin 1.6, Ubuntu 14.04, HBase 1.1.1, Hive 1.2.
13 Step Name: Load HFile to HBase Table
I am facing this issue when I try to build a cube. Any ideas about what is going on ? java.io.IOException: BulkLoad encountered an unrecoverable problem at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:510) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:441) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:331) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1025) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.storage.hbase.steps.BulkLoadJob.run(BulkLoadJob.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:115) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:115) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=5, exceptions: Wed Nov 02 12:25:15 GMT+08:00 2016, RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5}, java.io.IOException: Call to qtausc-pphd0107.quantium.com.au.local/192.168.81.107:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=9, waitTime=60001, operationTimeout=6 expired. 
Wed Nov 02 12:26:24 GMT+08:00 2016, RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5}, java.io.IOException: Call to qtausc-pphd0107.quantium.com.au.local/192.168.81.107:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=11, waitTime=60001, operationTimeout=6 expired. Wed Nov 02 12:26:39 GMT+08:00 2016, RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5}, java.io.FileNotFoundException: java.io.FileNotFoundException: /kylin/kylin_metadata/kylin-78027250-cc2b-4174-82eb-df1a50aa3f0c/simple_fact_smpl_cube_clone4/hfile/F1/645ac77fe9c5441b9941f31bcdb58e3a at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:236) at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:960) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:803) at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:98) at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:79) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:519) at org.apache.hadoop.hbase.regionserver.HStore.assertBulkLoadHFileOk(HStore.java:718) at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5094) at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:1835) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32207) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107) at java.lang.Thread.run(Thread.java:745) Wed Nov 02 12:27:09 GMT+08:00 2016, RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5}, java.io.FileNotFoundException:
java.io.FileNotFoundException: /kylin/kylin_metadata/kylin-78027250-cc2b-4174-82eb-df1a50aa3f0c/simple_fact_smpl_cube_clone4/hfile/F1/645ac77fe9c5441b9941f31bcdb58e3a at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:236) at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:960) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:803) at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:98) at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:79) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:519) at org.
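One pattern worth noting in the trace above (an interpretation, not a confirmed diagnosis): the first attempts die with CallTimeoutException, and the later retries die with FileNotFoundException on the same HFile. That is consistent with the first bulk-load call succeeding on the region server after the client gave up: the server moves the HFile into the region directory, so the retry can no longer find it at the staging path. Raising the client-side timeouts avoids the spurious retry; a hedged hbase-site.xml sketch with illustrative values:

```xml
<!-- hbase-site.xml on the Kylin/client side; values are illustrative -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- 10 minutes; bulk-load RPCs can be slow on busy clusters -->
</property>
<property>
  <name>hbase.client.operation.timeout</name>
  <value>600000</value> <!-- overall per-operation deadline, covering retries -->
</property>
```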
Re: Exceed scan threshold at 10000001
Hi Shaofeng, I read the article, but I still don't understand why LIMIT 10001 should scan 10,000,001 rows. Isn't the result pre-calculated? With pre-calculation it should scan 10001 rows; LIMIT 1 and LIMIT 10001 should behave the same. I only want to test Kylin and understand its theory. I think even though there is no WHERE clause in my query, since Kylin pre-calculates, LIMIT 10001 should scan 10001 rows; it should not matter about the WHERE/filter clause, data balance, or high-cardinality dimensions. In MySQL, select * from table limit 10001 scans 10001 rows; since Kylin pre-calculates, I think it should scan 10001 rows in Kylin too. Am I right? The order of the dimensions in the SQL GROUP BY is the same as the order in the cube's dimensions and rowkeys.

Queries:
select count(1) from lineorder group by LO_CUSTKEY,LO_PARTKEY LIMIT 1
select count(1) from lineorder group by LO_CUSTKEY,LO_PARTKEY LIMIT 10001

My cube:
Dimensions order: LO_CUSTKEY, LO_PARTKEY (only two dimensions)
count(distinct LO_CUSTKEY): 20
count(distinct LO_PARTKEY): 60
Measures: count(1)
Aggregation Groups: Includes: ["LO_CUSTKEY","LO_PARTKEY"]; the other parameters are empty
Rowkeys:
1. LO_CUSTKEY, encoding dict, length 0, shard by false
2. LO_PARTKEY, encoding dict, length 0, shard by false

Thanks!
-- Original Message --
From: "ShaoFeng Shi";
Sent: Tuesday, November 1, 2016, 10:41 PM
To: "dev";
Subject: Re: Exceed scan threshold at 1001

Hi Lei,

When you create the cube, in the "Advanced settings" step, the rowkey is composed of the dimension columns; the sequence can be adjusted by drag & drop. Usually we suggest moving the columns used as filters (in the "where" condition) ahead of the other columns, so they can be used to narrow down the scan range in HBase. You can look through this presentation (may need VPN): http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin

2016-11-01 18:43 GMT+08:00 Alberto Ramón :
> Sorry, the pictures can't reach you OK
>
> See this:
> http://www.slideshare.net/HBaseCon/apache-kylins-performance-boost-from-apache-hbase#9
[jira] [Created] (KYLIN-2146) "Streaming Cluster" page should remove "Margin" inputbox
Shaofeng SHI created KYLIN-2146: --- Summary: "Streaming Cluster" page should remove "Margin" inputbox Key: KYLIN-2146 URL: https://issues.apache.org/jira/browse/KYLIN-2146 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v1.6.0 Reporter: Shaofeng SHI Assignee: Guohui LI Priority: Minor Fix For: v1.6.0 The "margin" is not needed in the new version of streaming; please remove all its occurrences from the UI. Besides, please move the "Parser Setting" above "Advanced Setting". See attachment.
Re: How to filter dimension from look-up table
Hi Shaofeng, In a recent version, 1.5.3, Kylin supports it: I can put a dimension from a lookup table into the Filter on the Data Model's "setting" page, but the cube failed at step 1 or step 2 when built, telling us it can't find the dimension, e.g. "invalid table alias or column reference". After upgrading to 1.5.4.1 I haven't tried it yet; I'll try it later and then report the answer.

-- Original Message --
From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: Tuesday, November 1, 2016, 10:46 PM
To: dev
Subject: Re: How to filter dimension from look-up table

hi zhihua, "can only put dimensions from fact table into filter fields": do you mean the "Filter" attribute on the Data Model's "setting" page? It can support lookup tables, I think. Can you provide a sample?

2016-11-01 16:30 GMT+08:00 胡志华(万里通科技及数据中心商务智能团队数据分析组) < huzhihua...@pingan.com.cn>:
> Hi all,
> Our analyst is facing a problem. When she built a cube, she wanted to filter on dimensions from a lookup table.
> But Kylin doesn't allow it; we can only put dimensions from the fact table into the filter fields. Therefore I can only set the lookup-table dimensions I want to filter as mandatory dimensions, which will increase the cardinality.
> So my question is how I can filter on dimensions from a lookup table when building a cube. Thank you all.
>
> The information in this email is confidential and may be legally privileged. If you have received this email in error or are not the intended recipient, please immediately notify the sender and delete this message from your computer. Any use, distribution, or copying of this email other than by the intended recipient is strictly prohibited. All messages sent to and from us may be monitored to ensure compliance with internal policies and to protect our business. Emails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, lost or destroyed, or contain viruses. Anyone who communicates with us by email is taken to accept these risks.

-- Best regards, Shaofeng Shi 史少锋
Re: Unable to connect to Kylin Web UI
Thank you so much. I got it. I am able to log in to Kylin now.
Re: How to filter dimension from look-up table
hi zhihua, "can only put dimensions from fact table into filter fields": do you mean the "Filter" attribute on the Data Model's "setting" page? It can support lookup tables, I think. Can you provide a sample?

2016-11-01 16:30 GMT+08:00 胡志华(万里通科技及数据中心商务智能团队数据分析组) < huzhihua...@pingan.com.cn>:
> Hi all,
> Our analyst is facing a problem. When she built a cube, she wanted to filter on dimensions from a lookup table.
> But Kylin doesn't allow it; we can only put dimensions from the fact table into the filter fields. Therefore I can only set the lookup-table dimensions I want to filter as mandatory dimensions, which will increase the cardinality.
> So my question is how I can filter on dimensions from a lookup table when building a cube. Thank you all.

-- Best regards, Shaofeng Shi 史少锋
Re: Exceed scan threshold at 10000001
Hi Lei,

When you create the cube, in the "Advanced settings" step, the rowkey is composed of the dimension columns; the sequence can be adjusted by drag & drop. Usually we suggest moving the columns used as filters (in the "where" condition) ahead of the other columns, so they can be used to narrow down the scan range in HBase. You can look through this presentation (may need VPN): http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin

2016-11-01 18:43 GMT+08:00 Alberto Ramón :
> Sorry, the pictures can't reach you OK
>
> See this:
> http://www.slideshare.net/HBaseCon/apache-kylins-performance-boost-from-apache-hbase#9
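Shaofeng's advice that filtered columns should lead the rowkey can be illustrated with a toy model of a sorted cuboid. This is only an illustration of the range-scan vs. full-scan distinction, not Kylin's actual storage code; the 20 x 60 cardinalities follow Lei's cube description:

```python
# Toy illustration of why rowkey column order matters: a cuboid is stored
# sorted by its rowkey, so a filter on the leading dimension becomes a range
# scan, while a filter on a trailing dimension forces a full cuboid scan.

cuboid = sorted((c, p) for c in range(20) for p in range(60))  # (LO_CUSTKEY, LO_PARTKEY)

def scan(rows, predicate, prefix_range=None):
    """Return (matching_rows, rows_scanned). prefix_range=(lo, hi) narrows
    the scan when the filter is on the leading rowkey column."""
    if prefix_range is not None:
        lo, hi = prefix_range
        candidates = [r for r in rows if lo <= r[0] <= hi]   # range scan
    else:
        candidates = rows                                    # full cuboid scan
    return [r for r in candidates if predicate(r)], len(candidates)

# WHERE LO_CUSTKEY = 5: leading rowkey column -> range scan touches 60 rows
_, scanned_leading = scan(cuboid, lambda r: r[0] == 5, prefix_range=(5, 5))

# WHERE LO_PARTKEY = 5: trailing rowkey column -> full scan touches all 1200 rows
_, scanned_trailing = scan(cuboid, lambda r: r[1] == 5)
```

Both queries return 20 or 60 matching rows either way; only the number of rows the storage layer has to touch changes, which is what the scan threshold counts.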
Re: Exceed scan threshold at 10000001
Sorry, the pictures can't reach you OK

See this: http://www.slideshare.net/HBaseCon/apache-kylins-performance-boost-from-apache-hbase#9

2016-11-01 7:52 GMT+01:00 张磊 <121762...@qq.com>:
> You say Kylin is "smart" when composing the HBase row key; there is something I cannot see. Could you send again how the HBase row key is composed?
>
> -- Original Message --
> From: "a.ramonportoles";
> Sent: Friday, October 28, 2016, 6:51 PM
> To: "dev";
> Subject: Re: Exceed scan threshold at 1001
>
> Q1: If I query select count(1) from table group by letter,number limit 2, it should scan the first two rows (letter,number agg group)?
>
> A1: Kylin builds the HBase key with dimensions.
> Kylin is "smart" when composing the HBase row key:
> Group by / filter by Dim1 is not the same as by Dim3 :)
> Dim1: range scan --> you read what you need --> fast
> Dim3: full scan --> you read more rows than you need --> slow
>
> How to solve it? (I think:) you can build several cubes / use different aggregation groups on the same project
>
> Q2: When I query select count(1) from table group by letter limit 2, it should scan the two rows (letter agg group)?
>
> A2: Yes, if you define count(1) as a measure and letter as a dimension, you will have a pre-calculated result.
>
> Also: check the cardinality of your data. This isn't normal:
> limit 1 --> scan 1000 rows
> limit 10001 --> scan millions of rows
> If this is true your data isn't balanced; I don't know any solution for this.
>
> Alb
>
> 2016-10-28 10:01 GMT+02:00 张磊 <121762...@qq.com>:
> Kylin does put pre-calculated results in HBase. If the cube desc is as below:
> Dimensions: letter, number
> Measures: count
> in HBase the result is
> count letter number
> 1 A 1
> 1 A 2
> 1 B 1
> 1 B 2
> 1 B 3
> 1 B 4
> count letter
> 2 A
> 4 B
> If I query select count(1) from table group by letter,number limit 2, it should scan the first two rows (letter,number agg group)?
> When I query select count(1) from table group by letter limit 2, it should scan the two rows (letter agg group).
> Am I right?
>
> -- Original Message --
> From: "a.ramonportoles";
> Sent: Friday, October 28, 2016, 3:43 PM
> To: "dev";
> Subject: Re: Exceed scan threshold at 1001
>
> But are you using "group by LO_CUSTKEY,LO_PARTKEY"?
>
> And limit applies to the final result, not to scanned rows.
>
> Example:
> table with two columns Letter / Number
> A:1
> A:2
> B:1
> B:2
> B:3
> B:4
>
> select count(1), Letter from TB group by Letter limit 1
> Result: 2:A
> Scans 2 rows
>
> select count(1), Letter from TB group by Letter limit 2
> Result: 2:A
> 4:B
> Scans 2 + 4 rows
>
> Alb
>
> 2016-10-28 8:33 GMT+02:00 张磊 <121762...@qq.com>:
> > Query1: select count(1),sum(LO_REVENUE) from lineorder group by LO_CUSTKEY,LO_PARTKEY LIMIT 1
> > I find it scans 1 rows from HBase
> >
> > Query2: select count(1),sum(LO_REVENUE) from lineorder group by LO_CUSTKEY,LO_PARTKEY LIMIT 10001
> > I find it scans 1001 rows from HBase
> >
> > I do not know why? Should it not scan 10001 rows?
> >
> > The two queries scan the same HTable KYLIN_78ROC49NQY
> > Kylin log: Endpoint RPC returned from HTable KYLIN_78ROC49NQY
> >
> > -- Original Message --
> > From: "ShaoFeng Shi";
> > Sent: Friday, October 28, 2016, 11:20 AM
> > To: "dev";
> > Subject: Re: Exceed scan threshold at 1001
> >
> > Alberto, thanks for your explanation; you got the points and are already a Kylin expert, I believe.
> >
> > In order to protect HBase and Kylin from crashing under bad queries (which scan too many rows), Kylin adds this mechanism to interrupt when reaching some threshold. Usually in an OLAP scenario the result wouldn't be too large. This is also a reminder for the user to rethink the design; if you really want the threshold enlarged, you can allocate more memory to Kylin and set "kylin.query.mem.budget" to a bigger value.
> >
> > 2016-10-27 18:39 GMT+08:00 Alberto Ramón :
> > > NOTE: I'm not an expert on Kylin ;)
> > >
> > > Is where mandatory? No
> > > Is where recommended? Yes
> > > Does where bypass the threshold? No, I think this limit is hardcoded
> > >
> > > The real question must be: why does this limit exist? (opinion)
> > > - The target of Kylin is real / near real time; limiting rows limits response time
> > > - If you are using JDBC, this is not a good option for performance
> > > - Protect the HBase coprocessor
> > > - Perhaps you need a new dim, to pre-calculate this aggregate or filter by this new dim
> > >
> > > For extra-large queries, you can also check:
> > > - kylin.query.mem.budget = 3GB
> > > - hbase.server.scanner.max.re
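The recurring point in this thread, that LIMIT bounds the rows returned while the rows scanned depend on which pre-aggregated cuboid matches the GROUP BY, can be sketched with the letter/number data above. This is a toy model, not Kylin's real query engine:

```python
# Toy model of Kylin pre-aggregation using the letter/number example.
# Each cuboid holds pre-aggregated counts, sorted by its dimension key.
base_cuboid = [(("A", 1), 1), (("A", 2), 1), (("B", 1), 1),
               (("B", 2), 1), (("B", 3), 1), (("B", 4), 1)]   # key (letter, number)
letter_cuboid = [(("A",), 2), (("B",), 4)]                    # key (letter,)

def exact_cuboid_query(cuboid, limit):
    """GROUP BY matches the cuboid key exactly: every stored row is already
    a result row, so LIMIT can stop the scan early."""
    out = cuboid[:limit]
    return out, len(out)                  # rows returned == rows scanned

def reaggregating_query(cuboid, group_cols, limit):
    """GROUP BY uses only a prefix of the cuboid key and no exact cuboid
    exists: the whole cuboid is scanned and re-aggregated before LIMIT."""
    scanned = len(cuboid)
    regrouped = {}
    for key, cnt in cuboid:
        gkey = key[:group_cols]
        regrouped[gkey] = regrouped.get(gkey, 0) + cnt
    return sorted(regrouped.items())[:limit], scanned

# GROUP BY letter LIMIT 1, served from the (letter,) cuboid: scans 1 row.
r1, s1 = exact_cuboid_query(letter_cuboid, limit=1)

# Same query forced onto the (letter, number) cuboid: scans all 6 rows no
# matter how small the LIMIT -- LIMIT bounds results, not rows scanned.
r2, s2 = reaggregating_query(base_cuboid, group_cols=1, limit=1)
```

Both paths produce the same answer; they differ only in how many stored rows are touched, which is the quantity the scan-threshold check counts.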
[jira] [Created] (KYLIN-2145) StorageCleanupJob will fail when beeline enabled
hongbin ma created KYLIN-2145: - Summary: StorageCleanupJob will fail when beeline enabled Key: KYLIN-2145 URL: https://issues.apache.org/jira/browse/KYLIN-2145 Project: Kylin Issue Type: Bug Reporter: hongbin ma Assignee: hongbin ma due to beeline output format
[jira] [Created] (KYLIN-2144) move useful operation tools to org.apache.kylin.tool
hongbin ma created KYLIN-2144: - Summary: move useful operation tools to org.apache.kylin.tool Key: KYLIN-2144 URL: https://issues.apache.org/jira/browse/KYLIN-2144 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma Due to historical reasons, the following four operation tools: StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, CubeMigrationCheckCLI are located in org.apache.kylin.storage.hbase.util, which brings dependency issues and other concerns. In 1.6.0 and later, we'll move the four tools to org.apache.kylin.tool. The old Java classes will be marked as deprecated and no longer maintained.
How to filter dimension from look-up table
Hi all, Our analyst is facing a problem. When she built a cube, she wanted to filter on dimensions from a lookup table. But Kylin doesn't allow it; we can only put dimensions from the fact table into the filter fields. Therefore I can only set the lookup-table dimensions I want to filter as mandatory dimensions, which will increase the cardinality. So my question is how I can filter on dimensions from a lookup table when building a cube. Thank you all.
[jira] [Created] (KYLIN-2143) allow more options from Extended Columns,COUNT_DISTINCT,RAW_TABLE
Zhong,Jason created KYLIN-2143: -- Summary: allow more options from Extended Columns,COUNT_DISTINCT,RAW_TABLE Key: KYLIN-2143 URL: https://issues.apache.org/jira/browse/KYLIN-2143 Project: Kylin Issue Type: Improvement Affects Versions: v1.5.4.1 Reporter: Zhong,Jason Assignee: Zhong,Jason Fix For: Future Allow more options for Extended Column On Fact Table, COUNT_DISTINCT, RAW_TABLE:
Extended Column On Fact Table -- options from Model Dimensions
COUNT_DISTINCT -- options from Model Dimensions & Measures
RAW_TABLE -- options from Model Dimensions & Measures