The coprocessor thread stopped itself due to scan timeout or scan threshold(check region server log)

2016-11-01 Thread ????????
Hi,
   My Kylin version is 1.5.4.1 for CDH 5.7/5.8.


  When I query in Kylin, it returns an error like:


{cellset: null, rowTotalsLists: null, colTotalsLists: null, runtime: 
null, …}cellset:nullcolTotalsLists:nullerror:"IOException: 
org.apache.commons.httpclient.methods.PostMethod@4d54f318 failed, error code 
500 and response: 
{"url":"http://zeus001.jp:7070/kylin/api/query","exception":"Error while 
executing SQL \"select \"DIM_DATE\".\"DATE_ID\" as \"c0\", 
\"V_DIM_CATE_LEVEL2\".\"CATE_LEVEL1_ID\" as \"c1\", 
\"V_DIM_BRAND\".\"BRAND_RANK\" as \"c2\", \"DIM_GOODS_TYPE\".\"GOODS_TYPE_ID\" 
as \"c3\",count(distinct \"FCT_ORDR_PATH_OLAP\".\"GU_ID\") as \"m0\", 
count(distinct \"FCT_ORDR_PATH_OLAP\".\"PAY_USER_ID\") as \"m1\" from 
\"FCT_ORDR_PATH_OLAP\" as \"FCT_ORDR_PATH_OLAP\" join \"DIM_DATE\" as 
\"DIM_DATE\" on \"FCT_ORDR_PATH_OLAP\".\"DATE_ID\" = \"DIM_DATE\".\"DATE_ID\" 
join \"V_DIM_CATE_LEVEL2\" as \"V_DIM_CATE_LEVEL2\" on 
\"FCT_ORDR_PATH_OLAP\".\"CATE_LEVEL2_ID\" = 
\"V_DIM_CATE_LEVEL2\".\"CATE_LEVEL2_ID\" join \"V_DIM_BRAND\" as 
\"V_DIM_BRAND\" on \"FCT_ORDR_PATH_OLAP\".\"BRAND_ID\" = 
\"V_DIM_BRAND\".\"BRAND_ID\" join \"DIM_GOODS_TYPE\" as \"DIM_GOODS_TYPE\" on 
\"FCT_ORDR_PATH_OLAP\".\"GOODS_TYPE_ID\" = \"DIM_GOODS_TYPE\".\"GOODS_TYPE_ID\" 
where \"DIM_DATE\".\"DATE_ID\" in ('2016-10-28', '2016-10-29', 
'2016-10-30', '2016-10-31')group by \"DIM_DATE\".\"DATE_ID\", 
\"V_DIM_CATE_LEVEL2\".\"CATE_LEVEL1_ID\", \"V_DIM_BRAND\".\"BRAND_RANK\", 
\"DIM_GOODS_TYPE\".\"GOODS_TYPE_ID\"\":
   The coprocessor thread stopped 
itself due to scan timeout or scan threshold(check region server log), failing 
current 
query..."}"height:nullleftOffset:0query:nullrowTotalsLists:nullruntime:nulltopOffset:0width:null

The HTTP request took only 11s, and with hbase.rpc.timeout set to 180s it's 
still the same!
I want to know the reason for this error.

Thanks!
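
[Editor's note: for context, the scan guard and the RPC timeout live in different
places, and hbase.rpc.timeout is expressed in milliseconds. A sketch below; the
Kylin property names are assumed from the 1.5.x defaults and should be verified
against your version:]

  # kylin.properties (Kylin-side query guards; assumed names)
  kylin.query.scan.threshold=10000000
  kylin.query.mem.budget=3221225472

  <!-- hbase-site.xml, on both the client and the region servers -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>180000</value>  <!-- 180 seconds, in milliseconds -->
  </property>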

Re: 13 Step Name: Load HFile to HBase Table

2016-11-01 Thread mnagy
This folder is empty:

/kylin/kylin_metadata/kylin-1ee8b8f5-6708-43db-8285-e70b9181ca79/fact_smpl_big_cube/hfile/F1/

What could cause that?



[jira] [Created] (KYLIN-2148) ActiveMq server not responding after some time

2016-11-01 Thread Vidya Sagar D (JIRA)
Vidya Sagar D created KYLIN-2148:


 Summary: ActiveMq server not responding after some time
 Key: KYLIN-2148
 URL: https://issues.apache.org/jira/browse/KYLIN-2148
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Reporter: Vidya Sagar D
Assignee: Zhong,Jason








[jira] [Created] (KYLIN-2147) Move the creation of HTable after cube be built

2016-11-01 Thread Shaofeng SHI (JIRA)
Shaofeng SHI created KYLIN-2147:
---

 Summary: Move the creation of HTable after cube be built
 Key: KYLIN-2147
 URL: https://issues.apache.org/jira/browse/KYLIN-2147
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: Shaofeng SHI
Assignee: Shaofeng SHI
Priority: Minor
 Fix For: v1.6.0


In 1.5.x, the HBase table is created right after the dictionaries are built and 
before the cube build; that is unnecessary, as the cube build doesn't touch the 
table. Its creation can be deferred until after the cube is built and before the 
conversion to HFile.
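
[Editor's sketch of the intended reordering; step names are approximate, not the
exact job step titles:]

  Before: build dictionaries -> create HTable -> build cube -> convert to HFile -> load HFile
  After:  build dictionaries -> build cube -> create HTable -> convert to HFile -> load HFile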





Re: 13 Step Name: Load HFile to HBase Table

2016-11-01 Thread mnagy
I am using:

kylin 1.6
ubuntu 14.04
hbase 1.1.1
hive 1.2



13 Step Name: Load HFile to HBase Table

2016-11-01 Thread mnagy
I am facing this issue when I try to build a cube.
Any ideas about what is going on?

java.io.IOException: BulkLoad encountered an unrecoverable problem
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:510)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:441)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:331)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1025)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at
org.apache.kylin.storage.hbase.steps.BulkLoadJob.run(BulkLoadJob.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:115)
at
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:115)
at
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=5, exceptions:
Wed Nov 02 12:25:15 GMT+08:00 2016,
RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5},
java.io.IOException: Call to
qtausc-pphd0107.quantium.com.au.local/192.168.81.107:16020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=9,
waitTime=60001, operationTimeout=6 expired.
Wed Nov 02 12:26:24 GMT+08:00 2016,
RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5},
java.io.IOException: Call to
qtausc-pphd0107.quantium.com.au.local/192.168.81.107:16020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=11,
waitTime=60001, operationTimeout=6 expired.
Wed Nov 02 12:26:39 GMT+08:00 2016,
RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5},
java.io.FileNotFoundException: java.io.FileNotFoundException:
/kylin/kylin_metadata/kylin-78027250-cc2b-4174-82eb-df1a50aa3f0c/simple_fact_smpl_cube_clone4/hfile/F1/645ac77fe9c5441b9941f31bcdb58e3a
at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:236)
at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:960)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:803)
at
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:98)
at
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:79)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:519)
at
org.apache.hadoop.hbase.regionserver.HStore.assertBulkLoadHFileOk(HStore.java:718)
at
org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5094)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:1835)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32207)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Wed Nov 02 12:27:09 GMT+08:00 2016,
RpcRetryingCaller{globalStartTime=1478060655121, pause=3000, retries=5},
java.io.FileNotFoundException: java.io.FileNotFoundException:
/kylin/kylin_metadata/kylin-78027250-cc2b-4174-82eb-df1a50aa3f0c/simple_fact_smpl_cube_clone4/hfile/F1/645ac77fe9c5441b9941f31bcdb58e3a
at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:236)
at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:960)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:803)
at
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:98)
at
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:79)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:519)
at
org.

Re: Exceed scan threshold at 10000001

2016-11-01 Thread 张磊
Hi Shaofeng,


I read that article, but I still do not know why LIMIT 10001 should scan 
10,000,001 rows. Isn't the result pre-calculated? With pre-calculation it 
should scan 10001 rows; shouldn't LIMIT 1 and LIMIT 10001 behave the same?


I only want to test Kylin and understand its theory.


I think that even if there is no WHERE clause in my query, since Kylin 
pre-calculates, LIMIT 10001 should scan 10001 rows; the where/filter clause, 
data balance, and high-cardinality dimensions should not matter.


In MySQL, select * from table limit 10001 scans 10001 rows; since Kylin 
pre-calculates, I think it should likewise scan 10001 rows in Kylin.
Am I right?


The order of the dimensions in the SQL GROUP BY is the same as the order of 
the dimensions and rowkeys of the cube.


Query
select count(1) from lineorder group by LO_CUSTKEY,LO_PARTKEY LIMIT 1
select count(1) from lineorder group by LO_CUSTKEY,LO_PARTKEY LIMIT 10001


My cube
Dimensions order: LO_CUSTKEY, LO_PARTKEY (only two dimensions)
count(distinct LO_CUSTKEY): 20
count(distinct LO_PARTKEY): 60
Measures: count(1)
Aggregation Groups:
Includes:   ["LO_CUSTKEY","LO_PARTKEY"]
The other parameters are null.


Rowkeys
ID  Column      Encoding  Length  Shard By
1   LO_CUSTKEY  dict      0       false
2   LO_PARTKEY  dict      0       false
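
[Editor's sketch: with this design, each cuboid row's key is roughly the cuboid
id followed by the dictionary-encoded dimension values in rowkey order; the
widths are illustrative, not Kylin's exact encoding:]

  rowkey = [cuboid id][dict(LO_CUSTKEY)][dict(LO_PARTKEY)]

A filter on LO_CUSTKEY therefore maps to a contiguous key range, while a filter
only on LO_PARTKEY forces a scan over the whole cuboid.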




Thanks!


-- Original Message --
From: "ShaoFeng Shi";;
Sent: Tuesday, Nov 1, 2016, 10:41 PM
To: "dev"; 

Subject: Re: Exceed scan threshold at 10000001



Hi Lei,

When you create the cube, in the "Advanced settings" step, the rowkey is
composed of the dimension columns; the sequence can be adjusted by drag &
drop. Usually we suggest moving the columns used as filters (in the "where"
condition) ahead of the other columns, so they can be used to narrow down
the scan range in HBase. You can look through this presentation (may need
VPN):
http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin

2016-11-01 18:43 GMT+08:00 Alberto Ramón :

> Sorry, the pictures didn't reach you properly.
>
> See this:
> http://www.slideshare.net/HBaseCon/apache-kylins-
> performance-boost-from-apache-hbase#9
>
> 2016-11-01 7:52 GMT+01:00 张磊 <121762...@qq.com>:
>
> > You say Kylin is "smart" when composing the HBase row key; there is something I
> > cannot see. Could you send again how the HBase row key is composed?
> >
> >
> >
> >
> > -- Original Message --
> > From: "a.ramonportoles";;
> > Sent: Friday, Oct 28, 2016, 6:51 PM
> > To: "dev";
> >
> > Subject: Re: Exceed scan threshold at 10000001
> >
> >
> >
> > Q1: If I query select count(1) from table group by letter,number limit
> > 2, it should scan the first two rows (letter,number agg group)?
> >
> >
> > A1: Kylin builds the HBase key from the dimensions:
> >
> >
> > Kylin is "smart" when composing the HBase row key:
> > grouping/filtering by Dim1 is not the same as by Dim3   :)
> >
> > Dim1: range scan --> you read only what you need --> fast
> >
> > Dim3: full scan --> you read more rows than you need --> slow
> >
> >
> > How to solve it?  (I think:) you can build several cubes / use different
> > aggregation groups on the same project
> >
> >
> >
> >
> >
> > Q2: When I query select count(1) from table group by letter limit 2, it
> > should scan the two rows (letter agg group)?
> >
> >
> > A2: Yes, if you define count(1) as a measure and letter as a dimension, you
> > will have a pre-calculated result
> >
> >
> >
> > Also: check the cardinality of your data; it isn't normal that:
> >
> > limit 1  --> scans 1000 rows
> >
> > limit 10001  ---> scans millions of rows
> >
> > If this is true, your data isn't balanced; I don't know any solution for
> > this
> >
> >
> > Alb
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2016-10-28 10:01 GMT+02:00 张磊 <121762...@qq.com>:
> > Kylin does put pre-calculated results in HBase. If the cube desc is as below:
> >  Dimensions: letter, number
> >  Measures: count
> >  in HBase the result is
> >  count  letter  number
> >  1      A       1
> >  1      A       2
> >  1      B       1
> >  1      B       2
> >  1      B       3
> >  1      B       4
> >  count  letter
> >  2      A
> >  4      B
> >  If I query select count(1) from table group by letter,number limit 2, it
> > should scan the first two rows (letter,number agg group)?
> >  When I query select count(1) from table group by letter limit 2, it should
> > scan the two rows (letter agg group)?
> >  Am I right?
> >
> >
> >  -- Original Message --
> >  From: "a.ramonportoles";;
> >  Sent: Friday, Oct 28, 2016, 3:43 PM
> >  To: "dev";
> >
> >  Subject: Re: Exceed scan threshold at 10000001
> >
> >
> >
> >  hu
> >  but are you using "group by LO_CUSTKEY,LO_PARTKEY"?
> >
> >  And LIMIT applies to the final result, not to the scanned rows.
> >
> >  Example:
> >  table with two columns Letter / Number
> >  A:1
> >  A:2
> >  B:1
> >  B:2
> >  B:3
> >  B:4
> >
> >  select count (1), Letter from TB group by Letter limit 1
> > Result: 2:A
> > Scans 2 rows
> >
> > 

[jira] [Created] (KYLIN-2146) "Streaming Cluster" page should remove "Margin" inputbox

2016-11-01 Thread Shaofeng SHI (JIRA)
Shaofeng SHI created KYLIN-2146:
---

 Summary: "Streaming Cluster" page should remove "Margin" inputbox
 Key: KYLIN-2146
 URL: https://issues.apache.org/jira/browse/KYLIN-2146
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v1.6.0
Reporter: Shaofeng SHI
Assignee: Guohui LI
Priority: Minor
 Fix For: v1.6.0


The "margin" is not needed in the new version of streaming; please remove all 
its occurances from UI; besides, please move the "Parser Setting" above 
"Advanced Setting". See attachement.





Re: How to filter dimension from look-up table

2016-11-01 Thread 万里通科技及数据中心商务智能团队数据分析组
Hi shaofeng,

   In a recent version, 1.5.3, Kylin supports it: I can put a dimension from the 
look-up table into the Filter on the Data Model's "setting" page, but the cube 
ran into an error when built, at step 1 or step 2, telling us it can't find the 
dimension, such as "invalid table alias or column reference".

   After upgrading to 1.5.4.1 I haven't tried it yet; I'll try it later and then 
report the answer.

-- Original Message --
From: ShaoFeng Shi [mailto:shaofeng...@apache.org] 
Sent: Nov 1, 2016, 22:46
To: dev
Subject: Re: How to filter dimension from look-up table

Hi Zhihua,

"can only put dimensions from fact table into filter fields": do you mean the 
"Filter" attribute on the Data Model's "setting" page? It can support look-up 
tables, I think. Can you provide a sample?

2016-11-01 16:30 GMT+08:00 胡志华(万里通科技及数据中心商务智能团队数据分析组) <
huzhihua...@pingan.com.cn>:

> Hi all,
>    One of our analysts is facing a problem. When she built a cube, she 
> wanted to filter dimensions from the look-up table.
> But Kylin doesn't allow it; we can only put dimensions from the fact table 
> into the filter fields. Therefore I can only set the dimensions from the 
> look-up table that I want to filter as mandatory dimensions, which will 
> increase the cardinality.
>
>    So my question is how I can filter dimensions from the look-up table 
> when building a cube. Thank you all.
>
>
>
>
>
>
>
>



--
Best regards,

Shaofeng Shi 史少锋





Re: Unable to connect to Kylin Web UI

2016-11-01 Thread BigdataGR
Thank you so much.

I got it. I am able to log in to Kylin now.



Re: How to filter dimension from look-up table

2016-11-01 Thread ShaoFeng Shi
Hi Zhihua,

"can only put dimensions from fact table into filter fields": do you mean
the "Filter" attribute on the Data Model's "setting" page? It can support
look-up tables, I think. Can you provide a sample?

2016-11-01 16:30 GMT+08:00 胡志华(万里通科技及数据中心商务智能团队数据分析组) <
huzhihua...@pingan.com.cn>:

> Hi all,
>    One of our analysts is facing a problem. When she built a cube, she wanted
> to filter dimensions from the look-up table.
> But Kylin doesn't allow it; we can only put dimensions from the fact table into
> the filter fields. Therefore I can only set the dimensions from the look-up
> table that I want to filter as mandatory dimensions, which will increase the
> cardinality.
>
>    So my question is how I can filter dimensions from the look-up table
> when building a cube. Thank you all.
>
>
>
>
>
>
>
>



-- 
Best regards,

Shaofeng Shi 史少锋


Re: Exceed scan threshold at 10000001

2016-11-01 Thread ShaoFeng Shi
Hi Lei,

When you create the cube, in the "Advanced settings" step, the rowkey is
composed of the dimension columns; the sequence can be adjusted by drag &
drop. Usually we suggest moving the columns used as filters (in the "where"
condition) ahead of the other columns, so they can be used to narrow down
the scan range in HBase. You can look through this presentation (may need
VPN):
http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
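
[Editor's sketch of why the ordering matters: when the filtered column leads the
rowkey, the filter can be turned into an HBase start/stop row instead of a full
cuboid scan. A minimal HBase 1.x illustration; the cuboid id and dictionary
codes are made-up values, not Kylin's actual encoding:]

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RowkeyPrefixScanSketch {
      // Build a scan covering only rows whose key starts with
      // [cuboid id][dict code of the leading dimension's filter value].
      public static Scan prefixScan() {
          byte[] cuboidId   = Bytes.toBytes((short) 3);   // hypothetical cuboid id
          byte[] leadingDim = new byte[] { 0x01, 0x2A };  // hypothetical dict code

          Scan scan = new Scan();
          scan.setStartRow(Bytes.add(cuboidId, leadingDim));  // HBase 1.x API
          scan.setStopRow(Bytes.add(cuboidId, leadingDim, new byte[] { (byte) 0xFF }));
          return scan;
      }
      // If the same column sat at the tail of the rowkey, no such prefix range
      // would exist and the whole cuboid would have to be scanned and filtered.
  }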

2016-11-01 18:43 GMT+08:00 Alberto Ramón :

> Sorry, the pictures didn't reach you properly.
>
> See this:
> http://www.slideshare.net/HBaseCon/apache-kylins-
> performance-boost-from-apache-hbase#9
>
> 2016-11-01 7:52 GMT+01:00 张磊 <121762...@qq.com>:
>
> > You say Kylin is "smart" when composing the HBase row key; there is something I
> > cannot see. Could you send again how the HBase row key is composed?
> >
> >
> >
> >
> > -- Original Message --
> > From: "a.ramonportoles";;
> > Sent: Friday, Oct 28, 2016, 6:51 PM
> > To: "dev";
> >
> > Subject: Re: Exceed scan threshold at 10000001
> >
> >
> >
> > Q1: If I query select count(1) from table group by letter,number limit
> > 2, it should scan the first two rows (letter,number agg group)?
> >
> >
> > A1: Kylin builds the HBase key from the dimensions:
> >
> >
> > Kylin is "smart" when composing the HBase row key:
> > grouping/filtering by Dim1 is not the same as by Dim3   :)
> >
> > Dim1: range scan --> you read only what you need --> fast
> >
> > Dim3: full scan --> you read more rows than you need --> slow
> >
> >
> > How to solve it?  (I think:) you can build several cubes / use different
> > aggregation groups on the same project
> >
> >
> >
> >
> >
> > Q2: When I query select count(1) from table group by letter limit 2, it
> > should scan the two rows (letter agg group)?
> >
> >
> > A2: Yes, if you define count(1) as a measure and letter as a dimension, you
> > will have a pre-calculated result
> >
> >
> >
> > Also: check the cardinality of your data; it isn't normal that:
> >
> > limit 1  --> scans 1000 rows
> >
> > limit 10001  ---> scans millions of rows
> >
> > If this is true, your data isn't balanced; I don't know any solution for
> > this
> >
> >
> > Alb
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2016-10-28 10:01 GMT+02:00 张磊 <121762...@qq.com>:
> > Kylin does put pre-calculated results in HBase. If the cube desc is as below:
> >  Dimensions: letter, number
> >  Measures: count
> >  in HBase the result is
> >  count  letter  number
> >  1      A       1
> >  1      A       2
> >  1      B       1
> >  1      B       2
> >  1      B       3
> >  1      B       4
> >  count  letter
> >  2      A
> >  4      B
> >  If I query select count(1) from table group by letter,number limit 2, it
> > should scan the first two rows (letter,number agg group)?
> >  When I query select count(1) from table group by letter limit 2, it should
> > scan the two rows (letter agg group)?
> >  Am I right?
> >
> >
> >  -- Original Message --
> >  From: "a.ramonportoles";;
> >  Sent: Friday, Oct 28, 2016, 3:43 PM
> >  To: "dev";
> >
> >  Subject: Re: Exceed scan threshold at 10000001
> >
> >
> >
> >  hu
> >  but are you using "group by LO_CUSTKEY,LO_PARTKEY"?
> >
> >  And LIMIT applies to the final result, not to the scanned rows.
> >
> >  Example:
> >  table with two columns Letter / Number
> >  A:1
> >  A:2
> >  B:1
> >  B:2
> >  B:3
> >  B:4
> >
> >  select count (1), Letter from TB group by Letter limit 1
> > Result: 2:A
> > Scans 2 rows
> >
> >  select count (1), Letter from TB group by Letter limit 2
> > Result: 2:A
> > 4:B
> > Scans 2 +4 rows
> >
> >
> >  Alb
> >
> >
> >
> >  2016-10-28 8:33 GMT+02:00 张磊 <121762...@qq.com>:
> >
> >  > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
> >  > LO_CUSTKEY,LO_PARTKEY
> >  > LIMIT 1
> >  >
> >  >
> >  > I find it scan 1 rows from HBase
> >  >
> >  >
> >  > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
> >  > LO_CUSTKEY,LO_PARTKEY
> >  > LIMIT 10001
> >  >
> >  >
> >  > I find it scans 10000001 rows from HBase
> >  >
> >  >
> >  > I do not know why. Should it not scan 10001 rows?
> >  >
> >  >
> >  > The two queries scan the same HTable KYLIN_78ROC49NQY
> >  > Kylin log: Endpoint RPC returned from HTable KYLIN_78ROC49NQY
> >  >
> >  >
> >  >
> >  >
> >  > -- Original Message --
> >  > From: "ShaoFeng Shi";;
> >  > Sent: Friday, Oct 28, 2016, 11:20 AM
> >  > To: "dev";
> >  >
> >  > Subject: Re: Exceed scan threshold at 10000001
> >  >
> >  >
> >  >
> >  > Alberto, thanks for your explanation; you got the points and are already
> >  > a Kylin expert, I believe.
> >  >
> >  > In order to protect HBase and Kylin from crashing on bad queries (which
> >  > scan too many rows), Kylin adds this mechanism to interrupt when some
> >  > threshold is reached. Usually in an OLAP scenario, the result wouldn't be
> >  > too large. This is also a reminder for the user to rethink the design; if
> >  > you really want to get the thr

Re: Exceed scan threshold at 10000001

2016-11-01 Thread Alberto Ramón
Sorry, the pictures didn't reach you properly.

See this:
http://www.slideshare.net/HBaseCon/apache-kylins-performance-boost-from-apache-hbase#9

2016-11-01 7:52 GMT+01:00 张磊 <121762...@qq.com>:

> You say Kylin is "smart" when composing the HBase row key; there is something I
> cannot see. Could you send again how the HBase row key is composed?
>
>
>
>
> -- Original Message --
> From: "a.ramonportoles";;
> Sent: Friday, Oct 28, 2016, 6:51 PM
> To: "dev";
>
> Subject: Re: Exceed scan threshold at 10000001
>
>
>
> Q1: If I query select count(1) from table group by letter,number limit
> 2, it should scan the first two rows (letter,number agg group)?
>
>
> A1: Kylin builds the HBase key from the dimensions:
>
>
> Kylin is "smart" when composing the HBase row key:
> grouping/filtering by Dim1 is not the same as by Dim3   :)
>
> Dim1: range scan --> you read only what you need --> fast
>
> Dim3: full scan --> you read more rows than you need --> slow
>
>
> How to solve it?  (I think:) you can build several cubes / use different
> aggregation groups on the same project
>
>
>
>
>
> Q2: When I query select count(1) from table group by letter limit 2, it
> should scan the two rows (letter agg group)?
>
>
> A2: Yes, if you define count(1) as a measure and letter as a dimension, you
> will have a pre-calculated result
>
>
>
> Also: check the cardinality of your data; it isn't normal that:
>
> limit 1  --> scans 1000 rows
>
> limit 10001  ---> scans millions of rows
>
> If this is true, your data isn't balanced; I don't know any solution for
> this
>
>
> Alb
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> 2016-10-28 10:01 GMT+02:00 张磊 <121762...@qq.com>:
> Kylin does put pre-calculated results in HBase. If the cube desc is as below:
>  Dimensions: letter, number
>  Measures: count
>  in HBase the result is
>  count  letter  number
>  1      A       1
>  1      A       2
>  1      B       1
>  1      B       2
>  1      B       3
>  1      B       4
>  count  letter
>  2      A
>  4      B
>  If I query select count(1) from table group by letter,number limit 2, it
> should scan the first two rows (letter,number agg group)?
>  When I query select count(1) from table group by letter limit 2, it should
> scan the two rows (letter agg group)?
>  Am I right?
>
>
>  -- Original Message --
>  From: "a.ramonportoles";;
>  Sent: Friday, Oct 28, 2016, 3:43 PM
>  To: "dev";
>
>  Subject: Re: Exceed scan threshold at 10000001
>
>
>
>  hu
>  but are you using "group by LO_CUSTKEY,LO_PARTKEY"?
>
>  And LIMIT applies to the final result, not to the scanned rows.
>
>  Example:
>  table with two columns Letter / Number
>  A:1
>  A:2
>  B:1
>  B:2
>  B:3
>  B:4
>
>  select count (1), Letter from TB group by Letter limit 1
> Result: 2:A
> Scans 2 rows
>
>  select count (1), Letter from TB group by Letter limit 2
> Result: 2:A
> 4:B
> Scans 2 +4 rows
>
>
>  Alb
>
>
>
>  2016-10-28 8:33 GMT+02:00 张磊 <121762...@qq.com>:
>
>  > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
>  > LO_CUSTKEY,LO_PARTKEY
>  > LIMIT 1
>  >
>  >
>  > I find it scan 1 rows from HBase
>  >
>  >
>  > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
>  > LO_CUSTKEY,LO_PARTKEY
>  > LIMIT 10001
>  >
>  >
>  > I find it scans 10000001 rows from HBase
>  >
>  >
>  > I do not know why. Should it not scan 10001 rows?
>  >
>  >
>  > The two queries scan the same HTable KYLIN_78ROC49NQY
>  > Kylin log: Endpoint RPC returned from HTable KYLIN_78ROC49NQY
>  >
>  >
>  >
>  >
>  > -- Original Message --
>  > From: "ShaoFeng Shi";;
>  > Sent: Friday, Oct 28, 2016, 11:20 AM
>  > To: "dev";
>  >
>  > Subject: Re: Exceed scan threshold at 10000001
>  >
>  >
>  >
>  > Alberto, thanks for your explanation; you got the points and are already
>  > a Kylin expert, I believe.
>  >
>  > In order to protect HBase and Kylin from crashing on bad queries (which
>  > scan too many rows), Kylin adds this mechanism to interrupt when some
>  > threshold is reached. Usually in an OLAP scenario, the result wouldn't be
>  > too large. This is also a reminder for the user to rethink the design; if
>  > you really want the threshold to be enlarged, you can allocate more memory
>  > to Kylin and set "kylin.query.mem.budget" to a bigger value.
>  >
>  > 2016-10-27 18:39 GMT+08:00 Alberto Ramón :
>  >
>  > > NOTE: I'm not an expert on Kylin  ;)
>  > >
>  > > Is WHERE mandatory? No
>  > > Is WHERE recommended? Yes
>  > > Does WHERE bypass the threshold? No, I think this limit is hardcoded ¿?
>  > >
>  > > The real question must be: why does this limit exist? (opinion)
>  > > - The target of Kylin is real / near-real time; limiting rows limits
>  > > response time
>  > > - If you are using JDBC, this is not a good option performance-wise
>  > > - It protects the HBase coprocessor
>  > > - Perhaps you need a new Dim to pre-calculate this aggregate, or filter
>  > > by this new Dim
>  > >
>  > > For extra-large queries, you can also check:
>  > >  -kylin.query.mem.budget= 3GB
>  > >  -hbase.server.scanner.max.re

[jira] [Created] (KYLIN-2145) StorageCleanupJob will fail when beeline enabled

2016-11-01 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-2145:
-

 Summary: StorageCleanupJob will fail when beeline enabled
 Key: KYLIN-2145
 URL: https://issues.apache.org/jira/browse/KYLIN-2145
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Due to the beeline output format.





[jira] [Created] (KYLIN-2144) move useful operation tools to org.apache.kylin.tool

2016-11-01 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-2144:
-

 Summary: move useful operation tools to org.apache.kylin.tool
 Key: KYLIN-2144
 URL: https://issues.apache.org/jira/browse/KYLIN-2144
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Due to historical reasons, the following four operation tools:

StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, CubeMigrationCheckCLI

are located in org.apache.kylin.storage.hbase.util, which brings dependency 
issues and other concerns.

In 1.6.0 and later, we'll move the four tools to org.apache.kylin.tool. The old 
Java classes will be marked as deprecated and no longer maintained.
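
[Editor's note: after the move, the tools are invoked under the new package; for
example (invocation assumed from the Kylin 1.6 documentation):]

  ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true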





How to filter dimension from look-up table

2016-11-01 Thread 万里通科技及数据中心商务智能团队数据分析组
Hi all,
   One of our analysts is facing a problem. When she built a cube, she wanted to 
filter dimensions from the look-up table.
But Kylin doesn't allow it; we can only put dimensions from the fact table into 
the filter fields. Therefore I can only set the dimensions from the look-up table 
which I want to filter as mandatory dimensions, which will increase the 
cardinality.

   So my question is how I can filter dimensions from the look-up table when 
building a cube. Thank you all.











[jira] [Created] (KYLIN-2143) allow more options from Extended Columns,COUNT_DISTINCT,RAW_TABLE

2016-11-01 Thread Zhong,Jason (JIRA)
Zhong,Jason created KYLIN-2143:
--

 Summary: allow more options from Extended 
Columns,COUNT_DISTINCT,RAW_TABLE
 Key: KYLIN-2143
 URL: https://issues.apache.org/jira/browse/KYLIN-2143
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.5.4.1
Reporter: Zhong,Jason
Assignee: Zhong,Jason
 Fix For: Future


Allow more options for Extended Column On Fact Table, COUNT_DISTINCT, RAW_TABLE:

Extended Column On Fact Table -- options from Model Dimensions
COUNT_DISTINCT -- options from Model Dimensions & Measures
RAW_TABLE -- options from Model Dimensions & Measures


