Re: Pinot/Kylin/Druid quick comparison

2024-03-13 Thread Li Yang
Nam,

We are planning to release a kylin5-beta around March or April. The GA of
kylin5 would be around July this year if everything goes well.

Cheers
Yang

On Tue, Mar 5, 2024 at 6:54 PM Nam Đỗ Duy  wrote:

> Hello Xiaoxiang,
>
> How are you, my boss is very interested in Kylin 5. so he would like to
> know when Kylin 5 will be released...could you please provide an
> estimation?
>
> Thank you very much and best regards
>
>
>
>
>
> On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy  wrote:
>
> > Good morning Xiaoxiang, hope you are well
> >
> > 1. JDBC source is a feature that is in development; it will be supported
> > later.
> >
> > ===
> >
> > May I know when JDBC will be available? Also, is there any change
> > in the Kylin 5 release date?
> >
> > Thank you and best regards
> >
> >
> > On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu  wrote:
> >
> >> 1. JDBC source is a feature that is in development; it will be supported
> >> later.
> >>
> >> 2. Kylin supports Kerberos now; I will write a doc as soon as possible.
> >> (I will let you know.)
> >>
> >> 3. I think Ranger and Kerberos are not doing the same thing: Kerberos is
> >> for authentication, Ranger for authorization, so they cannot replace each
> >> other. Ranger can integrate with Kerberos; please check Ranger's website
> >> for more information.
> >>
> >> 
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy 
> wrote:
> >>
> >> > Thank you Xiaoxiang for your reply
> >> >
> >> > -
> >> > Do you have any suggestions/wishes for Kylin 5 (except the real-time
> >> > feature)?
> >> > -
> >> > Yes, please answer these to help me clear this headache:
> >> >
> >> > 1. Can Kylin access an existing star schema in an Oracle data
> >> > warehouse? If not, do we have any workaround?
> >> >
> >> > 2. My team is using Kerberos for authentication. Do you have any
> >> > document/case study about integrating Kerberos with Kylin 4.x and
> >> > Kylin 5.x?
> >> >
> >> > 3. Should we use Apache Ranger instead of Kerberos for authentication
> >> > and for security purposes?
> >> >
> >> > Thank you again
> >> >
> >> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu  wrote:
> >> >
> >> > > I guess the release date should be 2024/01.
> >> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time
> >> > > feature)?
> >> > >
> >> > > 
> >> > > With warm regard
> >> > > Xiaoxiang Yu
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy 
> >> > wrote:
> >> > >
> >> > >> Thank you very much Xiaoxiang. I did the presentation this morning
> >> > >> already, so there was no time for you to comment; next time I will
> >> > >> send it to you in advance. The meeting result was that we will
> >> > >> implement both Druid and Kylin in the next couple of projects,
> >> > >> because of Druid's real-time feature. Hope that Kylin will have the
> >> > >> same feature soon.
> >> > >>
> >> > >> May I ask when you will release Kylin 5.0?
> >> > >>
> >> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu 
> wrote:
> >> > >>
> >> > >> > Since 2018 there have been a lot of new features and code
> >> > >> > refactoring. If you like, you can share your PPT with me privately;
> >> > >> > maybe I can give some comments.
> >> > >> >
> >> > >> > Here are references on the advantages of Kylin since 2018:
> >> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> > >> >
> >> > >> > 
> >> > >> > With warm regard
> >> > >> > Xiaoxiang Yu
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy  >
> >> > >> wrote:
> >> > >> >
> >> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin
> >> > >> >> and Druid in my team.
> >> > >> >>
> >> > >> >> I found this article and would like you to update me on the
> >> > >> >> advantages of Kylin from 2018 until now (especially with version 5
> >> > >> >> to be released):
> >> > >> >>
> >> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> > >> >> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> >> > >> >>
> >> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy 
> wrote:
> >> > >> >>
> >> > >> >> > Thank you very much for your prompt response. I still have
> >> > >> >> > several questions to ask for your help with later.
> >> > >> >> >
> >> > >> >> > Best regards and have a good day
> >> > >> >> >
> >> > >> >> >
> >> > >> >> >
> >> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu 
> >> > wrote:
> >> > >> >> >
> >> > >> >> >> Done. Github branch changed to kylin5.
> >> > >> >> >>
> >> > >> >> >> 

CVE-2023-29055: Apache Kylin: Insufficiently protected credentials in config file

2024-01-29 Thread Li Yang
Severity: low

Affected versions:

- Apache Kylin 2.0.0 through 4.0.3

Description:

In Apache Kylin versions 2.0.0 through 4.0.3, there is a Server Config web 
interface that displays the content of the file 'kylin.properties', which may 
contain server-side credentials. When the Kylin service runs over HTTP (or 
another plain-text protocol), it is possible for network sniffers to hijack 
the HTTP payload and get access to the content of kylin.properties and 
potentially the credentials it contains.

To avoid this threat, users are recommended to:

  *  Always turn on HTTPS so that the network payload is encrypted.

  *  Avoid putting credentials in kylin.properties, or at least not in plain
     text.

  *  Use network firewalls to protect the server side such that it is not
     accessible to external attackers.

  *  Upgrade to Apache Kylin 4.0.4, which filters out the sensitive content
     that goes to the Server Config web interface.

Credit:

Li Jiakun <2839549...@qq.com> (reporter)

References:

https://kylin.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-29055



[Announce] Apache Kylin 4.0.4 released

2024-01-28 Thread Li Yang
The Apache Kylin team is pleased to announce the immediate availability of
the 4.0.4 release.

This is a minor release with 5 small improvements.
All of the changes in this release can be found in:
https://kylin.apache.org/docs/release_notes.html

You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/download/

Apache Kylin is an open-source Distributed Analytical Data Warehouse for
Big Data; it was designed to provide OLAP (Online Analytical Processing)
capability in the big data era. By renovating the multi-dimensional cube
and precalculation technology on Hadoop and Spark, Kylin is able to achieve
near-constant query speed regardless of the ever-growing data volume.
Reducing query latency from minutes to sub-second, Kylin brings online
analytics back to big data.

Apache Kylin lets you query billions of rows at sub-second latency in 3
steps:
1. Identify a star/snowflake schema on Hadoop.
2. Build a Cube from the identified tables.
3. Query using ANSI SQL and get sub-second results via ODBC, JDBC or the
RESTful API (see the example query below).
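
As an illustration of step 3, here is a minimal aggregated query against the
KYLIN_SALES fact table that ships with Kylin's sample cube (the column names
come from the sample data set; adjust them to your own model):

    -- total sales and order count per day and listing format,
    -- served from the precomputed Cube rather than the raw table
    SELECT PART_DT,
           LSTG_FORMAT_NAME,
           SUM(PRICE) AS total_sales,
           COUNT(*)   AS order_cnt
    FROM KYLIN_SALES
    GROUP BY PART_DT, LSTG_FORMAT_NAME;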

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to report
problems, and to get involved, visit the project website at
https://kylin.apache.org/


Regards
Yang


[DISCUSS] The future of Apache Kylin

2022-01-10 Thread Li Yang
Hi All

Apache Kylin has been stable for quite a while, and it may be a good time to
think about its future. Below are thoughts from my team and myself.
Love to hear yours as well. Ideas and comments are very welcome.  :-)

*APACHE KYLIN TODAY*

Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
a major version update after Kylin 3.x (HBase storage). Kylin 4.0 uses
Parquet to replace HBase as the storage engine, so as to improve file
scanning performance. At the same time, Kylin 4.0 reimplements the
Spark-based build engine and query engine, making it possible to separate
computing and storage and to better adapt to the cloud-native technology
trend. Kylin 4.0 comprehensively updated the build and query engines and
realized a deployment mode without Hadoop dependency, decreasing the
complexity of deployment. However, Kylin still has a lot to improve; for
example, the business semantic layer needs to be strengthened, and
modification of models/cubes is not flexible. With these in mind, we are
thinking of a few things to do:

   - Multi-dimensional query ability friendly to non-technical personnel.
   The multi-dimensional model is what distinguishes Kylin from general
   OLAP engines. Its strength is that a model concept based on dimensions
   and measures is more friendly to non-technical personnel and closer to
   the goal of the citizen analyst. Multi-dimensional query capability that
   non-technical personnel can use should be the new focus of Kylin
   technology.


   - Native engine. The query engine of Kylin still has much room for
   improvement in vectorized acceleration and CPU instruction-level
   optimization. The Spark community that Kylin relies on also has a strong
   demand for a native engine. We are optimistic that a native engine can
   improve the performance of Kylin by at least three times, which makes it
   worthy of investment.


   - More cloud-native capabilities. Kylin 4.0 has only completed the
   initial cloud deployment, realizing rapid deployment and dynamic resource
   scaling on the cloud, but there are still many cloud-native capabilities
   to be developed.

More explanations follow.

*KYLIN AS A MULTI-DIMENSIONAL DATABASE*

The core of Kylin is a multi-dimensional database, which is a special kind
of OLAP engine. Although Kylin has always had the abilities of a relational
database since its birth, and it is often compared with other relational
OLAP engines, what really makes Kylin different is its multi-dimensional
model and multi-dimensional database abilities. Considering the essence of
Kylin and its wide range of future business uses (not only technical uses),
positioning Kylin as a multi-dimensional database makes perfect sense. With
business semantics and precomputation technology, Apache Kylin helps
non-technical people understand and afford big data, and realizes data
democratization.

*THE SEMANTIC LAYER*

The key difference between a multi-dimensional database and a relational
database is business expression ability. Although SQL has strong expression
ability and is a basic skill of data analysts, SQL and the RDB are still too
difficult for non-technical personnel if we aim at "everyone is a data
analyst". From the perspective of non-technical personnel, the data lake and
data warehouse are like a dark room: they know that there is a lot of data,
but they can't clearly see, understand, or use this data, because they don't
understand database theory and SQL.

How can we make the data lake (and data warehouse) clear to non-technical
personnel? This requires introducing a data model that is more friendly to
non-technical personnel: the multi-dimensional data model. While the
relational model describes the technical form of data, the multi-dimensional
model describes the business form of data. In an MDB, measures correspond to
business indicators that everyone understands, and dimensions are the
perspectives for comparing and observing these business indicators.
Comparing a KPI with last month's, or comparing performance between parallel
business units: these are concepts understood by every non-technical person.
By mapping the relational model to the multi-dimensional model, the essence
is to enhance the business semantics on top of the technical data, form a
business semantic layer, and help non-technical personnel understand,
explore and use the data. In order to enhance Kylin's ability as the
semantic layer, supporting multi-dimensional query languages such as MDX and
DAX is a key item on the Kylin roadmap. MDX can transform the data model in
Kylin into a business-friendly language, endow data with business value, and
facilitate Kylin's multi-dimensional analysis with BI tools such as Excel
and Tableau.
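
To make the contrast concrete, here is how one typical business question,
"how did each business unit's revenue this month compare with last month?",
has to be phrased in plain SQL. The schema below is hypothetical, purely for
illustration; a semantic layer hides exactly this plumbing behind one
measure (Revenue) and two dimensions (Month, Business Unit):

    -- hypothetical schema: fact_sales(business_unit, month_id, revenue)
    SELECT business_unit,
           month_id,
           SUM(revenue) AS revenue,
           SUM(revenue) - LAG(SUM(revenue)) OVER (
               PARTITION BY business_unit
               ORDER BY month_id
           ) AS change_vs_last_month
    FROM fact_sales
    GROUP BY business_unit, month_id;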

*PRECOMPUTATION AND MODEL FLEXIBILITY*

It is Kylin's unchanging mission to continue to reduce the cost of a single
query through precomputation technology, so that ordinary people can afford
big data. If the multi-dimensional model solves the problem that

Re: [DISCUSS] Upgrade Kylin's dependency to Hadoop 3 / HBase 2

2020-02-26 Thread Li Yang
The proposal means Kylin 3.0 will be the last major version that supports
Hadoop 2.

What will be the recommended version for Hadoop 2 users after this? I feel
the latest stable release of the 2.6 line is better than 3.0.

Anyway, I'm fine with moving focus to Hadoop 3. That is the direction.
However, we shall also think about what it means for Hadoop 2 users.
Questions like the ones below shall also be answered.

- What is the recommended version/branch for Hadoop 2? (Btw, 3.0 does not
sound right here.)
- How will that version/branch be maintained?

+1 in general

Regards
-Yang


On Wed, Feb 26, 2020 at 5:36 PM Zhou Kang  wrote:

> +1
>
>
> > On Feb 26, 2020, at 3:48 PM, ShaoFeng Shi wrote:
> >
> > Hello, Kylin users and developers,
> >
> > As we know, Hadoop 3 and HBase 2 have been released for some time. Kylin
> > has supported Hadoop 3 since v2.5.0 in Sep 2018. As the APIs of HBase 1
> > and 2 are incompatible, we need to keep different branches for them, and
> > in each release we need to build separate packages and do a round of
> > testing for them separately. Furthermore, Cloudera's API differences from
> > the Apache release make the situation worse; we need to build 4 binary
> > packages for each release. That has consumed much of our manual effort and
> > computing resources.
> >
> > Today, Hadoop 3 + HBase 2 have become mature and stable enough for
> > production use, and we see more and more users starting to use the new
> > versions. We think it is time for Kylin to fully upgrade to the new
> > versions, so that we can focus more on Kylin itself instead of
> > environments.
> >
> > Here is my proposal:
> > 1) From Kylin 3.1, the Hadoop/HBase versions upgrade to 3.1/2.1 (or a
> > close version);
> > 2) Hadoop 2 and HBase 1 users can use Kylin 3.0 and previous releases;
> > 3) We will re-evaluate the need for building binary packages for the
> > Cloudera release (we may raise another discussion).
> >
> > Please let us know your comments. And please also understand that with
> > our limited resources we cannot support multiple Hadoop versions...
> >
> > Thanks!
> >
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> > Apache Kylin PMC
> > Email: shaofeng...@apache.org
> >
> > Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> > Join Kylin user mail group: user-subscr...@kylin.apache.org
> > Join Kylin dev mail group: dev-subscr...@kylin.apache.org
> >
> >
>
>


Re: [New blog] "Real-time Streaming Design in Apache Kylin"

2019-04-28 Thread Li Yang
Love to see this new direction!

On Mon, Apr 22, 2019 at 3:30 PM Iñigo Martínez 
wrote:

> Thank you, ShaoFeng.
>
> Very interesting. It's a more polished version of the document attached to
> the Jira feature request. ;)
> feature request. ;)
>
> On Thu, Apr 18, 2019 at 4:28 AM, ShaoFeng Shi ()
> wrote:
>
>> Hello,
>>
>> Gang Ma, the core developer of Kylin Real-time OLAP, just composed a tech
>> blog on this feature. It will help you understand the purpose, the
>> architecture and the design. You are welcome to read it and share it with
>> others:
>>
>> https://kylin.apache.org/blog/2019/04/12/rt-streaming-design/
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofeng...@apache.org
>>
>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>
>>
>>
>
> --
>
>
>
>
> Iñigo Martínez
> Systems Manager
> imarti...@telecoming.com
>
>
>
>
>
>
>
> Paseo de la Castellana, 95. Torre Europa, pl 16. 28046 Madrid, Spain |
> telecoming.com 
>
>
>
>
>


Re: [Discuss] Won't ship Spark binary in Kylin binary anymore

2019-03-09 Thread Li Yang
+1 makes sense to me

On Fri, Mar 8, 2019 at 5:55 PM Luke Han  wrote:

> +1
>
> Best Regards!
> -
>
> Luke Han
>
>
> On Fri, Mar 8, 2019 at 3:34 PM Billy Liu  wrote:
>
>> +1
>>
>> With Warm regards
>>
>> Billy Liu
>>
>>
>>> On Fri, Mar 8, 2019 at 11:27 AM, Zhong, Yanghong wrote:
>>
>>> Agree to exclude spark binary.
>>>
>>>
>>>
>>> --
>>>
>>> Best regards,
>>>
>>> Yanghong Zhong
>>>
>>>
>>>
>>> From: yuzhang 
>>> Reply-To: "d...@kylin.apache.org" 
>>> Date: Friday, March 8, 2019 at 11:26 AM
>>> To: "user@kylin.apache.org" 
>>> Cc: "d...@kylin.apache.org" 
>>> Subject: Re: [Discuss] Won't ship Spark binary in Kylin binary anymore
>>>
>>>
>>>
>>> Agree! Downloading the Spark binary when packaging Kylin has always
>>> confused me.
>>>
>>>
>>>
>>>
>>> *yuzhang*
>>>
>>> shifengdefan...@163.com
>>>
>>>
>>>
>>> On 3/8/2019 10:42, ShaoFeng Shi wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> As we know, Kylin ships a Spark binary in its package; the total package
>>> becomes bigger and bigger as the version grows. The latest version (v2.6.1)
>>> is bigger than 350MB, which was rejected by the Apache SVN server when we
>>> tried to upload the new package. Of the 350MB, more than 200MB is Spark,
>>> while Spark is not mandatory for Kylin.
>>>
>>>
>>>
>>> So I would propose to exclude Spark from Kylin's binary package, starting
>>> from the current v2.6.1; the user just needs to point SPARK_HOME to a
>>> folder containing the expected Spark version, or manually download it and
>>> put it in KYLIN_HOME/spark. All other behaviors are not impacted.
>>>
>>>
>>>
>>> Just share your comments if any.
>>>
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Shaofeng Shi 史少锋
>>>
>>> Apache Kylin PMC
>>>
>>> Email: shaofeng...@apache.org
>>>
>>>
>>>
>>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>>> 
>>>
>>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>>>
>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>>
>>>
>>>
>>>
>>>
>>>


Re: Re: Evaluate Kylin on Parquet

2018-12-31 Thread Li Yang
From the discussion, apparently a new storage will be added sooner or later.

Will it be a new big version of Kylin, like Apache Kylin 3.0? Also, how
about the migration from the old storage? I assume old cube data has to be
transformed and loaded into the new storage.

Yang

On Sat, Dec 29, 2018 at 5:52 PM ShaoFeng Shi  wrote:

> Thanks very much for Yiming's and Jiatao's comments; they're very valuable.
> There are many improvements that can be done for this new storage. We
> welcome all kinds of contributions and would like to improve it together
> with the community in the year 2019!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
>
> On Wed, Dec 19, 2018 at 8:44 PM, JiaTao Tao wrote:
>
> > Hi all,
> >
> > I truly agree with Yiming, and here I will expand a little more on
> > "distributed computing".
> >
> > As Yiming mentioned, Kylin parses a query into an execution plan using
> > Calcite (Kylin changes the execution plan because the data in cubes is
> > already aggregated; we cannot use the original plan directly). It's a tree
> > structure: a node represents a specific calculation, and data goes from
> > bottom to top, applying all these calculations.
> > (Pic from https://blog.csdn.net/yu616568/article/details/50838504, a
> > really good blog.)
> >
> > At present, Kylin does almost all these calculations only on its own
> > node; in other words, we cannot fully use the power of the cluster, and
> > it's a SPOF. Hence comes a design in which we can visit this tree *and
> > transform each node into operations on Spark's DataFrames (i.e. "DF")*.
> >
> > More specifically, we visit the nodes recursively until we meet the
> > "TableScan" node (like a stack pushing operation). E.g., in the above
> > diagram, the first node we meet is a "Sort" node; we just visit its
> > child(ren), and we do not stop visiting each node's child(ren) until we
> > meet a "TableScan" node.
> >
> > In the "TableScan" node, we will generate the initial DF, then the DF
> will
> > be poped to the "Filter" node, and the "Filter" node will apply its own
> > operation like "df.filter(xxx)". Finally, we will apply each node's
> > operation to this DF, and the final call chain will like:
> > "df.filter(xxx).select(xxx).agg(xxx).sort(xxx)".
> >
> > After we get the final DataFrame and trigger the calculation, all the
> > rest is handled by Spark, and we gain tremendous benefits at the
> > computation level. More details can be seen in my previous post:
> > http://apache-kylin.74782.x6.nabble.com/Re-DISCUSS-Columnar-storage-engine-for-Apache-Kylin-tc12113.html
> >
> >
> > --
> >
> >
> > Regards!
> >
> > Aron Tao
> >
> >
> On Wed, Dec 19, 2018 at 11:40 AM, 许益铭 wrote:
> >
> >> hi All!
> >> Regarding the questions CHAO LONG raised, I have the following views:
> >>
> >> 1. Our current architecture is divided into two layers: a storage layer
> >> and a computing layer. In the storage layer we have already done some
> >> optimization, doing pre-aggregation there to reduce the amount of data
> >> returned. However, runtime aggregations and joins happen on the Kylin
> >> server side, so serialization is unavoidable, and this architecture
> >> easily leads to a single-point bottleneck: if the runtime agg or join
> >> involves a large amount of data, query performance drops sharply and the
> >> Kylin server suffers heavy GC.
> >>
> >> 2. Regarding the dictionary: the dictionary was originally introduced to
> >> align rowkeys in HBase and also to reduce some storage. But it
> >> introduces another problem: HBase has difficulty handling
> >> variable-length string dimensions. For a high-cardinality
> >> variable-length dimension, we often have to build a very large
> >> dictionary or set a rather large fixed length, which doubles the
> >> storage; and because the dictionary is large, query performance is
> >> heavily affected (GC). If we use columnar storage, we don't need to
> >> worry about this problem.
> >>
> >> 3. To use Parquet's page index, we must convert TupleFilter into
> >> Parquet's filters, which is no small amount of work. Moreover, our data
> >> is all encoded, and Parquet's page index only filters based on a page's
> >> min/max, so binary data cannot be filtered.
> >>
> >> I think using Spark as our computing engine can solve all of the
> >> problems above:
> >>
> >> 1. Distributed computing
> >> After SQL is parsed and optimized by Calcite, a tree of OLAP rels is
> >> generated; Spark's Catalyst likewise parses SQL into a tree and then
> >> automatically optimizes it into a DataFrame for computation. If the
> >> Calcite plan can be converted into a Spark plan, we will achieve
> >> distributed computing: Calcite is only responsible for parsing SQL and
> >> returning the result set, reducing the pressure on the Kylin server
> >> side.
> >>
> >> 2. Remove the dictionary
> >> The dictionary is good at reducing storage pressure at low and medium
> >> cardinality, but it has the drawback that its data files cannot be used
> >> independently of the dictionary. I suggest that at the beginning we do
> >> not consider dictionary-type encoding, keeping the system as simple as
> >> possible; using Parquet's page-level dictionary by default is enough.
> >>
> >> 3. Store columns in Parquet with their real types instead of binary
> >> As above, Parquet's filtering ability on binary is extremely weak,
> >> while using primitive types lets us directly use Spark's vectorized
> >> read, speeding up data reading and computation.
> >>
> >> 4. Use Spark to adapt to Parquet
> >> Current Spark is already adapted to Parquet: Spark's pushed filters are
> >> already converted into filters Parquet can use. We only need to upgrade
> >> the Parquet version and make small modifications to get Parquet's page
> >> index capability.
> >>
> >> 5. Index server
> >> As JiaTao Tao described, the index server is divided into file index
> >> and page index; dictionary filtering is just one kind of file index, so
> >> we can plug in an index server here.

Re: Apache Kylin FAQ updated

2018-10-02 Thread Li Yang
Nice~~

On Wed, Sep 19, 2018 at 10:03 AM ShaoFeng Shi 
wrote:

> Hello,
>
> The FAQ page got updated together with v2.5.0 release; Please refresh to
> see the changes:
>
> https://kylin.apache.org/docs/gettingstarted/faq.html
>
> If you have a question to be added to this page, just let me know.
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>


Re: Kylin integration with metabase

2018-08-16 Thread Li Yang
Hi Moisés

Open source works through volunteers. If Metabase is so cool and matters to
you, maybe it's your turn to step up and do something.  :-)

Cheers
Yang

On Tue, Aug 14, 2018 at 5:44 PM Moisés Català <
moises.cat...@lacupulamusic.com> wrote:

> Hi Team,
>
> I was wondering if you are planning to create an integration with Metabase
> (https://www.metabase.com/).
>
> It's a cool open-source BI tool.
>
> Kind regards
>
> Moisés Català
> Senior Data Engineer
> La Cupula Music - Sonosuite
> T: *+34 93 250 38 05*
> www.lacupulamusic.com
>
>
>
> C/. Trafalgar, 10   Pral-1ª
> 08010 Barcelona (Spain)
>
>
>
>


Re: Kylin 2.3.1 - Error at #5 Step Name: Build Dimension Dictionary

2018-08-15 Thread Li Yang
 >
/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata_kylin_2.3.1/resources/table_snapshot/DB_RFT_RCMO_RFDA.DRRBUSINESSHIERARCHY/2d660a9f-186d-47d9-b043-1ded145433ba.snapshot


A file that should exist on HDFS does not exist; that is what the error is
saying. This file is the snapshot of a lookup table. It seems your Kylin
metadata is somewhat corrupted. There are some ways to fix this, however
none of them is cheap, nor can they return your Kylin to full health...

If you can afford data loss, rebuilding a new cube from scratch is the
simplest way out. And you should dig deeper into kylin.log to see how the
metadata became corrupted. I guess there must be some HDFS error earlier in
the log; maybe HDFS was not stable at that moment, etc.

On Wed, Aug 8, 2018 at 12:23 PM Kumar, Manoj H 
wrote:

> Can someone advise on this? What's wrong here?
>
>
>
> java.io.IOException: Failed to read resource at
> /table_snapshot/DB_RFT_RCMO_RFDA.DRRBUSINESSHIERARCHY/2d660a9f-186d-47d9-b043-1ded145433ba.snapshot
>
> at
> org.apache.kylin.storage.hbase.HBaseResourceStore.getInputStream(HBaseResourceStore.java:256)
>
> at
> org.apache.kylin.storage.hbase.HBaseResourceStore.getResourceImpl(HBaseResourceStore.java:277)
>
> at
> org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:165)
>
> at
> org.apache.kylin.dict.lookup.SnapshotManager.load(SnapshotManager.java:196)
>
> at
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByInfo(SnapshotManager.java:161)
>
> at
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:107)
>
> at
> org.apache.kylin.cube.CubeManager$DictionaryAssist.buildSnapshotTable(CubeManager.java:1055)
>
> at
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:971)
>
> at
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:87)
>
> at
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
>
> at
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:71)
>
> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:97)
>
> at
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.io.FileNotFoundException: File does not exist:
> /tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata_kylin_2.3.1/resources/table_snapshot/DB_RFT_RCMO_RFDA.DRRBUSINESSHIERARCHY/2d660a9f-186d-47d9-b043-1ded145433ba.snapshot
>
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2007)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1977)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1890)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
>
> at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
>
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
>
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
>

Re: Can we move the pid file to another path?

2018-08-15 Thread Li Yang
Thanks anyway for raising and solving the question. That is how things work.

On Thu, Jul 5, 2018 at 4:32 AM 彭鱼宴 <461292...@qq.com> wrote:

> Hi all,
>
> Sorry about my question; I already solved it by adding some code that
> exports the path I want. No need to spend your time replying to this mail.
> Best,
> Zhefu Peng
>


Re: Is there details of AppendTrieDictionary implementation?

2018-06-26 Thread Li Yang
Find related JIRA here: https://issues.apache.org/jira/browse/KYLIN-1705

On Tue, Jun 19, 2018 at 10:44 AM, big data  wrote:

> Hi,
>
> I found that AppendTrieDictionary has been upgraded to multi-version and
> HDFS storage.
>
> 1. Are there any documents about this new implementation?
>
> 2. How do we make sure that, in a distributed system, different strings are
> mapped to different sequence-number IDs?
>
> 3. I find it much slower than before. Why?
>
>
> Thanks.
>
>


Re: Write metric error

2018-06-25 Thread Li Yang
This does not seem to be the root cause. Try searching in kylin.log; there
should be another error ahead of this one.

On Wed, Jun 13, 2018 at 7:03 PM, praveen kumar 
wrote:

> Hi,
> I ran a query against the cube; the cube returns output, but the Kylin log
> shows the error below. Kindly guide me.
>
> 2018-06-13 16:23:03,432 WARN  [Query 09146ac0-505e-4fa5-92e4-ed53085c2ece-76]
> util.MBeans:68 : Failed to register MBean "null"
> javax.management.RuntimeOperationsException: Exception occurred trying to
> register the MBean
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.
> registerDynamicMBean(DefaultMBeanServerInterceptor.java:951)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(
> DefaultMBeanServerInterceptor.java:900)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(
> DefaultMBeanServerInterceptor.java:324)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(
> JmxMBeanServer.java:522)
> at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
> at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(
> MetricsSourceAdapter.java:221)
> at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.
> start(MetricsSourceAdapter.java:96)
> at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(
> MetricsSystemImpl.java:245)
> at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.
> register(MetricsSystemImpl.java:223)
> at org.apache.kylin.rest.metrics.QueryMetrics.registerWith(
> QueryMetrics.java:128)
> at org.apache.kylin.rest.metrics.QueryMetricsFacade.getQueryMetrics(
> QueryMetricsFacade.java:275)
> at org.apache.kylin.rest.metrics.QueryMetricsFacade.updateMetricsToLocal(
> QueryMetricsFacade.java:88)
> at org.apache.kylin.rest.metrics.QueryMetricsFacade.updateMetrics(
> QueryMetricsFacade.java:72)
> at org.apache.kylin.rest.service.QueryService.recordMetric(
> QueryService.java:560)
> at org.apache.kylin.rest.service.QueryService.doQueryWithCache(
> QueryService.java:469)
> at org.apache.kylin.rest.service.QueryService.doQueryWithCache(
> QueryService.java:390)
> at org.apache.kylin.rest.controller.QueryController.
> query(QueryController.java:86)
> at sun.reflect.GeneratedMethodAccessor313.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(
> InvocableHandlerMethod.java:205)
> at org.springframework.web.method.support.InvocableHandlerMethod.
> invokeForRequest(InvocableHandlerMethod.java:133)
> at org.springframework.web.servlet.mvc.method.annotation.
> ServletInvocableHandlerMethod.invokeAndHandle(
> ServletInvocableHandlerMethod.java:97)
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.invokeHandlerMethod(
> RequestMappingHandlerAdapter.java:827)
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.
> java:738)
> at org.springframework.web.servlet.mvc.method.
> AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
> at org.springframework.web.servlet.DispatcherServlet.
> doDispatch(DispatcherServlet.java:967)
> at org.springframework.web.servlet.DispatcherServlet.
> doService(DispatcherServlet.java:901)
> at org.springframework.web.servlet.FrameworkServlet.processRequest(
> FrameworkServlet.java:970)
> at org.springframework.web.servlet.FrameworkServlet.
> doPost(FrameworkServlet.java:872)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
> at org.springframework.web.servlet.FrameworkServlet.
> service(FrameworkServlet.java:846)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
> ApplicationFilterChain.java:303)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
> ApplicationFilterChain.java:241)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:317)
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.access.ExceptionTranslationFilter.
> doFilter(ExceptionTranslationFilter.java:114)
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> 

Re: Failed to init ProjectManager from kylin_metadata@hbase -- need help

2018-06-09 Thread Li Yang
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Failed after attempts=6,

Kylin is having a problem connecting to HBase. Check there.





On Fri, Jun 8, 2018 at 11:26 PM, Uppala, Vinay 
wrote:

> Hi Team,
>
>
>
> We have been using Kylin in our environment for the last 10+ months and it
> has been running fine. We restarted the Kylin server as part of maintenance
> and now cannot see any projects in Kylin. It looks like Kylin is not
> reading data from HBase; we get the error below when we log into Kylin. We
> validated HDFS and HBase and everything looks good, but still no luck on
> the Kylin side.
>
>
>
> Can you please help us here ?
>
>
>
>
>
> Error from Kylin log:
>
> 
>
> 2018-06-08 14:48:53,552 ERROR [http-bio-7070-exec-7]
> controller.BasicController:44 :
>
> java.lang.IllegalStateException: Failed to init ProjectManager from
> kylin_metadata@hbase
>
> at org.apache.kylin.metadata.project.ProjectManager.
> getInstance(ProjectManager.java:72)
>
> at org.apache.kylin.rest.service.BasicService.getProjectManager(
> BasicService.java:84)
>
> at org.apache.kylin.rest.service.ProjectService.listAllProjects(
> ProjectService.java:95)
>
> at org.apache.kylin.rest.service.ProjectService$$
> FastClassBySpringCGLIB$$8ee134e.invoke()
>
> at org.springframework.cglib.proxy.MethodProxy.invoke(
> MethodProxy.java:204)
>
> at org.springframework.aop.framework.CglibAopProxy$
> DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:629)
>
> at org.apache.kylin.rest.service.ProjectService$$
> EnhancerBySpringCGLIB$$dd2f916b.listAllProjects()
>
> at org.apache.kylin.rest.controller.ProjectController.
> getReadableProjects(ProjectController.java:87)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at org.springframework.web.method.support.InvocableHandlerMethod.
> doInvoke(InvocableHandlerMethod.java:221)
>
> at org.springframework.web.method.support.InvocableHandlerMethod.
> invokeForRequest(InvocableHandlerMethod.java:136)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> ServletInvocableHandlerMethod.invokeAndHandle(
> ServletInvocableHandlerMethod.java:104)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.invokeHandleMethod(
> RequestMappingHandlerAdapter.java:743)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.
> java:672)
>
> at org.springframework.web.servlet.mvc.method.
> AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
>
> at org.springframework.web.servlet.DispatcherServlet.
> doDispatch(DispatcherServlet.java:933)
>
> at org.springframework.web.servlet.DispatcherServlet.
> doService(DispatcherServlet.java:867)
>
> at org.springframework.web.servlet.FrameworkServlet.
> processRequest(FrameworkServlet.java:951)
>
> at org.springframework.web.servlet.FrameworkServlet.
> doGet(FrameworkServlet.java:842)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
>
> at org.springframework.web.servlet.FrameworkServlet.
> service(FrameworkServlet.java:827)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
>
> at org.apache.catalina.core.ApplicationFilterChain.
> internalDoFilter(ApplicationFilterChain.java:303)
>
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
>
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(
> WsFilter.java:52)
>
> at org.apache.catalina.core.ApplicationFilterChain.
> internalDoFilter(ApplicationFilterChain.java:241)
>
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
>
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:342)
>
> at org.springframework.security.web.access.
> ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:342)
>
> at org.springframework.security.web.session.
> 

Re: Using Apache Kylin as data source for Spark

2018-05-25 Thread Li Yang
That is very useful~~  :-)

On Fri, May 18, 2018 at 11:56 AM, ShaoFeng Shi 
wrote:

> Hello, Kylin and Spark users,
>
> A doc has been newly added to the Apache Kylin website on how to use Kylin
> as a data source in Spark.
> This can help users who want to use Spark to analyze aggregated Cube data.
>
> https://kylin.apache.org/docs23/tutorial/spark.html
>
> Thanks for your attention.
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>


Re: Query time hierarchy and time range in Kylin cube.

2018-05-25 Thread Li Yang
If it is possible to alter the table structure a little, the easiest way is
to encode date and hour in one column and use that column as the Kylin
partition column. E.g., have a new column called "date_and_hour" that takes
values like 2018051310 to represent 10 AM on 2018-05-13. Your query may then
go:

select count(*) from fact where date_and_hour between 2018051200 and
2018051323
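
If altering the fact table itself is not an option, a Hive view can derive
the combined column instead. Below is a minimal sketch, assuming (as in your
earlier mail) string partition columns partition_date like '2018-05-13' and
zero-padded partition_hour like '10'; the view and table names are
hypothetical:

    -- derive a yyyyMMddHH integer to serve as Kylin's partition column
    CREATE VIEW fact_with_hour AS
    SELECT t.*,
           CAST(CONCAT(regexp_replace(partition_date, '-', ''),
                       partition_hour) AS BIGINT) AS date_and_hour
    FROM fact t;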


Cheers
Yang

On Fri, May 18, 2018 at 10:55 AM,  wrote:

> My current approach to time hierarchy and partitioning is as follows:
>
> -  I have partition date and hour columns in the Hive table to avoid a
> full Hive table scan. The column names are partition_date and
> partition_hour.
>
> -  I have separate fields in the fact table named Year, Month, Day and
> Hour, and I use these columns as hierarchy dimensions in the Kylin cube
> build. I use *dictionary encoding*.
>
> -  When I want to query a time range, I have to list all combinations of
> the time hierarchy dimensions, for example (Month, Day), in order to query.
>
> My queries seem to get slower as the cube gets bigger, for the same time
> range. So I want to ask the best practice for designing a time hierarchy
> and querying time ranges in Kylin. I see some support for timestamps in the
> streaming cube, but I don't see a guideline for designing time dimensions
> for a normal cube, apart from the partition date and hour in Hive.
>
> I also suspect that my time range queries get slower because they currently
> need to scan all segments.
>
> I think we need
>
>
>


Re: Error when revoking user permissions

2018-05-25 Thread Li Yang
Thanks for reporting, Jose.

It sounds like a bug. Please open a JIRA with reproduction steps.

Cheers
Yang

On Wed, May 16, 2018 at 7:07 PM, José Manuel Carral <
josemanuel.car...@stratebi.com> wrote:

> Hello.
>
> I've just created a new Kylin user called "external_user" and given him
> management permissions on a Kylin project ("Test_project"). After that, I
> decided to revoke his permissions, but he still has access to the project.
>
> I've tried reloading the Kylin metadata and restarting Kylin in order to
> remove his permissions, but nothing has worked. I have checked that this
> user keeps the last permissions given to him; in this case he now has query
> access (which is the last permission that I gave him before revoking all
> permissions).
>
> If we check the "manage project" interface, we can see (when logging on
> with "admin" user and also when logging on with this "external_user") the
> project ACL which is the following now:
> Name
> Type Access
> ADMIN User ADMIN
> ROLE_ANALYST Role QUERY
>
> The "external_user" does not appear in this list. However it still has
> query access to the project (as it was the last permission i gave him).
>
> So, any ideas about why is this happening? Thank you in advance!
>
> Kind regards,
>
> Jose M.
>


Re: Error on querying sample sales cube data

2018-05-25 Thread Li Yang
Try "select count(*) from KYLIN_SALES" instead of "select * from
KYLIN_SALES".

Kylin answers aggregated queries, not detailed queries.
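
For the same reason, any query that groups by the cube's dimensions and
aggregates its measures should be answerable. A small sketch against the
sample sales cube (the names come from the sample model):

    SELECT PART_DT,
           COUNT(*)   AS trans_cnt,
           SUM(PRICE) AS total_price
    FROM KYLIN_SALES
    GROUP BY PART_DT

"select *" fails because the cube stores aggregates rather than raw rows,
which is why the error quoted below complains about columns the cube does
not contain.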

On Wed, May 9, 2018 at 2:55 PM, SUDIPTA PAUL  wrote:

> Hi Kylin Team,
>
>
>
> I have tried both Kylin 2.3.1 and 2.3.0 on AWS EMR. I can build the sample
> cubes (sales cube and stream cube), but when running a query on the cube I
> get the error below. Any help will be appreciated.
>
>
>
> org.apache.kylin.metadata.realization.NoRealizationFoundException: No
> model
> found for OLAPContext,
> CUBE_NOT_CONTAIN_ALL_COLUMN[1_2e7caa31:DEFAULT.KYLIN_SALES.SLR_SEGMENT_CD,
> 1_2e7caa31:DEFAULT.KYLIN_SALES.ITEM_COUNT],
> rel#0:OLAPTableScan.OLAP.[](table=[DEFAULT, KYLIN_SALES],ctx=,fields=[0,
> 1,
> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
>
>
>
> Below details error log:
>
>
>
>
>
> 2018-05-09 06:14:56,721 DEBUG [http-bio-7070-exec-2]
> project.ProjectL2Cache:195 : Loading L2 project cache for learn_kylin
>
> 2018-05-09 06:14:56,721 WARN  [http-bio-7070-exec-2]
> realization.RealizationRegistry:91 : No provider for realization type
> INVERTED_INDEX
>
> 2018-05-09 06:14:56,721 WARN  [http-bio-7070-exec-2]
> realization.RealizationRegistry:91 : No provider for realization type
> INVERTED_INDEX
>
> 2018-05-09 06:15:08,577 DEBUG [http-bio-7070-exec-4]
> schema.OLAPSchemaFactory:123 : Adding new schema file
> olap_model_614433761197358.json to cache
>
> 2018-05-09 06:15:08,577 DEBUG [http-bio-7070-exec-4]
> schema.OLAPSchemaFactory:124 : Schema json: {
>
> "version": "1.0",
>
> "defaultSchema": "DEFAULT",
>
> "schemas": [
>
> {
>
> "type": "custom",
>
> "name": "DEFAULT",
>
> "factory": "org.apache.kylin.query.schema.OLAPSchemaFactory",
>
> "operand": {
>
> "project": "learn_kylin"
>
> },
>
> "functions": [
>
>{
>
>name: 'PERCENTILE',
>
>className:
> 'org.apache.kylin.measure.percentile.PercentileAggFunc'
>
>},
>
>{
>
>name: 'CONCAT',
>
>className: 'org.apache.kylin.query.udf.ConcatUDF'
>
>},
>
>{
>
>name: 'MASSIN',
>
>className: 'org.apache.kylin.query.udf.MassInUDF'
>
>},
>
>{
>
>name: 'INTERSECT_COUNT',
>
>className:
> 'org.apache.kylin.measure.bitmap.BitmapIntersectDistinctCountAggFunc'
>
>},
>
>{
>
>name: 'VERSION',
>
>className: 'org.apache.kylin.query.udf.VersionUDF'
>
>   },
>
>{
>
>name: 'PERCENTILE_APPROX',
>
>className:
> 'org.apache.kylin.measure.percentile.PercentileAggFunc'
>
>}
>
> ]
>
> }
>
> ]
>
> }
>
>
>
> 2018-05-09 06:15:15,653 INFO  [Scheduler 1863658236 FetcherRunner-40]
> threadpool.DefaultScheduler:268 : Job Fetcher: 0 should running, 0 actual
> running, 0 stopped, 0 ready, 3 already succeed, 0 error, 0 discarded, 0
> others
>
> 2018-05-09 06:15:18,609 INFO  [Idle-Rpc-Conn-Sweeper-pool2-t1]
> ipc.AbstractRpcClient:217 : Cleanup idle connection to
> ip-10-225-138-141.ec2.internal/10.225.138.141:16020
>
> 2018-05-09 06:15:26,353 INFO  [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] service.QueryService:428 : Using
> project: learn_kylin
>
> 2018-05-09 06:15:26,353 INFO  [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] service.QueryService:429 : The
> original query:  select * from KYLIN_SALES
>
> 2018-05-09 06:15:26,401 INFO  [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] service.QueryService:646 : The
> corrected query: select * from KYLIN_SALES
>
> LIMIT 5
>
> 2018-05-09 06:15:28,188 INFO  [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] acl.TableACLManager:58 :
> Initializing TableACLManager with config kylin_metadata@hbase
>
> 2018-05-09 06:15:28,188 DEBUG [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] cachesync.CachedCrudAssist:118 :
> Reloading TableACL from
> kylin_metadata(key='/table_acl')@kylin_metadata@hbase
>
> 2018-05-09 06:15:28,191 DEBUG [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] cachesync.CachedCrudAssist:127 :
> Loaded 0 TableACL(s) out of 0 resource
>
> 2018-05-09 06:15:28,205 INFO  [Query
> 7eb69e4e-150d-4da6-b1a8-b310a32753d2-49] util.PushDownUtil:83 : Query
> failed to utilize pre-calculation, routing to other engines
>
> java.sql.SQLException: Error while executing SQL "select * from KYLIN_SALES
>
> LIMIT 5": No model found for OLAPContext,
> CUBE_NOT_CONTAIN_ALL_COLUMN[1_2e7caa31:DEFAULT.KYLIN_SALES.SLR_SEGMENT_CD,
> 1_2e7caa31:DEFAULT.KYLIN_SALES.ITEM_COUNT],
> rel#0:OLAPTableScan.OLAP.[](table=[DEFAULT, KYLIN_SALES],ctx=,fields=[0,
> 1,
> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
>
> at 

Re: Disable automatic cube enabling after building

2018-04-30 Thread Li Yang
Not right now, but it is easy to do. You can open a JIRA to kick off the dev
work.

Thanks
Yang

On Tue, Apr 24, 2018 at 9:25 PM,  wrote:

> Hi,
>
>
>
> Is it possible to disable automatic cube enabling after a build process?
> For example, this could be useful if we need to create a new version of a
> cube but do not want it to be queried until the new cube has been properly
> tested.
>
>
>
> Regards,
>
> *Roberto Tardío Olmos*
>
> *Senior Big Data & Business Intelligence Consultant*
>
> Avenida de Brasil, 17, Planta 16. 28020 Madrid
>
> Fijo: 91.788.34.10
>
>
>
> http://bigdata.stratebi.com/
>
>
>
> http://www.stratebi.com
>
>
>


Re: Make StorageLevel in Spark Cubing configurable

2018-04-30 Thread Li Yang
Sure, that can be configurable. Kick off the work by opening a JIRA.

Thanks
Yang

On Tue, Apr 17, 2018 at 10:21 AM,  wrote:

> Currently in Spark cubing, the StorageLevel is set to
> StorageLevel.MEMORY_AND_DISK_SER, which will take up a lot of memory if the
> RDD of the layer is large. Can we make the StorageLevel configurable, so
> that for a large cube the user can set it to disk to avoid OOM errors?
>


Re: SQL queries take up too much memory.

2018-04-30 Thread Li Yang
Maybe improve the cube such that more calculation is done offline, instead
of scanning many segments and much data online?

On Tue, Apr 17, 2018 at 10:09 AM, 小村庄  wrote:

> hi all:
>    Some SQL takes up too much memory in the application, and the service
> is not stable. After analysis, this is because the SQL query needs to scan
> the HBase tables corresponding to the relevant segments, requesting the
> regions of the SQL's cuboids through asynchronous threads. After each
> segment makes a request, it waits for the data in the HBase callbacks. But
> sometimes the SQL involves many segments, and before all the segments have
> made their requests a large amount of data is already returned, leading to
> memory alarms and service instability.
>


Re: [ECO] Redash Kylin Plugin

2018-04-30 Thread Li Yang
Cool~~

On Mon, Apr 16, 2018 at 9:12 PM, Luke Han  wrote:

> Nice!
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Mon, Apr 9, 2018 at 10:58 PM, Billy Liu  wrote:
>
>> Hello Kylin user,
>>
>> Found one interesting project on Github from the Strikingly contribution.
>> https://github.com/strikingly/redash-kylin.
>>
>> The description from project readme:
>> "At Strikingly we are using Apache Kylin as our BI solution to have
>> insight about multiple data sources. We are also using Redash, an
>> excellent open source dashboard service for drawing chart and
>> generating report.
>>
>> So we made this plugin to let redash connect to Kylin without
>> configuring any JDBC connections. After installed, you should be able
>> to execute SQL query, test connections and list schemas upon a Kylin
>> data source."
>>
>> Thanks for this contribution.
>>
>> With Warm regards
>>
>> Billy Liu
>>
>
>


Re: Get daily average for periodic readings

2018-03-02 Thread Li Yang
SQL window function can get the rolling weekly average. Try google "window
function rolling average". Put the date dimension on cube, plus the window
function, together give you what you want.
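
A minimal sketch of the idea, using a hypothetical fact table
fact_readings(reading_date, item_value) with one row per reading per day:
the inner GROUP BY is what the cube answers from precomputation, and the
window function then rolls it up over a 7-day window:

    -- 7-day rolling average of a daily total
    SELECT reading_date,
           AVG(daily_value) OVER (
               ORDER BY reading_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS rolling_7d_avg
    FROM (
        SELECT reading_date, SUM(item_value) AS daily_value
        FROM fact_readings
        GROUP BY reading_date
    ) t;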

On Fri, Mar 2, 2018 at 12:58 AM, deva namaste  wrote:

> Thanks Alberto. So you would recommend that I create one record per day in
> the fact table? So instead of 6 records for a year, you would recommend
> creating 365 records with the difference in values spread between them, so
> I can sort data from the dimension based on week, month, year, etc. But I
> was more worried about the amount of data that would be stored in the fact
> table of the cube: for 10 million items, we are talking about 10 x 365 =
> 3650 million. Do you think performance would be impacted? Or is there
> another method where I can put only 6 records per item in the fact table,
> so 10 million x 6 = 60 million, and then use some SQL for better
> performance? Thanks
>
> On Thu, Mar 1, 2018 at 11:36 AM, Alberto Ramón 
> wrote:
>
>> You can't partition your cube per week. It must be per yyyy-mm-dd.
>>
>> You can perform your own test, doing a calculation with year as a dim and
>> year as a sum of days.
>>
>> On 1 Mar 2018 3:50 p.m., "deva namaste"  wrote:
>>
>>> Hi Alberto,
>>>
>>> When I was saying 6 vs 365, that is for one item; for 20 million items it
>>> will multiply by a lot. Do you think it won't make much difference?
>>> Also, what is YY-MM-WW? So I can explain. Basically I need the same
>>> avg() for week, month, year, etc.
>>>
>>> Thanks
>>> Deva
>>>
>>> On Thu, Mar 1, 2018 at 8:42 AM, Alberto Ramón >> > wrote:
>>>
 - 95% of the response time is latency (= there is no difference
 between summing one int or 365; I thought the same when I started with
 Kylin)
 - The YY-MM-WW is not implemented, but it would be nice if you can
 contribute it

 Alb

 On 28 February 2018 at 22:59, deva namaste  wrote:

> I was thinking of saving only 6 records in Kylin instead of splitting them
> outside into daily averages and adding 365 records for each item. So is
> there any way I can achieve this at the SQL level in Kylin, or change the
> model to accommodate the above? Please advise. Thanks
>
> On Wed, Feb 28, 2018 at 5:51 PM, Alberto Ramón <
> a.ramonporto...@gmail.com> wrote:
>
>> Sounds like:
>> - your minimum granularity for queries is weeks, so your fact table needs
>> to be on weeks (or less, like days)
>> - you will need to expand your actual fact table to weeks (or more, days);
>> for example, use a Hive view
>> - as an extra: Kylin can't use partition format columns on weeks; the
>> minimum is days
>>
>> Alb
>>
>> On 28 February 2018 at 21:51, deva namaste  wrote:
>>
>>> Hello,
>>>
>>> How would I calculate the value for a week when I have bi-monthly
>>> values?
>>>
>>> e.g. Here is my data looks like -
>>>
>>> Date   -  Value
>>> 01/18/2017 -  100
>>> 03/27/2017 -  130  (68 Days)
>>> 05/17/2017 -  102  (51 Days)
>>>
>>> I need the average value per week, as below. Let's consider the period
>>> between 03/27 and 05/17: the total number of days in that period is 51,
>>> so the daily average would be 102/51 = 2.04.
>>>
>>> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
>>> Week1 (Starting Apr 2, #days = 7) = 14.28
>>> Week2 (starting Apr 9, #days = 7)= 14.28
>>> Week3 (starting Apr 16, #days = 7)= 14.28
>>> Week4 (starting Apr 23, #days = 7)= 14.28
>>> week5 (Starting Apr 30, #days =7)= 14.28
>>> week1 (starting May 7, #days = 7)= 14.28
>>> Week2 (starting May 14, #days = 4)= 8.16
>>>
>>> But as you see, the period from 01/18 to 03/27 has 68 days, and its
>>> daily average would be 130/68 = 1.91.
>>>
>>> So really, to get a complete week I need 3 days from the 130 value and 4
>>> days from the 102 value.
>>>
>>> So the real total for that first week would be:
>>> Week4 (starting March 26, #days = 4) = (4 x 2.04 = 8.16) + (3 x 1.91 =
>>> 5.73) = 13.89
>>>
>>> How would I achieve this in Kylin? Any function, or another method I can
>>> use?
>>> With just 6 records per year, I don't want to populate daily records.
>>> Thanks
>>> Deva
>>>
>>>
>>>
>>
>

>>>
>


Re: Caused by: java.lang.OutOfMemoryError: unable to create new native thread

2018-02-25 Thread Li Yang
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread

This is the root cause. Google "java.lang.OutOfMemoryError: unable to create
new native thread" and you will find plenty of answers.

On Thu, Feb 15, 2018 at 9:32 PM, praveen kumar 
wrote:

> Hi Team,
>
> I used Apache Kylin for building a cube (star-schema based) in cluster
> mode, but I have the issue mentioned below. Please guide me.
>
> Cluster details:
>
> three machines with 64GB (memory)
>
> one machine set as job
>
> the other two machines set as query
>
> version: Apache Kylin 2.2.0 (HBase)
>
>
>
> Exception:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.
> YarnRuntimeException):
> java.io.IOException: Failed on local exception: java.io.IOException:
> Couldn't set up IO streams; Host Details : local host is: "hostname/
> 10.237.247.12"; destination host is: "hostname":9000;
>
>  at
> org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(
> CachedHistoryStorage.java:147)
>
>  at
> org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:217)
>
>  at
> org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$
> HSClientProtocolHandler$1.run(HistoryClientService.java:203)
>
>  at
> org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$
> HSClientProtocolHandler$1.run(HistoryClientService.java:199)
>
>  at java.security.AccessController.doPrivileged(Native Method)
>
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1548)
>
>  at
> org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$
> HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:199)
>
>  at
> org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$
> HSClientProtocolHandler.getJobReport(HistoryClientService.java:231)
>
>  at
> org.apache.hadoop.mapreduce.v2.api.impl.pb.service.
> MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.
> java:122)
>
>  at
> org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.
> callBlockingMethod(MRClientProtocol.java:275)
>
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(
> ProtobufRpcEngine.java:585)
>
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>
>  at java.security.AccessController.doPrivileged(Native Method)
>
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1548)
>
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> Caused by: java.io.IOException: Failed on local exception:
> java.io.IOException: Couldn't set up IO streams; Host Details : local host
> is: "CTSINGTOHP12.cts.com/10.237.247.12"; destination host is: "
> CTSINGTOHP12.cts.com":9000;
>
>  at org.apache.hadoop.net.NetUtils.wrapException(
> NetUtils.java:764)
>
>  at org.apache.hadoop.ipc.Client.call(Client.java:1414)
>
>  at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.
> invoke(ProtobufRpcEngine.java:206)
>
>  at com.sun.proxy.$Proxy9.getListing(Unknown Source)
>
>  at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
>  at java.lang.reflect.Method.invoke(Method.java:498)
>
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(
> RetryInvocationHandler.java:190)
>
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(
> RetryInvocationHandler.java:103)
>
>  at com.sun.proxy.$Proxy9.getListing(Unknown Source)
>
>  at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslat
> orPB.getListing(ClientNamenodeProtocolTranslatorPB.java:515)
>
>  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.
> java:1743)
>
>  at
> org.apache.hadoop.fs.Hdfs$DirListingIterator.(Hdfs.java:203)
>
>  at
> org.apache.hadoop.fs.Hdfs$DirListingIterator.(Hdfs.java:190)
>
>  at org.apache.hadoop.fs.Hdfs$2.(Hdfs.java:172)
>
>  at org.apache.hadoop.fs.Hdfs.listStatusIterator(Hdfs.java:172)
>
>  at org.apache.hadoop.fs.FileContext$20.next(
> FileContext.java:1393)
>
>  at org.apache.hadoop.fs.FileContext$20.next(
> FileContext.java:1388)
>
>  at
> org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>
>  at
> org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
>
>  at
> 

Re: Kylin 2.2.0 cluster installation steps guide.

2017-12-11 Thread Li Yang
A broken status typically means some table (or column) required by the
cube/model is gone.

On Thu, Nov 30, 2017 at 3:35 PM, Prasanna 
wrote:

> Hi all,
>
>
>
> If anybody is using Kylin 2.2.0 in cluster mode, please guide me on how
> to set it up. I tried the cluster setup document for older Kylin versions;
> I am able to start the Kylin service, but it behaves as follows:
>
> On another node test_cube is in READY status, but on this node it is
> showing DESCBROKEN status.
>
>
>
>


Re: [Discuss] Disable/hide "RAW" measure in Kylin web GUI

2017-11-30 Thread Li Yang
+1 for complete removal.

On Wed, Nov 29, 2017 at 1:45 PM, ShaoFeng Shi <shaofeng...@apache.org>
wrote:

> Billy, I don't see a need for "advanced user could enable it also." Do
> you have such a case?
>
> Legacy cubes can continue to build; the backend code won't be
> removed during this phase.
>
>
> 2017-11-28 22:58 GMT+08:00 Billy Liu <billy...@apache.org>:
>
>> +1 to turn this feature off by default. The advanced user could enable it
>> also.
>>
>> 2017-11-27 13:53 GMT+08:00 Luke Han <luke...@gmail.com>:
>>
>>> +1 to remove it from the new release; people could backport it to a new
>>> version using the previous code
>>>
>>>
>>> Best Regards!
>>> -
>>>
>>> Luke Han
>>>
>>> On Sun, Nov 26, 2017 at 9:40 PM, ShaoFeng Shi <shaofeng...@apache.org>
>>> wrote:
>>>
>>>> Last year I raised this discussion but didn't have follow-up action.
>>>>
>>>> Now we see there are still some new users who misuse this feature and then
>>>> face performance and maintenance issues.
>>>>
>>>> In Kylin 2.1, the new "query pushdown" feature can forward the Cube
>>>> unmatched queries to alternative query engines like Hive / SparkSQL. The
>>>> raw data query is just such a scenario.
>>>>
>>>> So I think it is time to disable the RAW measure on Kylin now.  JIRA
>>>> created for it: https://issues.apache.org/jira/browse/KYLIN-3062
>>>>
>>>> Please comment if you see any issue.
>>>>
>>>> 2016-12-19 22:03 GMT+08:00 Billy Liu <billy...@apache.org>:
>>>>
>>>> > The experimental mode is a system-wide feature toggle. I think case by
>>>> case
>>>> > is more flexible. Most new features could have toggles, defaulting to
>>>> off.
>>>> >
>>>> > 2016-12-19 21:40 GMT+08:00 Luke Han <luke...@gmail.com>:
>>>> >
>>>> > > Beta or Experimental will also be confusing for most users.
>>>> > >
>>>> > > Maybe we could have something called an "expert" or "experimental"
>>>> mode in
>>>> > > the system configuration.
>>>> > >
>>>> > > Users will not see such content since it will be hidden by default
>>>> but
>>>> > > an admin could set it to true if they are confident enabling such
>>>> features.
>>>> > >
>>>> > > What do you think?
>>>> > >
>>>> > >
>>>> > > Best Regards!
>>>> > > -
>>>> > >
>>>> > > Luke Han
>>>> > >
>>>> > > On Mon, Dec 19, 2016 at 11:24 AM, Xiaoyu Wang <wangxy...@gmail.com>
>>>> > wrote:
>>>> > >
>>>> > > > I'm sorry for not maintaining it for so long!
>>>> > > >
>>>> > > > I agree with liyang on giving the RAW measure a "Beta" or similar
>>>> > label.
>>>> > > >
>>>> > > > I will improve it when I have time!
>>>> > > >
>>>> > > > 2016-12-19 10:21 GMT+08:00 Li Yang <liy...@apache.org>:
>>>> > > >
>>>> > > > > Or display RAW with a "Beta" or "Experimental" label to warn
>>>> users that it
>>>> > > is
>>>> > > > > not a mature feature?
>>>> > > > >
>>>> > > > > On Fri, Dec 16, 2016 at 12:30 PM, 康凯森 <kangkai...@qq.com>
>>>> wrote:
>>>> > > > >
>>>> > > > > > +1.
>>>> > > > > > But the "RAW" measure is still somewhat useful; we could
>>>> > > > > > improve it next year when we have time.
>>>> > > > > >
>>>> > > > > >
>>>> > > > > > -- Original Message --
>>>> > > > > > From: "ShaoFeng Shi";<shaofeng...@apache.org>;
>>>> > > > > > Sent: Thursday, December 15, 2016, 12:05 PM
>>>> > > > > > To: "dev"<d...@kylin.apache.org>;
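
For context on the pushdown scenario mentioned above: a raw-detail query has
no matching cuboid in a cube, so with pushdown enabled it is forwarded to
Hive / SparkSQL instead of requiring a RAW measure. A sketch (the table and
columns loosely follow Kylin's sample kylin_sales table and are only
illustrative):

    -- a detail scan, not an aggregation: no pre-computed cuboid can answer it
    SELECT trans_id, seller_id, price, part_dt
    FROM kylin_sales
    WHERE part_dt = DATE '2013-01-01'
    LIMIT 100;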

Re: availableVirtualCores

2017-11-27 Thread Li Yang
Where do you see Cluster.info: 'availableVirtualCores=3'?

I cannot recognize it.

On Sat, Nov 25, 2017 at 4:29 AM, Alberto Ramón 
wrote:

> Hello
>
> From Ambari, the number of virtual cores is 4:
> [image: Inline images 1]
>
> But in the file Cluster.info: 'availableVirtualCores=3'
>
> (RAM is correct)
>
> I don't know where Kylin reads this config from
>


Re: Shall we define the base cuboid?

2017-11-22 Thread Li Yang
Don't think so. With only AGG2, the combination of [cal_dt, city, buyer_id]
is not computed at all.

On Mon, Nov 20, 2017 at 1:59 PM, 杨浩  wrote:

> So, shall we change the document to delete AGG1 and say it is
> computed by default?
>
> 2017-11-20 8:55 GMT+08:00 ShaoFeng Shi :
>
>> Yes, the base cuboid is the cuboid of all dimensions on the rowkey. It is
>> the parent of all AGGs.
>>
>> 2017-11-19 16:06 GMT+08:00 杨浩 :
>>
>>> As in the article http://kylin.apache.org/blog/2016/02/18/new-aggregat
>>> ion-group/ , two aggs are defined at the end. AGG1 is the base cuboid;
>>> shall we define it explicitly? It seems the base cuboid is computed implicitly
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


Re: Has only base cuboid for some cube desc

2017-11-21 Thread Li Yang
Looks like a bug to me. At least worth investigation. Could you open a JIRA?

On Thu, Nov 9, 2017 at 2:44 PM,  wrote:

> Maybe the same as the closed issue https://issues.apache.
> org/jira/browse/KYLIN-2197
>
>
>
>
> in the image:
>
> bottom-left: row-keys, where you can see all the dimensions.
> top-left: I have created two agg-groups with dimensions as mandatory
> dimensions.
> bottom-right: the resulting cuboid list has only the base cuboid.
>
> Is it a bug, or did I configure it in a wrong way?
>
> My version is Kylin 2.1
>


Re: Kylin 2.1.0 & 2.2.0 failed at build cube with spark in yarn mode.

2017-11-19 Thread Li Yang
Need more info on the exact error. Like logs, stacktrace etc.

On Thu, Nov 16, 2017 at 12:02 PM, prasanna lakshmi <
prasannapadar...@gmail.com> wrote:

> Hi all,
>
> I followed the installation and development document provided for
> Kylin 2.1.0 and above, and all the instructions in that document.
> My build failed at the 7th step, which is the "build cube with Spark"
> step. Can you please suggest what my mistake is?
>


Re: I can't use FK as measure.

2017-11-19 Thread Li Yang
You should be able to. Make sure the FK is listed as a dimension or a
measure in the model. Please give more info to help troubleshoot if you
still have problems.

On Thu, Nov 16, 2017 at 9:46 AM,  wrote:

> When creating a model I can't use an FK as a measure. Is it a bug?


Re: New document: Install Kylin on AWS EMR

2017-11-15 Thread Li Yang
Cool~~

On Tue, Nov 14, 2017 at 9:18 AM, ShaoFeng Shi 
wrote:

> Hi Kylin users,
>
> I added a document on how to install Kylin on AWS EMR, and how to configure
> S3 as the storage:
>
> https://kylin.apache.org/docs21/install/kylin_aws_emr.html
>
> It is also very easy to compose an EMR bootstrap script to automate this.
> If you see any problem, feel free to let me know.
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: [Announce] Apache Kylin 2.2.0 released

2017-11-13 Thread Li Yang
Dong bravo~~  :-)

On Sat, Nov 4, 2017 at 6:08 PM, Roberto Tardío 
wrote:

> Congratulations! You are doing a great job.
>
> On 04/11/2017 at 3:26, Dong Li wrote:
>
> The Apache Kylin team is pleased to announce the immediate availability
> of the 2.2.0 release.
>
> This is a major release after 2.1, with more than 70 bug fixes and
> enhancements; All of the changes in this release can be found in:
> https://kylin.apache.org/docs21/release_notes.html
>
> You can download the source release and binary packages from Apache
> Kylin's download page: https://kylin.apache.org/download/
>
> Apache Kylin is an open source Distributed Analytics Engine designed to
> provide SQL interface and multi-dimensional analysis (OLAP) on Apache
> Hadoop, supporting extremely large datasets.
>
> Apache Kylin lets you query massive data sets at sub-second latency in 3
> steps:
> 1. Identify a star schema or snowflake schema data set on Hadoop.
> 2. Build Cube on Hadoop.
> 3. Query data with ANSI-SQL and get results in sub-second, via ODBC, JDBC
> or RESTful API.
>
> Thanks everyone who have contributed to the 2.2.0 release.
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kylin.apache.org/
>
> Thanks,
> Dong Li
>
>
> --
>
> *Roberto Tardío Olmos*
> *Senior Big Data & Business Intelligence Consultant*
> Avenida de Brasil, 17, Planta 16. 28020 Madrid
> Fijo: 91.788.34.10
>


Re: How to debug kylin remotely and Setup Development Env?

2017-11-13 Thread Li Yang
Have you looked at http://kylin.apache.org/development/dev_env.html ?


On Mon, Nov 13, 2017 at 11:19 PM, 天涯 <401223...@qq.com> wrote:

> Hi all,
> I have two questions:
>
> 1: I want to set up a development env for Kylin on Windows 7. Can you help
> me with step-by-step instructions or docs for setting up the development
> env locally? My Hadoop env is CDH 5.12.
>
> 2: I want to debug the Kylin source code, but I don't know how to start.
> What should I do? Are there any documents?
>
>


Re: Seperate ZooKeeper nodes when deploy StandAlone Hbase cluster

2017-11-09 Thread Li Yang
Sorry for the late reply. Was very occupied recently.

> Is it OK to deploy a standalone HBase cluster with separate Zookeeper
nodes, different from the main cluster?
> Does this imply that the main cluster & the HBase cluster should share the
same ZK nodes?
Looked again. My previous answer confused you. Sorry for that. I thought
you were asking about using 2 HBase clusters, but actually the question was
about r/w separation deployment.

Yes, Kylin can work with 2 clusters. One called read cluster which hosts
HBase and provides query horsepower. Another called write cluster (the main
cluster in the question) which is responsible for cube building. Kylin uses
the Zookeeper of the HBase cluster for its job coordination by default.

When building cube, the write cluster (or main cluster) will write to the
HBase cluster, to create the HBase table and bulk load data. The
kylin.env.hdfs-working-dir should be on the write cluster by design.

In the step "Create HTable", Kylin wrote a partition file based on which a
new HTable is created. That must be the write operation you observed.

Cheers
Yang


On Mon, Oct 30, 2017 at 8:23 PM, Yuxiang Mai <yuxiang@gmail.com> wrote:

> Hi, Li Yang
>
> Thanks for your reply.
>
> > Is it OK to deploy a standalone HBase cluster with separate Zookeeper
> > nodes, different from the main cluster?
> > No. Kylin only works with 1 HBase and its related Zookeeper.
>
> Does this imply that the main cluster & the HBase cluster should share the
> same ZK nodes?
>
> And I have one more question about kylin.env.hdfs-working-dir: should
> the HDFS working dir be placed on the main cluster or the HBase cluster?
>
> Because during a cube build, after Extract Fact Table Distinct Columns
> & Save Cuboid Statistics, in the step "Create HTable" it seems stuck
> with no response for a long time;
> In kylin.log, it seems stuck in this job:
>
> 2017-10-30 20:16:46,730 INFO  [Job e82dca5a-93c6-47ca-a707-674372708b5f-193]
> common.HadoopShellExecutable:59 :  -cubename 123 -segmentid
> 6223ddc9-ac80-4a10-b3c8-33165fe8be4c -partitions hdfs://maincluster/
> kylinworkingdir/kylin_metadata/kylin-e82dca5a-93c6-
> 47ca-a707-674372708b5f/123/rowkey_stats/part-r-0 -statisticsenabled
> true
>
>  In this step, it seems to generate the HBase table in the HDFS working dir.
> Does that mean the HDFS working dir is on the HBase cluster, not the main cluster?
>
> Thanks a lot
>
> Yuxiang MAI
>
>
>
> On Sun, Oct 29, 2017 at 6:41 PM, Li Yang <liy...@apache.org> wrote:
>
>> > Is it OK to deploy a standalone HBase cluster with separate Zookeeper
>> nodes, different from the main cluster?
>> No. Kylin only works with 1 HBase and its related Zookeeper.
>>
>> > How does Kylin get the YARN config when submitting a job?
>> Kylin takes the Hadoop config from the classpath, and most of the
>> classpath comes from the HBase shell.
>>
>> On Wed, Oct 25, 2017 at 4:33 PM, Yuxiang Mai <yuxiang@gmail.com>
>> wrote:
>>
>>> Hi, experts
>>>
>>> We are now deploying standalone Hbase out of the hadoop cluster to
>>> improve the query performance.
>>> http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>>>
>>> The new HBase cluster uses separate zookeeper nodes from the main
>>> cluster. The Kylin server can access the HBase, Hadoop & Hive resources.
>>> But in this configuration, the cube build failed in the first step:
>>>
>>> There are 3 hive commands in the first step:
>>> DROP TABLE IF EXISTS kylin_intermediate_test1_ba3c5
>>> 910_ff7d_4669_b28a_4ec2736d60dc;
>>>
>>> CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_test1_ba3c5
>>> 910_ff7d_4669_b28a_4ec2736d60dc
>>> ...
>>> INSERT OVERWRITE TABLE kylin_intermediate_test1_ba3c5
>>> 910_ff7d_4669_b28a_4ec2736d60dc SELECT
>>> ..
>>>
>>>
>>> drop & create table are OK, but failed on "insert overwrite" with the
>>> following exception.
>>>
>>>
>>> FAILED: IllegalArgumentException java.net.UnknownHostException:
>>> maincluster
>>>
>>> at org.apache.kylin.common.util.CliCommandExecutor.execute(CliC
>>> ommandExecutor.java:92)
>>> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.createF
>>> latHiveTable(CreateFlatHiveTableStep.java:52)
>>> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.doWork(
>>> CreateFlatHiveTableStep.java:70)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(Ab
>>> stractExecutable.java:124)
>>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWo
>>> rk(DefaultChainedEx

Re: Query backend error: dict.TrieDictionary:182 : Not a valid value

2017-10-29 Thread Li Yang
The log does not necessarily mean a real error or problem from the user's
point of view. Is the query result correct?

2017-10-27 13:40 GMT+08:00 chenping...@keruyun.com 
:

> Kylin 1.6 on CDH 5.8.4: querying a cube reports the following backend errors
> 2017-10-27 13:32:08,147 WARN  [Query fbdd25b3-e667-4af1-
> bb8b-bbcea25fdb05-71] cube.RawQueryLastHacker:76 : SUM
> is not defined for measure column sd.tessdf.RANK_PEOPLE_
> COUNT, output will be meaningless.
> 2017-10-27 13:32:08,147 WARN  [Query fbdd25b3-e667-4af1-
> bb8b-bbcea25fdb05-71] cube.RawQueryLastHacker:76 : SUM
> is not defined for measure column sd.tessdf.AVG_USEFUL_
> AMOUNT, output will be meaningless.
> 2017-10-27 13:32:08,998 ERROR [Query f3118513-5a59-4591-
> 88e7-e35da6a3b4e7-71] dict.TrieDictionary:182 : Not a valid value:
> 000810005460
> 2017-10-27 13:32:08,998 ERROR [Query f3118513-5a59-4591-
> 88e7-e35da6a3b4e7-71] dict.TrieDictionary:182 : Not a valid value:
> 000810006841
> Has anyone run into this problem? My initial guess is that it is closely
> related to the specific cube.
>
>
>
> --
>
> Chen Ping, DBA Engineer
>
> Chengdu Shishike Technology Co., Ltd.
>
> Address: Floor 3, Building 1, No. 1268 Tianfu Avenue, Hi-Tech Zone, Chengdu
>
> Postcode: 610041
>
> Mobile: 15108456581
>
> Online: QQ 625852056
>
> Website: www.keruyun.com
>
> Customer service: 4006-315-666
>
>
>


Re: Seperate ZooKeeper nodes when deploy StandAlone Hbase cluster

2017-10-29 Thread Li Yang
> Is it OK to deploy a standalone HBase cluster with separate Zookeeper
nodes, different from the main cluster?
No. Kylin only works with 1 HBase and its related Zookeeper.

> How does Kylin get the YARN config when submitting a job?
Kylin takes the Hadoop config from the classpath, and most of the classpath
comes from the HBase shell.

On Wed, Oct 25, 2017 at 4:33 PM, Yuxiang Mai  wrote:

> Hi, experts
>
> We are now deploying standalone Hbase out of the hadoop cluster to improve
> the query performance.
> http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>
> The new HBase cluster uses separate zookeeper nodes from the main cluster.
> The Kylin server can access the HBase, Hadoop & Hive resources.
> But in this configuration, the cube build failed in the first step:
>
> There are 3 hive commands in the first step:
> DROP TABLE IF EXISTS kylin_intermediate_test1_ba3c5910_ff7d_4669_b28a_
> 4ec2736d60dc;
>
> CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_test1_
> ba3c5910_ff7d_4669_b28a_4ec2736d60dc
> ...
> INSERT OVERWRITE TABLE 
> kylin_intermediate_test1_ba3c5910_ff7d_4669_b28a_4ec2736d60dc
> SELECT
> ..
>
>
> drop & create table are OK, but failed on "insert overwrite" with the
> following exception.
>
>
> FAILED: IllegalArgumentException java.net.UnknownHostException: maincluster
>
> at org.apache.kylin.common.util.CliCommandExecutor.execute(
> CliCommandExecutor.java:92)
> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.
> createFlatHiveTable(CreateFlatHiveTableStep.java:52)
> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.doWork(
> CreateFlatHiveTableStep.java:70)
> at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:124)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(
> DefaultChainedExecutable.java:64)
> at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:124)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(
> DefaultScheduler.java:142)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> It seems the MR job failed to submit to YARN. In our debugging, it seems
> the job is not submitted to the main cluster.
> So my questions are:
> 1. Is it OK to deploy a standalone HBase cluster with separate Zookeeper
> nodes, different from the main cluster?
> 2. How does Kylin get the YARN config when submitting a job? I can only
> find hive & hbase config, but no YARN-related config.
>
>
> Thanks a lot.
>
> --
> Yuxiang Mai
>
>


Re: [Announce] New Apache Kylin PMC Billy Liu

2017-10-21 Thread Li Yang
Welcome~~

On Thu, Oct 19, 2017 at 4:17 PM, 杨浩  wrote:

> Congratulations to Bill, Guosheng and Cheng Wang!!
>
> 2017-10-16 20:41 GMT+08:00 Alberto Ramón :
>
>> Congratulations to Bill, Guosheng and Cheng Wang!!
>>
>> On 16 October 2017 at 11:33, Luke Han  wrote:
>>
>>> On behalf of the Apache Kylin PMC, I am very pleased to announce
>>> that Billy Liu has accepted the PMC's invitation to become a
>>> PMC member on the project.
>>>
>>> We appreciate all of Billy's generous contributions: many bug
>>> fixes, patches, and help for many users. We are so glad to have him as
>>> our new PMC member and look forward to his continued involvement.
>>>
>>> Congratulations and Welcome, Billy!
>>>
>>
>>
>


Re: Start Error

2017-10-21 Thread Li Yang
HBase version mismatch.

On Wed, Oct 18, 2017 at 5:13 PM, yuyong.zhai  wrote:

> hbase version?
>  Original Message
> *From:* 845286...@qq.com<845286...@qq.com>
> *To:* user
> *Sent:* Wednesday, October 18, 2017, 16:44
> *Subject:* Start Error
>
> When starting Kylin I encounter this error; how can I solve it?
>
> 2017-10-18 16:41:40,832 INFO  [main] Configuration.
> deprecation:1176 : hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> Exception in thread "main" java.lang.IllegalArgumentException:
> Failed to find metadata store by url: kylin_metadata@hbase
> at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(
> ResourceStore.java:89)
> at org.apache.kylin.common.persistence.ResourceStore.
> getStore(ResourceStore.java:101)
> at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(
> AclTableMigrationTool.java:94)
> at org.apache.kylin.tool.AclTableMigrationCLI.main(
> AclTableMigrationCLI.java:41)
> Caused by: java.lang.NoSuchMethodError: org.apache.
> hadoop.hbase.client.Get.setCheckExistenceOnly(Z)Lorg/
> apache/hadoop/hbase/client/Get;
> at org.apache.kylin.storage.hbase.HBaseResourceStore.
> internalGetFromHTable(HBaseResourceStore.java:377)
> at org.apache.kylin.storage.hbase.HBaseResourceStore.getFromHTable(
> HBaseResourceStore.java:363)
> at org.apache.kylin.storage.hbase.HBaseResourceStore.
> existsImpl(HBaseResourceStore.java:116)
> at org.apache.kylin.common.persistence.ResourceStore.
> exists(ResourceStore.java:144)
> at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(
> ResourceStore.java:84)
> ... 3 more
>
> --
> 845286...@qq.com
>


Re: [External Mail] Map input splits are 0 bytes, something is wrong!

2017-10-20 Thread Li Yang
Even for an empty table, there is still a file (of very small size). So an
input of exactly zero bytes is treated as a system error.

On Tue, Oct 17, 2017 at 2:08 PM, 曾耀武  wrote:

> Your table's input data is empty. Check whether the source data partitions
> you want to compute actually contain data.
>
> From: "li...@fcyun.com"
> Reply-To: user
> Date: Tuesday, October 17, 2017, 2:06 PM
> To: user
> Subject: [External Mail] Map input splits are 0 bytes, something is wrong!
>
> hi all,
>
> Have you ever see the error like this? how to fix it?
> what are the possible problems to check out?
>
>
>
> #7 Step Name: Build Base Cuboid
> Duration: 0.01 mins Waiting: 0 seconds
>
> java.lang.IllegalArgumentException: Map input splits are 0 bytes, something 
> is wrong!
>   at 
> org.apache.kylin.engine.mr.common.AbstractHadoopJob.getTotalMapInputMB(AbstractHadoopJob.java:573)
>   at org.apache.kylin.engine.mr.steps.CuboidJob.run(CuboidJob.java:134)
>   at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:102)
>   at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:123)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> result code:2
>
>
> thanks a lot ~
>
> --
> li...@fcyun.com
>
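
A quick sanity check in Hive before building (table and partition names
below are illustrative only, not from this thread):

    -- list the partitions of the source table
    SHOW PARTITIONS my_fact_table;

    -- verify the partition range the segment covers is not empty
    SELECT COUNT(*) FROM my_fact_table WHERE dt = '2017-10-16';

If the count is 0 for the build's date range, the "Map input splits are 0
bytes" error is the expected outcome.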


Re: [External Mail] Re: FW: Error when building a cube, 20-dimension test

2017-10-20 Thread Li Yang
It's a Hadoop client configuration issue, not a Kylin problem.

2017-10-16 21:37 GMT+08:00 曾耀武 :

> Solved it: in the Hadoop client's core-site.xml, changing
> hdfs://nameservice to hdfs://nameservice:8020 made it execute normally.
>
> From: Billy Liu
> Reply-To: user
> Date: Monday, October 16, 2017, 8:09 PM
> To: user
> Subject: [External Mail] Re: FW: Error when building a cube, 20-dimension test
>
> What's the current working-dir in kylin.properties?
>
> On October 16, 2017 at 6:55 PM, 曾耀武 wrote:
>
>> Sorry, my previous mail was incomplete. The code makes a check inside
>> HBaseResourceStore, then goes on to check the file system and reports the
>> error above. Does anyone know the cause? Is there some configuration to set?
>>
>> From: MOMO
>> Date: Monday, October 16, 2017, 6:50 PM
>> To: user
>> Subject: Error when building a cube, 20-dimension test
>>
>> A question: when building a cube with 20 dimensions on Kylin 2.1, the
>> following error is reported; with fewer dimensions the build runs
>> normally. From the code, it happens here:
>>
>> java.lang.IllegalArgumentException: Wrong FS: 
>> hdfs://nameservice1/kylin/kylin-kylin_metadata/resources/cube_statistics/molive_pay_order_cube/9843c68a-d362-43fc-9c12-1b6bb9618dd9.seq,
>>  expected: hdfs://nameservice1:8020
>>  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>  at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
>>  at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
>>  at 
>> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
>>  at 
>> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
>>  at 
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>  at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
>>  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
>>  at 
>> org.apache.kylin.storage.hbase.HBaseResourceStore.writeLargeCellToHdfs(HBaseResourceStore.java:393)
>>  at 
>> org.apache.kylin.storage.hbase.HBaseResourceStore.buildPut(HBaseResourceStore.java:418)
>>  at 
>> org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:293)
>>  at 
>> org.apache.kylin.common.persistence.ResourceStore.putResourceCheckpoint(ResourceStore.java:251)
>>  at 
>> org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:246)
>>  at 
>> org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:73)
>>  at 
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
>>  at 
>> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:65)
>>  at 
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
>>  at 
>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:141)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>>
>>
>


Re: query return error result.

2017-10-08 Thread Li Yang
Interesting... is it HLL count distinct or bitmap count distinct? (HLL is an
approximate algorithm, so some error is expected; bitmap count distinct is
exact.)

On Wed, Sep 27, 2017 at 11:19 AM, yu feng  wrote:

> I added some logs and found that the data from HBase is incorrect.
>
> 2017-09-27 11:17 GMT+08:00 yu feng :
>
>> I have a cube like this :
>> dimensions : source_type, source_id, name, dt
>> measures:count(distinct uid), count(1) , count(distinct buyer)
>>
>> I run the query :
>>
>> select source_type, source_id, name,
>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>> buyerNum,
>> count(buyer) as bnum
>> from
>> table_name
>> where
>> dt between '2017-06-01' and '2017-09-18'
>> and source_id is not null
>> and source_type is not null
>> group by
>> source_type, source_id, name
>> order by buyerNum desc limit 1 offset 0
>>
>> returns:
>>
>> mv | 423031 | 起点‧终站 | 193794 | 92 | 42043 | 92
>>
>>
>>
>>
>>
>> Obviously this is an incorrect result. I queried that source_id like this:
>>
>> select source_type, source_id, name,
>> count(distinct uid), count(uid) as cnum, count(distinct buyer) as
>> buyerNum,
>> count(buyer) as bnum
>> from
>> vip_buying_funnel_cube_view
>> where
>> dt between '2017-06-01' and '2017-09-18'
>> and source_id is not null
>> and source_type is not null
>> and source_id = '423031'
>> group by
>> source_type, source_id, name
>> order by buyerNum desc limit 1 offset 0
>>
>> the result is correct:
>>
>> mv | 423031 | 起点‧终站 | 77 | 92 | 11 | 92
>>
>
>


Re: Re: how to avoid the combination of this situation: "Apple company has a Nike store"?

2017-10-07 Thread Li Yang
> Kylin only support the Star model recently..

This is not correct. Kylin has supported snowflake schemas since v2.0. You
can set up all kinds of joins, including unlimited levels of lookup
tables.

On Sun, Sep 24, 2017 at 10:58 PM, li...@fcyun.com <li...@fcyun.com> wrote:

> thanks for your reply.
>
> Yes, the company_id in the deal and store tables are the same. The buyer in
> the deal_history table I didn't include in this relationship. That's my
> fault, sorry for the confusion.
>
> store_id and company_id are the primary keys of their respective tables;
> t_store has an FK relationship with t_company through company_id.
>
> I wondered whether a wrong cuboid could exist through this relationship
> during cube creation.
>
> Kylin only supports the star model for now, so maybe my worry is
> unnecessary and the wrong combination cuboid won't happen.
>
> Best Wishes
>
> Liu Bo
>
> --
> li...@fcyun.com
>
>
> *From:* Li Yang <liy...@apache.org>
> *Date:* 2017-09-23 09:39
> *To:* user <user@kylin.apache.org>; 崔苗 <cuim...@danale.com>
> *Subject:* Re: how to avoid the combination of this situation: "Apple
> company has a Nike store"?
> OK. I see two company fields: one on the deal table, the other on
> the store table. Are they the same? Usually a deal has a buyer and a
> seller. Could they mean the buyer company and the seller company?
>
> On Thu, Sep 21, 2017 at 12:05 PM, 崔苗 <cuim...@danale.com> wrote:
>
>> If a store_id belongs to exactly one company_id, you can remove the join
>> between t_deal_history and t_company; through store_id you can still map
>> indirectly to company_id and company_name. The model is simpler this way,
>> and wrong combinations cannot appear.
>>
>> --
>> From: li...@fcyun.com <li...@fcyun.com>
>> Sent: Thursday, September 21, 2017, 11:54 AM
>> To: user <user@kylin.apache.org>
>> Subject: how to avoid the combination of this situation: "Apple company has
>> a Nike store"?
>>
>> Hi, thanks for reading this first!
>>
>>
>> 1. Here are my tables:
>>
>> t_deal_history is the fact table that records every bill of a store.
>> t_store is a lookup table.
>> t_company is a lookup table; one company can have many stores in
>> different places.
>>
>>
>>
>> sample data for the tables:
>> t_company:
>> company_id | company_name
>> 1          | NIKE
>> 2          | Apple
>>
>> t_store:
>> store_id | store_name          | company_id
>> 1        | Nike Flagship store | 1
>> 2        | Nike Shoes store    | 1
>> 3        | Apple NewYork       | 2
>> 4        | Apple Tokyo         | 2
>>
>> t_deal_history:
>> deal_history_id | store_id | company_id | bill_money
>> 1               | 1        | 1          | 100.00
>> 2               | 2        | 1          | 100.00
>> 3               | 3        | 2          | 100.00
>> 4               | 4        | 2          | 100.00
>>
>> Kylin model design:
>>
>> Could this produce a cuboid in which "Apple company has a Nike store"?
>> How can we avoid that combination?
>> Or is it possible to change t_deal_history to not join t_store and
>> instead join t_company directly?
>>
>> What's the right relationship between these three tables? Any ideas?
>>
>>
>>
>> Thanks a lot.
>>
>> --
>> li...@fcyun.com
>>
>>
>
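
To make the recommended shape concrete, here is a sketch of the snowflake
join (table and column names are from the thread; the query itself is only
illustrative). Because company is reached only through store, a store can
only ever pair with its own company, so "Apple company has a Nike store"
cannot appear in any cuboid:

    SELECT c.company_name,
           s.store_name,
           SUM(d.bill_money) AS total_money
    FROM t_deal_history d
    JOIN t_store   s ON d.store_id   = s.store_id
    JOIN t_company c ON s.company_id = c.company_id
    GROUP BY c.company_name, s.store_name;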


Re: kylin2.1 mapreduce appmaster log4j error

2017-10-07 Thread Li Yang
Thanks for sharing the solution, Yuyong!

On Sun, Sep 24, 2017 at 7:52 PM, yuyong.zhai <yuyong.z...@ele.me> wrote:

> Solved the problem.
>
> Kylin puts hbase/lib on its classpath before hadoop/lib, and the
> hadoop-client-common versions differ:
>
>
> hadoop: 2.6.0-cdh5.8.2
>
> hbase: 1.2.6
>
>
>
>
>
>  Original Message
> *From:* Li Yang<liy...@apache.org>
> *To:* user<user@kylin.apache.org>
> *Sent:* Saturday, September 23, 2017, 09:07
> *Subject:* Re: kylin2.1 mapreduce appmaster log4j error
>
> Looks like an environment problem; which Hadoop release are you using?
>
> On Tue, Sep 19, 2017 at 7:24 PM, yuyong.zhai <yuyong.z...@ele.me> wrote:
>
>> log4j:ERROR setFile(null,true) call failed.
>> java.io.FileNotFoundException: 
>> /data5/nmlog/application_1504936740013_969498/container_e09_1504936740013_969498_01_01
>>  (Is a directory)
>>  at java.io.FileOutputStream.open(Native Method)
>>  at java.io.FileOutputStream.(FileOutputStream.java:221)
>>  at java.io.FileOutputStream.(FileOutputStream.java:142)
>>  at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
>>  at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
>>  at 
>> org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
>>  at 
>> org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
>>  at 
>> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
>>  at 
>> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
>>  at 
>> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
>>  at 
>> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
>>  at 
>> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
>>  at 
>> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
>>  at 
>> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
>>  at 
>> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
>>  at org.apache.log4j.LogManager.(LogManager.java:127)
>>  at org.apache.log4j.Logger.getLogger(Logger.java:104)
>>  at 
>> org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
>>  at 
>> org.apache.commons.logging.impl.Log4JLogger.(Log4JLogger.java:108)
>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>  at 
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>  at 
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>  at 
>> org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1025)
>>  at 
>> org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:844)
>>  at 
>> org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:541)
>>  at 
>> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
>>  at 
>> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
>>  at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
>>  at 
>> org.apache.hadoop.service.AbstractService.(AbstractService.java:43)
>> Sep 19, 2017 7:04:13 PM 
>> com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
>>  get
>> WARNING: You are attempting to use a deprecated API (specifically, 
>> attempting to @Inject ServletContext inside an eagerly created singleton. 
>> While we allow this for backwards compatibility, be warned that this MAY 
>> have unexpected behavior if you have more than one injector (with 
>> ServletModule) running in the same JVM. Please consult the Guice 
>> documentation at http://code.google.com/p/google-guice/wiki/Servlets for 
>> more information.
>> Sep 19, 2017 7:04:13 PM 
>> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
>> INFO: Registering 
>> org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider 
>> class
>> Sep 19, 2017 7:04:13 PM 
>> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
>> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 

Re: how to avoid the combination of this situation: "Apple company has a Nike store"?

2017-09-22 Thread Li Yang
OK. I see two company fields: one on the deal table, the other on
the store table. Are they the same? Usually a deal has a buyer and a
seller. Could they mean the buyer company and the seller company?

On Thu, Sep 21, 2017 at 12:05 PM, 崔苗  wrote:

> If a store_id belongs to exactly one company_id, you can remove the join
> between t_deal_history and t_company; through store_id you can still map
> indirectly to company_id and company_name. The model is simpler this way,
> and wrong combinations cannot appear.
>
> --
> From: li...@fcyun.com
> Sent: Thursday, September 21, 2017, 11:54 AM
> To: user
> Subject: how to avoid the combination of this situation: "Apple company has a
> Nike store"?
>
> Hi, thanks for reading this first!
>
>
> 1. Here are my tables:
>
> t_deal_history is the fact table that records every bill of a store.
> t_store is a lookup table.
> t_company is a lookup table; one company can have many stores in
> different places.
>
>
>
> sample data for the tables:
> t_company:
> company_id | company_name
> 1          | NIKE
> 2          | Apple
>
> t_store:
> store_id | store_name          | company_id
> 1        | Nike Flagship store | 1
> 2        | Nike Shoes store    | 1
> 3        | Apple NewYork       | 2
> 4        | Apple Tokyo         | 2
>
> t_deal_history:
> deal_history_id | store_id | company_id | bill_money
> 1               | 1        | 1          | 100.00
> 2               | 2        | 1          | 100.00
> 3               | 3        | 2          | 100.00
> 4               | 4        | 2          | 100.00
>
> Kylin model design:
>
> Could this produce a cuboid in which "Apple company has a Nike store"?
> How can we avoid that combination?
> Or is it possible to change t_deal_history to not join t_store and
> instead join t_company directly?
>
> What's the right relationship between these three tables? Any ideas?
>
>
>
> Thanks a lot.
>
> --
> li...@fcyun.com
>
>




Re: Global dictionary dose not support for column that serves as both measure and dimension

2017-09-22 Thread Li Yang
This is a known issue: https://issues.apache.org/jira/browse/KYLIN-2679
Thanks for reporting it again; this will help raise the priority of the task.

On Wed, Sep 20, 2017 at 4:42 PM, bubugao0809  wrote:

> Hi, all
>
> I found the following content from URL: http://kylin.apache.org/blog/
> 2016/08/01/count-distinct-in-kylin/
>
> "The global dictionay cannot be used for dimension encoding for now, that
> means if one column is used for both dimension and count distinct measure
> in one cube, its dimension encoding should be others instead of dict."
>
> It means that the column, serving as both measure and dimension, can also
> use the 'Global dictionary'.
>
> For my case, I use DIM_DEVELOPER.DEVELOPER_ID for count distinct in
> measure, as well as the dimensions. And I change the encoding in Rowkeys
> (listed in the 4th row) from dict to integer with the lenght of 8.
>
> However, it still does not work, showing the following error messge:  'ERROR
> : Global dictionary couldn't be used for dimension column:
> DIM_DUI.DIM_DEVELOPER.DEVELOPER_ID'
>
> I was wondering if there is any way I could work around this, because the
> situation that a column serves as both measure and dimension is quite
> common in production environments.
>
>
>
>


Re: kylin2.1 mapreduce appmaster log4j error

2017-09-22 Thread Li Yang
Looks like an environment problem; which Hadoop release are you using?

On Tue, Sep 19, 2017 at 7:24 PM, yuyong.zhai  wrote:

> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: 
> /data5/nmlog/application_1504936740013_969498/container_e09_1504936740013_969498_01_01
>  (Is a directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.(FileOutputStream.java:221)
>   at java.io.FileOutputStream.(FileOutputStream.java:142)
>   at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
>   at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
>   at 
> org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
>   at 
> org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
>   at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
>   at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
>   at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
>   at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
>   at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
>   at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
>   at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
>   at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
>   at org.apache.log4j.LogManager.(LogManager.java:127)
>   at org.apache.log4j.Logger.getLogger(Logger.java:104)
>   at 
> org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
>   at 
> org.apache.commons.logging.impl.Log4JLogger.(Log4JLogger.java:108)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1025)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:844)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:541)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
>   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
>   at 
> org.apache.hadoop.service.AbstractService.(AbstractService.java:43)
> Sep 19, 2017 7:04:13 PM 
> com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
>  get
> WARNING: You are attempting to use a deprecated API (specifically, attempting 
> to @Inject ServletContext inside an eagerly created singleton. While we allow 
> this for backwards compatibility, be warned that this MAY have unexpected 
> behavior if you have more than one injector (with ServletModule) running in 
> the same JVM. Please consult the Guice documentation at 
> http://code.google.com/p/google-guice/wiki/Servlets for more information.
> Sep 19, 2017 7:04:13 PM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering 
> org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider 
> class
> Sep 19, 2017 7:04:13 PM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 
> provider class
> Sep 19, 2017 7:04:13 PM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as 
> a root resource class
> Sep 19, 2017 7:04:13 PM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
> Sep 19, 2017 7:04:13 PM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver 
> to GuiceManagedComponentProvider with the scope "Singleton"
> Sep 19, 2017 7:04:13 PM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
> getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
> GuiceManagedComponentProvider with the scope "Singleton"
> Sep 19, 2017 7:04:14 PM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
> getComponentProvider
> INFO: Binding 

Re: Kylin acl - ldap

2017-09-22 Thread Li Yang
The JIRA is good. Thanks Sonny!

On Tue, Sep 19, 2017 at 8:46 AM, Sonny Heer <sonnyh...@gmail.com> wrote:

> Here is the JIRA: https://issues.apache.org/jira/browse/KYLIN-2878
>
> Let me know if more info is needed. The basic idea is that a non-admin ldap
> group / role with full permissions on projectA should allow a user in that
> group to edit models and sync tables within projectA.
>
> Thanks!
>
> On Sun, Sep 17, 2017 at 12:36 AM, Li Yang <liy...@apache.org> wrote:
>
>> This is a good proposal. Could a JIRA be created?
>>
>> A little history. Before KYLIN-2717
>> <https://issues.apache.org/jira/browse/KYLIN-2717>, a table was global
>> and shared by all projects. Only the system admin could sync Hive tables,
>> as it had system-wide impact.
>>
>> Once KYLIN-2717 <https://issues.apache.org/jira/browse/KYLIN-2717> is
>> done and tables are isolated by project, we will be ready to grant table
>> permissions to project-level admins.
>>
>> On Sun, Sep 17, 2017 at 6:23 AM, Sonny Heer <sonnyh...@gmail.com> wrote:
>>
>>> Kylin version is 1.6
>>>
>>> Is there a way to give full access to a project? Currently we are able
>>> to give access to a project via a ROLE in ldap, but that doesn't allow the
>>> user to sync/load hive tables (the blue buttons are missing), and they are
>>> unable to edit models. To grant that permission we have to edit the group
>>> to add kylin-admins, but then the user has full access to all projects.
>>>
>>> Question:
>>>
>>> when only allowing a custom ROLE access to projectA - shouldn't the user
>>> be able to load tables/ edit models?
>>>
>>> Thanks
>>>
>>
>>
>


Re: Kylin acl - ldap

2017-09-17 Thread Li Yang
This is a good proposal. Could a JIRA be created?

A little history. Before KYLIN-2717
, a table was global and
shared by all projects. Only the system admin could sync Hive tables, as it
had system-wide impact.

Once KYLIN-2717  is done
and tables are isolated by project, we will be ready to grant table permissions
to project-level admins.

On Sun, Sep 17, 2017 at 6:23 AM, Sonny Heer  wrote:

> Kylin version is 1.6
>
> Is there a way to give full access to a project? Currently we are able to
> give access to a project via a ROLE in ldap, but that doesn't allow the user
> to sync/load hive tables (the blue buttons are missing), and they are unable
> to edit models. To grant that permission we have to edit the group to add
> kylin-admins, but then the user has full access to all projects.
>
> Question:
>
> when only allowing a custom ROLE access to projectA - shouldn't the user
> be able to load tables/ edit models?
>
> Thanks
>


Re: Does Kylin support count(column_name)

2017-09-17 Thread Li Yang
You don't need to define "count(column)" usually, because Kylin will
replace all "count(column)" with "count(1)" automatically.


On Fri, Sep 15, 2017 at 10:11 AM, zhangchao  wrote:

> Hi, all
> Does Kylin support defining a measure with count(column_name)? I can
> only find count(1). I want to let Saiku connect to Kylin, but
> Mondrian does not seem to support count(1).
> If Kylin does not support count(column_name), is
> count_distinct(column_name) a good alternative? Thanks in advance for any help.
>
> --
> Zhang Chao
> Shenzhen Jiuzhi Tianxia Technology Co., Ltd., Technical Department
> Mobile: 13692152652
> Email: zhangc...@9zhitx.com
>
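
A small illustration of the behavior described above, using Kylin's sample
kylin_sales table (treat this as a sketch, not an authoritative test):
because Kylin rewrites COUNT(column) to COUNT(1), the two queries below
return the same number.

    SELECT COUNT(seller_id) FROM kylin_sales;
    SELECT COUNT(1)         FROM kylin_sales;

Note this also counts rows where the column is NULL, which differs from
strict ANSI COUNT(column) semantics.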


Re: Query across different MODEL/CUBES

2017-09-10 Thread Li Yang
There is no way to specify the model/cube now. But as a workaround, you can
put different models in different projects and select the project.

On Thu, Sep 7, 2017 at 9:46 AM, Yuxiang Mai  wrote:

> Hi, Billy
> HINT? Can you be more specific?
>
> And it seems it should be 1 fact table per model, although it's not
> mandatory? Because Kylin uses a star model for OLAP, we ask modelers not to
> overlap fact tables across models. Is this true & recommended?
>
> Thanks for your reply.
>
> BR//MYX
>
> On Thu, Sep 7, 2017 at 9:27 AM, Billy Liu  wrote:
>
>> It sounds like a HINT in SQL.
>>
>> 2017-09-06 17:51 GMT+08:00 Yuxiang Mai :
>>
>>> Hi, all
>>>
>>> We have some questions about queries across different models/cubes. We
>>> know that Kylin will evaluate the cost and select the best cube for a
>>> query. In our usage, this worked very well. But if we add more models
>>> with filters, a problem comes up.
>>>
>>> Here is our scenario, 2 model comes from the same fact table: table_A.
>>>
>>> Model test1, without any filter:
>>> cube1 with dimensions A,B,C,D,E
>>>
>>> Model test2, with filter xxx = yyy:
>>> cube2 with dimension A
>>>
>>>
>>> If we query select count(1) from table_A; the Kylin engine will route
>>> the query to cube2, but our target is cube1.
>>>
>>> I wonder if we can specify the model when we make the query? Because if
>>> someone by mistake creates a cube with overlapping
>>> dimensions in a new model with filter conditions, it will impact others.
>>> In our usage, we temporarily limit it to 1 model per project.
>>>
>>> So, in sum, my question is:
>>> I wonder if we can specify the model when we make the query?
>>>
>>>
>>> Thanks.
>>>
>>>
>>> --
>>> Yuxiang Mai
>>>
>>>
>>
>
>
> --
> Yuxiang Mai
> Sun Yat-Sen University
> State Key Lab of Optoelectronic
> Materials and Technologies
>


Rest API that returns job IDs

2017-09-04 Thread Li Yang
Hi,

I recall there were a few inquiries about a Rest API to return job IDs.

Thanks to Guosheng, the work is done.
https://issues.apache.org/jira/browse/KYLIN-2795

The document link is here. Please wait one more day till the document is
re-generated.
http://kylin.apache.org/docs21/howto/howto_use_restapi.html#get-job-list

Cheers
Yang


Re: how to filter long tail data

2017-09-03 Thread Li Yang
Please ask Kylin-related questions here.

On Fri, Sep 1, 2017 at 2:47 PM, 杨浩  wrote:

> If an index value is less than 2, we don't want to store it in HBase. How
> can we filter out the long-tail data?
>


Re: Document for Kylin with MicroStrategy

2017-09-03 Thread Li Yang
Cool~~~

On Fri, Sep 1, 2017 at 1:27 PM, ShaoFeng Shi  wrote:

> Hi,
>
> Joanna He (jingke...@kyligence.io) contributed a doc on how to integrate
> Kylin with MicroStrategy. The doc has been published on the Kylin website[1]
> today. Thanks for Joanna's contribution.
>
> https://kylin.apache.org/docs21/tutorial/microstrategy.html
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>


Re: FileNotFoundException when building cube

2017-09-03 Thread Li Yang
If you can reproduce the problem in HDP sandbox, please share detailed logs
and people will be able to help.

On Thu, Aug 24, 2017 at 3:13 PM, RebieKong  wrote:

> Hey~
>
> System Environment
> Apache Hadoop 2.7.2
> Apache Hbase 1.2.4
> Apache Hive 2.1.1
> Apache Kylin 1.6.0
>
> In stage 3
> [Extract Fact Table Distinct Columns]
> I got the following exception:
> java.io.FileNotFoundException: File does not exist:
> hdfs://hdfs-namenode:8020/tmp/mapred/staging/
> hadoop2052916263/.staging/job_local2052916263_0001/libjars/
> hive-metastore-2.1.1.jar
>
> I googled it; someone said I should change my hadoop environment to HDP
> or CDH.
> I got some info from this:
> https://codedump.io/share/WeMZr5agFXdw/1/getting-
> javaiofilenotfoundexception-file-does-not-exist-hive-exec-
> 210jar-error-while-trying-to-build-cubes-for-sample-data-in-apache-kylin
> but it doesn't work.


Re: error execute MapReduceExecutable{id=, name=Extract Fact Table Distinct Columns, state=RUNNING

2017-09-03 Thread Li Yang
This is the root cause. The metadata was inconsistent at that point.

> Failed to find 2a3e69cf-7ff1-4c7a-97b2-8909dac4fe51 in cube
CUBE[name=Remain_Cube3

On Wed, Aug 23, 2017 at 11:29 AM, yuyong.zhai  wrote:

> Discarded the cube and the rebuild succeeded.
>
>
>
>  Original Message
> *From:* yuyong.zhai
> *To:* user@kylin.apache.org
> *Sent:* Wednesday, August 23, 2017, 11:26
> *Subject:* error execute MapReduceExecutable{id=, name=Extract Fact Table
> Distinct Columns, state=RUNNING
>
> 2017-08-23 11:09:21,414 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_
> Remain_Cube3_Step
>
> 2017-08-23 11:09:21,414 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> common.AbstractHadoopJob:163 : append job jar:
> xxx/kylin/lib/kylin-job-2.0.0.jar
>
> 2017-08-23 11:09:21,414 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> common.AbstractHadoopJob:171 : append kylin.hbase.dependency:
> xxx/hbase/lib/hbase-common-0.98.12.1-hadoop2.jar to mapreduce.application.
> classpath
>
> 2017-08-23 11:09:21,414 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> common.AbstractHadoopJob:176 : Didn't find mapreduce.application.classpath
> in job configuration, will run 'mapred classpath' to get the default value.
>
> 2017-08-23 11:09:21,414 INFO  [Job fdeea049-dab0-4f71-96b7-249e4ac03915-65]
> execution.AbstractExecutable:110 : Executing AbstractExecutable (Load
> HFile to HBase Table)
>
> 2017-08-23 11:09:21,415 DEBUG [Job fdeea049-dab0-4f71-96b7-249e4ac03915-65]
> dao.ExecutableDao:217 : updating job output, id: fdeea049-dab0-4f71-96b7-
> 249e4ac03915-17
>
> 2017-08-23 11:09:21,416 INFO  [Job fdeea049-dab0-4f71-96b7-249e4ac03915-65]
> execution.ExecutableManager:389 : job 
> id:fdeea049-dab0-4f71-96b7-249e4ac03915-17
> from READY to RUNNING
>
> 2017-08-23 11:09:21,432 INFO  [Job fdeea049-dab0-4f71-96b7-249e4ac03915-65]
> common.HadoopShellExecutable:58 : parameters of the HadoopShellExecutable:
>
> 2017-08-23 11:09:21,432 INFO  [Job fdeea049-dab0-4f71-96b7-249e4ac03915-65]
> common.HadoopShellExecutable:59 :  -input xx/kylin_metadata/kylin-
> fdeea049-dab0-4f71-96b7-249e4ac03915/Funnel_Result/hfile -htablename
> KYLIN_2P0N6JLP28 -cubename Funnel_Result
>
> 2017-08-23 11:09:21,623 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> common.AbstractHadoopJob:202 : Hive Dependencies After
> Filtered:xxx/hive/lib/hive-exec-2.1.1.jar,xxx/hive/lib/
> hive-metastore-2.1.1.jar,xxx/hive/hcatalog/share/hcatalog/
> hive-hcatalog-core-2.1.1.jar
>
> 2017-08-23 11:09:21,623 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> common.AbstractHadoopJob:230 : Kafka Dependencies:
>
> 2017-08-23 11:09:21,625 INFO  [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> common.AbstractHadoopJob:358 : Job 'tmpjars' updated -- file:
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:112 : Failed to find 
> 2a3e69cf-7ff1-4c7a-97b2-8909dac4fe51
> in cube CUBE[name=Remain_Cube3]
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017052300_2017060100 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017060100_2017062700 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017062700_2017062800 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017062700_2017062800 with
> status NEW
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017062800_2017062900 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017062900_2017063000 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017063000_2017070100 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017070100_2017070200 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017070200_2017070300 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017070300_2017070400 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> steps.FactDistinctColumnsJob:114 : 2017070400_2017070500 with
> status READY
>
> 2017-08-23 11:09:21,625 ERROR [Job ab916702-dfe1-4bfa-b976-b1c4c5e8286f-71]
> 

Re: error when loading a table from Kafka

2017-08-19 Thread Li Yang
Please try to describe how to reproduce the problem so others can help.

With so little information, my best guess is to restart Kylin and see whether
the same problem persists.

On Fri, Aug 11, 2017 at 11:48 PM, Billy Liu  wrote:

> Could you describe the Kylin version and the reproduction steps, and attach kylin.log?
>
> 2017-08-09 20:42 GMT+08:00 崔苗 :
>
>> Hi,
>> there is an error when loading a table from Kafka:
>> Overwriting conflict /table/DEFAULT.USERLOGIN.json, expect old TS 0, but
>> it is 1502279074583
>>
>> Thanks
>>
>
>


Re: Sequential execution of union all

2017-08-19 Thread Li Yang
The sequential execution behavior comes from Calcite, the SQL execution
engine, and is not really Kylin's doing. The entry point is QueryService.execute(),
where control is passed to Calcite.
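
As a concrete illustration of the pattern under discussion, here is a minimal
sketch of such a query (the table and column names are hypothetical):

-- each UNION ALL branch is planned and run one after another by Calcite
select 'a_and_b' as combo, count(distinct user_id) as users
from fact_flags where flag_a = true and flag_b = true
union all
select 'a_only' as combo, count(distinct user_id) as users
from fact_flags where flag_a = true and flag_b = false
union all
select 'b_only' as combo, count(distinct user_id) as users
from fact_flags where flag_a = false and flag_b = true

As of Kylin 2.0 each branch is evaluated in turn, so total latency grows
roughly linearly with the number of branches regardless of HBase capacity.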

On Thu, Aug 10, 2017 at 3:14 PM, Alexander Sterligov <sterligo...@joom.it>
wrote:

> As I understand it, it is also single-threaded for joins (not with lookup
> tables, but between different OLAP cubes), am I right?
>
> Could you please guide me where it happens in the code? I would like to
> contribute. Maybe there are already some tickets about it?
>
> On Sun, Aug 6, 2017 at 9:12 AM, Li Yang <liy...@apache.org> wrote:
>
>> That is right. Sub-queries are executed sequentially as of Kylin 2.0.
>>
>> On Fri, Jul 28, 2017 at 2:16 AM, Alexander Sterligov <sterligo...@joom.it
>> > wrote:
>>
>>> Hello!
>>>
> >>> My fact table has 12 boolean fields and a user id. I need to count
> >>> distinct users that have certain combinations of these flags, so I run
> >>> several sub-queries and union all of them.
> >>> This query may take up to one minute, and it doesn't depend on the
> >>> number of regionservers in HBase.
>>>
>>> It looks like sub-queries are executed sequentially. Or maybe segment
>>> pruning by dictionary is done sequentially.
>>>
>>> Am I right?
>>>
>>> Best regards,
>>> Alexander
>>>
>>
>>
>


Re: run job errors

2017-08-19 Thread Li Yang
It must be a temporary issue on the branch. Please try the official 2.1 release.

On Wed, Aug 9, 2017 at 4:57 PM, liulang  wrote:

> hi,shaofeng:
>
> Error: java.lang.RuntimeException: java.lang.ClassNotFoundException:
> Class org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper not
> found at org.apache.hadoop.conf.Configuration.getClass(Configu
> ration.java:2195 ) at
> org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(
> JobContextImpl.java:187 ) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747
> ) at org.apache.hadoop.mapred.MapTask.run(
> MapTask.java:341 ) at org.apache.hadoop.mapred.
> YarnChild$2.run(YarnChild.java:164 ) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:422
> ) at org.apache.hadoop.security.
> UserGroupInformation.doAs(UserGroupInformation.java:1657
> ) at org.apache.hadoop.mapred.
> YarnChild.main(YarnChild.java:158 ) Caused by:
> java.lang.ClassNotFoundException: Class 
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper
> not found at org.apache.hadoop.conf.Configuration.getClassByName(C
> onfiguration.java:2101 ) at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193
> ) ... 8 more
>
>
> build 2.1 branch release
>
> Thanks.
>
> Longer
>
>
>


Re: error execute org.apache.kylin.storage. hbase.util.StorageCleanupJob

2017-08-06 Thread Li Yang
Note "org.apache.kylin.storage.hbase.util.StorageCleanupJob" is deprecated.

Please use "org.apache.kylin.tool.StorageCleanupJob" instead.

I'm updating the related documentation.

Thanks for reporting!
Yang
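
For reference, the replacement tool is invoked the same way as the old one;
per the storage cleanup how-to of that era, something like:

bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true

Without "--delete true" the job only lists the resources it would remove,
which makes a safe dry run before actually deleting anything.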

On Mon, Jul 31, 2017 at 6:35 PM, liulang  wrote:

> Hi, I repackaged the latest source code and it is still the same;
> and I followed the steps:
> Step 1: run command 'bin/kylin.sh org.apache.kylin.tool.AclTableMigrationCLI
> MIGRATE'; this one has been executed.
> Step 2: drop hbase tables: kylin_metadata_acl and kylin_metadata_user;
> this one I did not dare to run, for fear of affecting the 2.0 deployment.
>
> Exception info:
>
> 2017-07-31 18:26:20,159 INFO  [main StorageCleanupJob:283]: Checking table
> kylin_intermediate_cube_dsp_reportnbr_0e935361_9670_4d66_952a_a597abcc730f
> Exception in thread "main" java.lang.RuntimeException: error execute
> org.apache.kylin.storage.hbase.util.StorageCleanupJob
> at org.apache.kylin.common.util.AbstractApplication.execute(Ab
> stractApplication.java:42 )
> at org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(
> StorageCleanupJob.java:362 )
> Caused by: java.lang.NullPointerException
> at org.apache.kylin.common.util.ClassUtil.forRenamedClass(Clas
> sUtil.java:86 )
> at org.apache.kylin.common.util.ClassUtil.forName(ClassUtil.java:78
> )
> at org.apache.kylin.job.execution.ExecutableManager.parseTo(
> ExecutableManager.java:494 )
> at org.apache.kylin.job.execution.ExecutableManager.getJob(
> ExecutableManager.java:141 )
> at org.apache.kylin.storage.hbase.util.StorageCleanupJob.isTableInUse(
> StorageCleanupJob.java:347 )
> at org.apache.kylin.storage.hbase.util.StorageCleanupJob.
> cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:302
> )
> at org.apache.kylin.storage.hbase.util.StorageCleanupJob.execute(
> StorageCleanupJob.java:153 )
> at org.apache.kylin.common.util.AbstractApplication.execute(Ab
> stractApplication.java:37 )
> ... 1 more
> 2017-07-31 18:26:20,165 INFO  [Thread-1 ConnectionManager$
> HConnectionImplementation:2068]: Closing master protocol: MasterService
> 2017-07-31 18:26:20,173 INFO  [Thread-1 ConnectionManager$
> HConnectionImplementation:1676]: Closing zookeeper
> sessionid=0x25c4288a4a417e4
> 2017-07-31 18:26:20,175 INFO  [Thread-1 ZooKeeper:684]: Session:
> 0x25c4288a4a417e4 closed
> 2017-07-31 18:26:20,175 INFO  [main-EventThread ClientCnxn:512]:
> EventThread shut down
>
>
> Please advise on the above. Thanks.
> < Billy Liu > wrote at 2017-07-24 22:02:23:
>
> I think the issue has been fixed in latest master code. Could you pull the
> code and have a try?
>
> 2017-07-24 18:24 GMT+08:00 liulang :
>
> hi,
> Run StorageCleanupJob error:
>
> 2017-07-24 16:04:28,645 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: Class path contains multiple SLF4J bindings.
> 2017-07-24 16:04:28,645 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: Found binding in [jar:file:/opt/apps/apache-hiv
> e-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/
> impl/StaticLoggerBinder.class]
> 2017-07-24 16:04:28,645 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: Found binding in [jar:file:/opt/apps/apache-kyl
> in-2.0.0-bin/spark/lib/spark-assembly-1.6.3-hadoop2.6.0.
> jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 2017-07-24 16:04:28,645 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: Found binding in [jar:file:/opt/apps/tez-0.8.4/
> lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 2017-07-24 16:04:28,645 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: Found binding in [jar:file:/opt/apps/hadoop-2.7
> .2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf
> 4j/impl/StaticLoggerBinder.class]
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4
> jLoggerFactory]
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> Logging initialized using configuration in jar:file:/opt/apps/apache-hive
> -2.0.1-bin/lib/hive-common-2.0.1.jar!/hive-log4j2.properties
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> OK
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> Time taken: 0.924 seconds
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> OK
> 2017-07-24 16:04:28,646 INFO  [main StorageCleanupJob:283]: Checking table
> kylin_intermediate_cube_dsp_report_f92212f6_738d_4ef2_a727_1796e30c6cb0
> Exception in thread "main" 

Re: Sequential execution of union all

2017-08-06 Thread Li Yang
That is right. Sub-queries are executed sequentially as of Kylin 2.0.

On Fri, Jul 28, 2017 at 2:16 AM, Alexander Sterligov 
wrote:

> Hello!
>
> My fact table has 12 boolean fields and a user id. I need to count distinct
> users that have certain combinations of these flags, so I run several
> sub-queries and union all of them.
> This query may take up to one minute, and it doesn't depend on the number of
> regionservers in HBase.
>
> It looks like sub-queries are executed sequentially. Or maybe segment
> pruning by dictionary is done sequentially.
>
> Am I right?
>
> Best regards,
> Alexander
>


Re: Kylin window function

2017-07-30 Thread Li Yang
Some test queries:
https://github.com/apache/kylin/blob/master/kylin-it/src/test/resources/query/sql_window/query01.sql
https://github.com/apache/kylin/blob/master/kylin-it/src/test/resources/query/sql_window/query02.sql
https://github.com/apache/kylin/blob/master/kylin-it/src/test/resources/query/sql_window/query03.sql
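
For the running-total question in this thread, a hedged sketch in Calcite's
window syntax (it assumes every category has a row for every month; names
follow the query quoted below):

select g.categ_lvl3_name
      ,c.month_id
      ,sum(price) as sales
      -- window over the grouped result: this month plus the 11 before it
      ,sum(sum(price)) over (partition by g.categ_lvl3_name
                             order by c.month_id
                             rows between 11 preceding and current row) as sales_12m
from KYLIN_SALES s
join KYLIN_CAL_DT c on s.part_dt = c.cal_dt
join KYLIN_CATEGORY_GROUPINGS g
  on s.leaf_categ_id = g.leaf_categ_id and s.lstg_site_id = g.site_id
group by g.categ_lvl3_name, c.month_id

If months can be missing, a RANGE frame over a date column is safer than a
ROWS frame.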


On Tue, Jul 25, 2017 at 11:31 AM, Billy Liu  wrote:

> + Yerui. Do you have any idea?
>
> 2017-07-25 11:25 GMT+08:00 Joanna He (Jingke He) :
>
>> The tech blog I referred.
>>
>> http://kylin.apache.org/blog/2016/11/16/window-function/
>>
>>
>>
>>
>>
>> 何京珂
>>
>> Joanna He
>>
>>
>>
>>
>>
>> *From: *"Joanna He (Jingke He)" 
>> *Reply-To: *"user@kylin.apache.org" 
>> *Date: *Tuesday, 25 July 2017 at 9:41 AM
>> *To: *"user@kylin.apache.org" 
>> *Subject: *Kylin window function
>>
>>
>>
>> Anyone familiar with Kylin window functions? I am trying to get the running
>> total of the past 12 months' sales based on the SQL below, and I am having
>> difficulty getting the correct syntax.
>>
>> The tech blog below mentions that Kylin's syntax follows Calcite, but the
>> Calcite function page does not specify the syntax within the window.
>>
>>
>>
>>
>>
>>
>>
>> select g.categ_lvl3_name
>>
>> ,c.month_id
>>
>> ,sum(price) as sales
>>
>> from KYLIN_Sales s
>>
>> join KYLIN_CAL_DT c
>>
>> on s.part_dt=c.cal_dt
>>
>> join KYLIN_CATEGORY_GROUPINGS g
>>
>> on s.leaf_categ_id=g.leaf_categ_id
>>
>> and s.LSTG_SITE_ID=g.SITE_ID
>>
>> group  by g.categ_lvl3_name, month_id
>>
>> order by g.categ_lvl3_name,month_id
>>
>>
>>
>>
>>
>> 何京珂
>>
>> Joanna He
>>
>>
>>
>
>


Re: New blog "Improving Spark Cubing in Kylin 2.0"

2017-07-25 Thread Li Yang
Bravo Kaisen!
Kudos, Kaisen!

On Sat, Jul 22, 2017 at 12:32 PM, ShaoFeng Shi 
wrote:

> Hello Kyliner,
>
> The blog "Improving Spark Cubing in Kylin 2.0" from committer Kaisen Kang
> was published on the Kylin website today. For users who are interested in
> Spark + Kylin, it can help:
>
> https://kylin.apache.org/blog/2017/07/21/Improving-Spark-Cubing/
>
> The original Chinese version is in Kaisen's personal blog:
> http://blog.bcmeng.com/
>
> Thanks to Kaisen for sharing!
>
> If anyone wants to contribute a Kylin tech blog, please contact me.
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: How to make a Sizing Server for Kylin

2017-07-25 Thread Li Yang
Sharing will be appreciated.  :-)

On Sat, Jul 22, 2017 at 2:20 AM, Patricio Huichulef <phuichu...@gmail.com>
wrote:

> Therefore, we need to build a methodology for appropriately sizing
> any solution. OK, let's start to work on this.
>
> Thanks Li
>
> On Fri, Jul 21, 2017 at 3:36 AM, Li Yang <liy...@apache.org> wrote:
>
>> For me, building a small chunk of data and extrapolating linearly to the
>> full data set is still the best approach.
>>
>> On Tue, Jul 18, 2017 at 12:43 AM, Patricio Eduardo Huichulef Carvajal <
>> phuichu...@gmail.com> wrote:
>>
>>> Hello Folks:
>>>
>>> I would like to know if somebody has a methodology to make a Sizing
>>> Server for Kylin OLAP Engine.
>>>
>>> Is there a process for properly sizing one or multiple servers when
>>> designing a Kylin architecture? For example, for a cluster, what is the
>>> recommended hardware sizing for each node? What are the parameters for
>>> proper sizing (transactions, data volume, cube size, number of queries,
>>> etc.)?
>>>
>>> Thanks in advance
>>>
>>> PEHC
>>>
>>>
>>
>
>
> --
> Patricio Edo. Huichulef C.
> IT Trusted Advisor
>
> MIL Technology
> Ramon Sotomayor 2937
> Depto. 302, Providencia
> Santiago, 6650788 Chile
>
> Work: 562 4740959
> Mobile: 569 98858814
> Email: patricio.huichu...@vtr.net
> IM: phuichulef (AIM)
>  http://www.linkedin.com/in/patriciohuichulef
> My Blog: http://phuichulef.blogspot.com
>


Re: Define Model and Cube Programmatically

2017-07-25 Thread Li Yang
There is no public, stable RESTful API to create cubes programmatically.

However, there is the API that the Kylin GUI uses to create cubes, and in the
debug console of a modern browser an advanced user can observe how this API
works. But note, it is not a stable API and is subject to change from
version to version. Users must not rely on it to build anything of
production quality.
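
The documented, stable calls (authentication and listing cubes) can still be
scripted; a hedged sketch with curl against a default local install:

# authenticate (ADMIN:KYLIN, base64-encoded) and keep the session cookie
curl -c /tmp/kylin.cookie -X POST \
  -H "Authorization: Basic QURNSU46S1lMSU4=" \
  http://localhost:7070/kylin/api/user/authentication

# list existing cubes using the saved session
curl -b /tmp/kylin.cookie "http://localhost:7070/kylin/api/cubes?limit=15&offset=0"

The cube-creation payload itself is the unstable part; capture it from the
browser's network tab as described above rather than hand-crafting it.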

On Thu, Jul 20, 2017 at 2:28 PM,  wrote:

> Hi All
>
> Is there a way of defining models and cubes programmatically?
>
>
>
> This e-mail (including any attachments) is private and confidential, may
> contain proprietary or privileged information and is intended for the named
> recipient(s) only. Unintended recipients are strictly prohibited from
> taking action on the basis of information in this e-mail and must contact
> the sender immediately, delete this e-mail (and all attachments) and
> destroy any hard copies. Nomura will not accept responsibility or liability
> for the accuracy or completeness of, or the presence of any virus or
> disabling code in, this e-mail. If verification is sought please request a
> hard copy. Any reference to the terms of executed transactions should be
> treated as preliminary only and subject to formal written confirmation by
> Nomura. Nomura reserves the right to retain, monitor and intercept e-mail
> communications through its networks (subject to and in accordance with
> applicable laws). No confidentiality or privilege is waived or lost by
> Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is
> a reference to any entity in the Nomura Holdings, Inc. group. Please read
> our Electronic Communications Legal Notice which forms part of this e-mail:
> http://www.Nomura.com/email_disclaimer.htm
>


Re: How to make a Sizing Server for Kylin

2017-07-21 Thread Li Yang
For me, building a small chunk of data and extrapolating linearly to the full
data set is still the best approach.
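
As a made-up illustration of that approach: if a 1% sample of the fact table
builds into a 150 MB cube segment in 10 minutes, a first-order estimate for
the full data set is roughly 15 GB of cube storage and about 100x the build
work, spread across however many tasks the cluster runs in parallel. Treat
the result as a sanity check rather than a forecast, since dictionary and
cuboid sizes do not grow perfectly linearly.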

On Tue, Jul 18, 2017 at 12:43 AM, Patricio Eduardo Huichulef Carvajal <
phuichu...@gmail.com> wrote:

> Hello Folks:
>
> I would like to know if somebody has a methodology to make a Sizing
> Server for Kylin OLAP Engine.
>
> Is there a process for properly sizing one or multiple servers when designing
> a Kylin architecture? For example, for a cluster, what is the recommended
> hardware sizing for each node? What are the parameters for proper sizing
> (transactions, data volume, cube size, number of queries, etc.)?
>
> Thanks in advance
>
> PEHC
>
>


Re: Failed in building kylin 2.0 demo cube at step 3.

2017-07-21 Thread Li Yang
By default, the user who runs the Kylin process is also the user who submits
the MR job, unless you have other specific settings in Hadoop & MapReduce.

On Mon, Jul 17, 2017 at 8:35 PM, Norbert hu  wrote:

> *environment:*  CDH 5.9, apache-kylin-2.0.0-bin-cdh57.tar.gz, running
> bin/kylin.sh start as the hdfs user.
>
>
> *error:*
> Caused by: org.apache.hadoop.security.AccessControlException: Permission
> denied: user=mapred, access=READ, inode="/user/history/done_inte
> rmediate/hdfs/job_1496716928803_0013-1500286115191-hdfs-
> Kylin_Fact_Distinct_Columns_kylin_sales_cube_Step-
> 1500286137251-0-0-FAILED-root.users.hdfs-1500286120268.
> jhist":hdfs:supergroup:-rwxrwx---
> at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationP
> rovider.checkFsPermission(DefaultAuthorizationProvider.java:281)
> at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationP
> rovider.check(DefaultAuthorizationProvider.java:262)
> at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationP
> rovider.checkPermission(DefaultAuthorizationProvider.java:175)cd
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
> heckPermission(FSPermissionChecker.java:152)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
> ission(FSDirectory.java:3560)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
> ission(FSDirectory.java:3543)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPath
> Access(FSDirectory.java:3514)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPat
> hAccess(FSNamesystem.java:6566)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlock
> LocationsInt(FSNamesystem.java:2009)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlock
> Locations(FSNamesystem.java:1977)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlock
> Locations(FSNamesystem.java:1890)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.get
> BlockLocations(NameNodeRpcServer.java:572)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProvider
> ProxyClientProtocol.getBlockLocations(Authorizatio
> nProviderProxyClientProtocol.java:89)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServ
> erSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolS
> erverSideTranslatorPB.java:365)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocol
> Protos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNam
> enodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcIn
> voker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> upInformation.java:1912)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
>
>
> *[hdfs@cloudera-3 bin]$  *hdfs dfs -ls  /user/history/done_intermedia
> te/hdfs/job_1496716928803_0013-1500286115191-hdfs-Kylin_
> Fact_Distinct_Columns_kylin_sales_cube_Step-1500286137251-
> 0-0-FAILED-root.users.hdfs-1500286120268.jhist
> -rwxrwx---   2 hdfs supergroup  21673 2017-07-17 18:08
> /user/history/done_intermediate/hdfs/job_1496716928803_0013-
> 1500286115191-hdfs-Kylin_Fact_Distinct_Columns_kylin_sales_c
> ube_Step-1500286137251-0-0-FAILED-root.users.hdfs-1500286120268.jhist
>
>
> My question is: why does it run as the mapred user? I cannot find the config
> in Kylin, Hadoop, or YARN.
>
> Thanks in advance.
>
> Norbert
>
>


Re: common.HadoopJobStatusChecker:58 : error check status

2017-07-19 Thread Li Yang
Given that the exception actually happens in Hadoop code:
> java.lang.NullPointerException at
org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:380)

and that you had cubes built successfully before, you might want to check
recent changes to your Hadoop environment. It seems broken somewhere.

On Fri, Jul 14, 2017 at 11:07 AM, crossme  wrote:

>
>
>
> > Log information from performing the third step:
>
>  Counters: 53
>  File System Counters
>   FILE: Number of bytes read=326082918
>   FILE: Number of bytes written=639475115
>   FILE: Number of read operations=0
>   FILE: Number of large read operations=0
>   FILE: Number of write operations=0
>   HDFS: Number of bytes read=375767996
>   HDFS: Number of bytes written=154906
>   HDFS: Number of read operations=48
>   HDFS: Number of large read operations=0
>   HDFS: Number of write operations=8
>  Job Counters
>   Failed reduce tasks=7
>   Killed reduce tasks=4
>   Launched map tasks=9
>   Launched reduce tasks=15
>   Data-local map tasks=7
>   Rack-local map tasks=2
>   Total time spent by all maps in occupied slots (ms)=554536
>   Total time spent by all reduces in occupied slots (ms)=1035019
>   Total time spent by all map tasks (ms)=554536
>   Total time spent by all reduce tasks (ms)=1035019
>   Total vcore-seconds taken by all map tasks=554536
>   Total vcore-seconds taken by all reduce tasks=1035019
>   Total megabyte-seconds taken by all map tasks=567844864
>   Total megabyte-seconds taken by all reduce tasks=1059859456
>  Map-Reduce Framework
>   Map input records=8758042
>   Map output records=70064417
>   Map output bytes=1547698142
>   Map output materialized bytes=310833429
>   Input split bytes=96597
>   Combine input records=70064417
>   Combine output records=42960200
>   Reduce input groups=289020
>   Reduce shuffle bytes=8450082
>   Reduce input records=1652047
>   Reduce output records=0
>   Spilled Records=87572447
>   Shuffled Maps =36
>   Failed Shuffles=0
>   Merged Map outputs=36
>   GC time elapsed (ms)=6372
>   CPU time spent (ms)=677610
>   Physical memory (bytes) snapshot=8539713536
>   Virtual memory (bytes) snapshot=36823269376
>   Total committed heap usage (bytes)=8535932928
>  Shuffle Errors
>   BAD_ID=0
>   CONNECTION=0
>   IO_ERROR=0
>   WRONG_LENGTH=0
>   WRONG_MAP=0
>   WRONG_REDUCE=0
>  File Input Format Counters
>   Bytes Read=0
>  File Output Format Counters
>   Bytes Written=0
>  org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter
>   BYTES=1721507003
>
>
> > The following is the error in the kylin.log file.
>   There is no error in the Hadoop log file.
>
> 2017-07-14 10:40:10,427 INFO  [pool-9-thread-1] threadpool.
> DefaultScheduler:124 : Job Fetcher: 1 should running, 1
> actual running, 0 stopped, 0 ready, 10 already succeed, 8
> error, 6 discarded, 0 others
> 2017-07-14 10:40:12,442 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:22,450 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:32,457 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:42,467 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:52,478 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:02,493 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:10,430 INFO  [pool-9-thread-1] threadpool.
> DefaultScheduler:124 : Job Fetcher: 1 should running, 1
> actual running, 0 stopped, 0 ready, 10 already succeed, 8
> error, 6 discarded, 0 others
> 2017-07-14 10:41:12,518 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:22,527 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:32,535 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:42,544 DEBUG [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] dao.ExecutableDao:217 : updating
> job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:53,548 INFO  [Job 14691c4a-64d2-4b1d-ace5-
> d2d6ad9618d0-297] ipc.Client:867 : Retrying connect to server: dn1/
> 10.50.229.209:51098. Already tried 0 time(s); retry policy is
> 

Re: Query without HBase corprocessor

2017-07-17 Thread Li Yang
Not at the moment. The HBase coprocessor is mandatory for Kylin. (Although I
know some commercial versions of Kylin make HBase optional, that's a
different story.)

On Wed, Jul 12, 2017 at 2:11 PM,  wrote:

> Hi
>
> Is it possible to query without using HBase coprocessor?
>
> I'm running Kylin on MapR; queries do not work without the HBase coprocessor
> at the moment.
>
>
>
>
> This e-mail (including any attachments) is private and confidential, may
> contain proprietary or privileged information and is intended for the named
> recipient(s) only. Unintended recipients are strictly prohibited from
> taking action on the basis of information in this e-mail and must contact
> the sender immediately, delete this e-mail (and all attachments) and
> destroy any hard copies. Nomura will not accept responsibility or liability
> for the accuracy or completeness of, or the presence of any virus or
> disabling code in, this e-mail. If verification is sought please request a
> hard copy. Any reference to the terms of executed transactions should be
> treated as preliminary only and subject to formal written confirmation by
> Nomura. Nomura reserves the right to retain, monitor and intercept e-mail
> communications through its networks (subject to and in accordance with
> applicable laws). No confidentiality or privilege is waived or lost by
> Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is
> a reference to any entity in the Nomura Holdings, Inc. group. Please read
> our Electronic Communications Legal Notice which forms part of this e-mail:
> http://www.Nomura.com/email_disclaimer.htm
>


Re: Understanding aggregation group settings

2017-07-17 Thread Li Yang
I think the real question to answer is: how will users query?

Once you can describe the users' query pattern, it becomes straightforward to
discuss how to optimize the cube using aggregation groups, hierarchies,
joints, etc.
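
For reference once that pattern is known, aggregation groups, hierarchies and
joints all live in the cube descriptor; a hedged sketch of the JSON shape
used by Kylin 2.x (the dimension names are hypothetical):

"aggregation_groups": [ {
  "includes": [ "GEO.COUNTRY", "GEO.CITY", "GEO.STREET", "CAL.MONTH" ],
  "select_rule": {
    "hierarchy_dims": [ [ "GEO.COUNTRY", "GEO.CITY", "GEO.STREET" ] ],
    "mandatory_dims": [ "CAL.MONTH" ],
    "joint_dims": []
  }
} ]

The hierarchy list is ordered from parent to child, and the mandatory
dimension appears in every cuboid, which is what cuts the cuboid count down
from the full 2^n.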

On Tue, Jul 11, 2017 at 10:58 PM, ShaoFeng Shi 
wrote:

> hi Stefan,
>
> Your question is too long; today I can just answer your first question:
>
> Q:  "We have understood the concept behind derived dimensions and
> hierarchies, but we cannot figure out how to implement it with the given
> UI."
>
> Answer:
> 1) The "derived dimension" can only be from lookup tables, because kylin
> will take snapshot for lookup table, so with the snapshot in memory, kylin
> will be able to do a "deriving" operation from PK/FK to the other columns
> on lookup tables. Okay this is the background, now when you add a dimension
> from a lookup table, you will see there is an option letting you select
> "derived" or "normal". If you select "derived", it will put the FK column
> as Cube's physical dimension and remember using it to map this dimension at
> runtime. If you select "normal", then this column will be a physical
> dimension.
>
> 2) A hierarchy is a relationship among several dimensions. You need to
> define the relationship in the "advanced settings" step: in an aggregation
> group, put the dimensions into a "hierarchy" in sequence from parent to
> child. For example: country, city, street.
>
> You can check the sample cube that Kylin ships as an example.
>
>
> 2017-07-07 16:59 GMT+08:00 Hüls, Stefan :
>
>> Hi there,
>>
>>
>>
>> we are currently investigating Kylin to help us bring our OLAP
>> environment to hadoop.
>>
>>
>>
>> We have a very traditional datamart approach modelled in our DWH, but
>> struggle to understand how to model cubes in Kylin.
>>
>> We have understood the concept behind derived dimensions and hierarchies,
>> but we cannot figure out how to implement it with the given UI.
>>
>> I hope you can give us some hints or insights on how to do an optimal
>> cube build.
>>
>>
>>
>> We have one fact table (F) which has measures and one ID attribute per
>> dimension table. The IDs are the FK for the dimension tables.
>>
>> Like in the "Optimize Cube Design" article, this is a typical example:
>>
>>
>>
>> Fact   Dimensions
>>
>> measure1, measure2, ..., measureX, FK1, FK2, FK3 <-inner join-> PK1
>> H1  (D_TIME_DIMENSION)
>>
>>  <-inner join-> PK2 H1,
>> H2, H3, H4, H5, H6  (D_GEOGRAPHY)
>>
>>  <-inner join-> PK2 H1,
>> H2, H3, H4  (D_DATE)
>>
>> where PK has a 1 to 1 relationship to H1 in every dimension.
>>
>>
>>
>> 1.) How to model the dimension for D_TIME_DIMENSION. Is it PK as normal
>> and H1 as derived dimension, or do we ignore PK1 and just model H1 as
>> normal dimension?
>>
>> 2.) How to model the dimension for D_GEOGRAPHY. Are H1 to H6 normal
>> dimensions and should be modelled as a hierarchy later, or is any dimension
>> derived?
>>
>> 3.) The same question arises for D_DATE.
>>
>>
>>
>> The next questions arise regarding the aggregation groups. Since in your
>> description, every table attribute that is used in the cube model is called
>> "dimension", the
>>
>> question comes up which of these dimensions from the above example should
>> be included in aggregation groups and how.
>>
>>
>>
>> I would understand that if we do not define an aggregation group, we
>> would have 2^11 cuboids to be created (if every Hx dimension is "normal").
>>
>> Assume that for lookup table D_GEOGRAPHY, the H1 dimension has a high
>> cardinality and H2 to H6 are subgroups of H1 (aka. levels in a geo tree)
>>
>> H1 of D_TIME_DIMENSION is mandatory in all analyses.
>>
>> H1 of D_DATE is a classic date field, with H2 to H4 being WEEK, MONTH and
>> YEAR.
>>
>> How is the aggregation group UI meant to be used in this example? I
>> understand that aggregation groups are a white-list to reduce the number of
>> cuboids?
>>
>>
>>
>> In our current IBM Cognos Transformer Powercubes environment, we have
>> models with about 30 dimension tables and a total of about 90 levels. The
>> fact table has about 60 million rows.
>>
>> The final cube has a size of about 10 GB.
>>
>> This would result in about 2^30 to 2^90 cuboids in Kylin and is
>> impossible to produce. Do you know what kind of algorithm is used to
>> produce the IBM Powercubes?
>>
>> Access time is not very good on a lower aggregation layer, but the top
>> level aggregations are fast. The software is about 15 years old and his is
>> own issues, but
>>
>> building cubes with massive dimension numbers works fine as long as there
>> are not too many rows in the fact table and the filesize of a single file
>>
>> does not exceed 2GB at any time.
>>
>>
>>
>> We would like to somehow model these kind of aggregations, but we think
>> we would have to think 

Re: Restore lost data from hdfs

2017-07-14 Thread Li Yang
Nope. You could raise a JIRA and explain the requirement in more detail.

On Thu, Jul 6, 2017 at 6:47 PM, Alexander Sterligov 
wrote:

> Hi,
>
> Is there any way to recalculate dictionaries and other resources that
> are stored on HDFS?
>
> Best regards,
> Alexander Sterligov
>


Re: consultation for kylin2.0 parameters

2017-07-12 Thread Li Yang
When turned on, kylin tries to grow an existing dictionary instead of
creating new dictionaries. However, as tested, this hurts performance in
some cases, and because memory is usually sufficient to load many
dictionaries, this flag is off by default.

Yang
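
For anyone who wants to experiment with it anyway, it is a single
kylin.properties switch, shown here with its default value (verify the exact
key against the defaults shipped with your version):

kylin.dictionary.growing-enabled=false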

On Wed, Jul 5, 2017 at 11:34 AM, 仇同心  wrote:

> Hi all:
>
>    I have a question about "kylin.dictionary.growing-enabled": can you
> tell me the function of this parameter?
>
>
>
> Thanks!
>


Re: Re: Failed to find metadata store by url: kylin_metadata@hbase

2017-07-03 Thread Li Yang
> Caused by: java.lang.IllegalArgumentException: Failed to find metadata
store by url: kylin_metadata@hbase

There should be another exception in the log (typically before this line)
telling the root cause of why it failed to connect to HBase. Please look around.

Yang
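
A quick way to surface that earlier exception, assuming the default log
location under $KYLIN_HOME:

grep -n -B 40 "Failed to find metadata store" $KYLIN_HOME/logs/kylin.log

The lines just above the match usually contain the underlying HBase
connection error.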

On Thu, Jun 29, 2017 at 6:58 PM, java_prog...@aliyun.com <
java_prog...@aliyun.com> wrote:

> Hi,
> here is my configuration in kylin.properties
>
> kylin.env.hadoop-conf-dir=/usr/local/kylin/hadoop-conf
>
> [kylin@gateway conf]$ ll /usr/local/kylin/hadoop-conf
> 总用量 20
> lrwxrwxrwx 1 kylin kylin42 6月  28 13:31 core-site.xml
> -> /usr/local/hadoop/etc/hadoop/core-site.xml
> lrwxrwxrwx 1 kylin kylin36 6月  28 13:32 hbase-site.
> xml -> /usr/local/hbase/conf/hbase-site.xml
> lrwxrwxrwx 1 kylin kylin42 6月  28 13:31 hdfs-site.xml
> -> /usr/local/hadoop/etc/hadoop/hdfs-site.xml
> -rw-r--r-- 1 kylin kylin 17924 6月  28 13:33 hive-site.xml
> lrwxrwxrwx 1 kylin kylin42 6月  28 13:31 yarn-site.xml
> -> /usr/local/hadoop/etc/hadoop/yarn-site.xml
>
> and all log is below
>
> OS command error exit with 1 -- export 
> HADOOP_CONF_DIR=/usr/local/kylin/hadoop-conf && 
> /usr/local/spark/bin/spark-submit --class 
> org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=1  
> --conf 
> spark.yarn.jar=hdfs://ns1/kylin/spark/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar
>   --conf spark.yarn.queue=default  --conf 
> spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf 
> spark.master=yarn  --conf spark.executor.memory=4G  --conf 
> spark.eventLog.enabled=true  --conf 
> spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.executor.cores=2 
>  --conf spark.submit.deployMode=client --files 
> /usr/local/hbase/conf/hbase-site.xml --jars 
> /usr/local/hbase/lib/htrace-core-3.2.0-incubating.jar,/usr/local/hbase/lib/hbase-client-1.2.0-cdh5.9.0.jar,/usr/local/hbase/lib/hbase-common-1.2.0-cdh5.9.0.jar,/usr/local/hbase/lib/hbase-protocol-1.2.0-cdh5.9.0.jar,/usr/local/hbase/lib/metrics-core-2.2.0.jar,/usr/local/hbase/lib/guava-12.0.1.jar,
>  /usr/local/kylin/lib/kylin-job-2.0.0.jar -className 
> org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable 
> default.kylin_intermediate_kylin_sales_cube_d7955f9a_d290_4479_866c_5745f7880b81
>  -output 
> hdfs:///kylin/kylin_metadata/kylin-09df3734-6dcf-41b2-99a1-02e2b4e16f9f/kylin_sales_cube/cuboid/
>  -segmentId d7955f9a-d290-4479-866c-5745f7880b81 -confPath 
> /usr/local/kylin/conf -cubename kylin_sales_cube
> SparkEntry args:-className org.apache.kylin.engine.spark.SparkCubingByLayer 
> -hiveTable 
> default.kylin_intermediate_kylin_sales_cube_d7955f9a_d290_4479_866c_5745f7880b81
>  -output 
> hdfs:///kylin/kylin_metadata/kylin-09df3734-6dcf-41b2-99a1-02e2b4e16f9f/kylin_sales_cube/cuboid/
>  -segmentId d7955f9a-d290-4479-866c-5745f7880b81 -confPath 
> /usr/local/kylin/conf -cubename kylin_sales_cube
> Abstract Application args:-hiveTable 
> default.kylin_intermediate_kylin_sales_cube_d7955f9a_d290_4479_866c_5745f7880b81
>  -output 
> hdfs:///kylin/kylin_metadata/kylin-09df3734-6dcf-41b2-99a1-02e2b4e16f9f/kylin_sales_cube/cuboid/
>  -segmentId d7955f9a-d290-4479-866c-5745f7880b81 -confPath 
> /usr/local/kylin/conf -cubename kylin_sales_cube
> spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
> spark.driver.cores is set but does not apply in client mode.
> Hive history 
> file=/usr/local/hive2/logs/kylin/hive_job_log_fe557857-7a8e-407f-aeab-94044b59931d_2113658910.txt
> Hive history 
> file=/usr/local/hive2/logs/kylin/hive_job_log_df0eaa5c-b047-43a3-bd07-0cc8c0ec148c_2093334660.txt
> Exception in thread "main" java.lang.RuntimeException: error execute 
> org.apache.kylin.engine.spark.SparkCubingByLayer
>   at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>   at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: Failed to find metadata store 
> by url: kylin_metadata@hbase
>   at 
> org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:99)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:110)
>   at 

Re: Multiple fact tables

2017-06-29 Thread Li Yang
"Multiple fact" is supported by allowing a very big lookup table. Try
creating a lookup table with "skip snapshot" checked.

On Tue, Jun 27, 2017 at 3:11 AM, Nirav Patel  wrote:

> The following thread suggests that Kylin 2.x supports multiple fact
> tables in one cube. However, when I try to build a model via the UI, it only
> allows me to add one fact table.
>
> http://apache-kylin.74782.x6.nabble.com/Re-Hive-table-
> design-multiple-fact-tables-or-rolled-up-td7396.html
>
> Am I missing something here?
>
> Or do I just create multiple models and cubes for the fact tables and join
> at run time?
>
> Thanks
>
>
>


Re: why all my cube status change to DESCBROKEN after reload metadata?

2017-06-29 Thread Li Yang
This is typically because of a table/cube inconsistency, e.g. a field
referenced in the cube no longer exists on the table.

On Fri, Jun 23, 2017 at 11:13 AM, 赵天烁  wrote:

> As the subject says: after I execute a metadata reload through the REST API,
> all my cubes' status changes to DESCBROKEN, as follows.
> Any query sent to that query server gets a 500 error.
> The problem went away after I restarted the query server, but I want to know
> why. How could this happen?
> --
>
> 赵天烁
>
> Kevin Zhao
>
> *zhaotians...@meizu.com *
>
>
>
> 珠海市魅族科技有限公司
>
> MEIZU Technology Co., Ltd.
>
> 广东省珠海市科技创新海岸魅族科技楼
>
> MEIZU Tech Bldg., Technology & Innovation Coast
>
> Zhuhai, 519085, Guangdong, China
>
> meizu.com
>


Re: Usage of aggregation groups

2017-06-29 Thread Li Yang
The approach sounds good to me and makes sense.

> The cube build time is taking forever.

Well, that depends more on your Hadoop environment, I guess. A cube with 6
dimensions is indeed small.

On Thu, Jun 22, 2017 at 10:49 PM, Sonny Heer  wrote:

> Hi users,
>
> I need some clarification on how to properly use aggregation groups.
>
> Assume I have report page 1, which has filters A, B, C, D.  When the user is
> on page 2, these filters are passed along (drill-down).  Page 2 has other
> filterable fields (1, 2, 3), but each is independently connected only to the
> previously filtered options; e.g. page-2 fields never need to be combined
> with another field from page 2: ABCD with 1, but not ABCD with 1 & 2.
>
> So what I did is create an aggregation group per field in page 2.  The idea
> was that Kylin wouldn't do 2^n on ABCD123, but rather ABCD1, ABCD2, etc.  I'm
> not sure if this is the correct way to handle it.  The cube build time is
> taking forever.  Please advise...
>
>
>
>
> --
>
>
>


Re: Paging detail queries in Kylin

2017-06-29 Thread Li Yang
What do you mean by "paged queries" (paging)?

2017-06-22 10:00 GMT+08:00 仇同心 :

> Hi all,
>
>   When designing the cube, I set 10 fields as RAW measures. After the cube is
> built, how do I implement paged queries on the query page?
>
>
>
>
>
>
>
> Thanks!
>
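
If the intent is SQL-level paging over the RAW detail rows, Calcite's
LIMIT/OFFSET should work; a sketch with hypothetical table and column names
(verify the syntax against your Kylin version):

select order_id, buyer_id, price
from fact_order_detail
order by order_id
limit 10 offset 20  -- page 3 when the page size is 10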


Re: Re: Re: Re: table_snapshot file does not exist

2017-05-27 Thread Li Yang
Sounds good.  :-)

On Sat, May 27, 2017 at 3:03 PM, jianhui.yi <jianhui...@zhiyoubao.com>
wrote:

> Aha, a stupid way:
>
> 1. backup metadata
>
> 2. drop all cubes and models
>
> 3. unload that table
>
> 4. load that table
>
> 5. restore metadata.
>
>
>
> J
>
>
>
> *From:* Li Yang [mailto:liy...@apache.org]
> *Sent:* May 27, 2017 14:50
> *To:* user@kylin.apache.org
> *Subject:* Re: Re: Re: table_snapshot file does not exist
>
>
>
> What has been done to fix this issue? Curious to know.
>
>
>
> On Sat, May 27, 2017 at 1:37 PM, jianhui.yi <jianhui...@zhiyoubao.com>
> wrote:
>
> Thanks, I fixed it.
>
>
>
> *From:* Li Yang [mailto:liy...@apache.org]
> *Sent:* May 27, 2017 10:29
> *To:* user@kylin.apache.org
> *Subject:* Re: Re: table_snapshot file does not exist
>
>
>
> It seems your Kylin metadata is somewhat corrupted. In the metadata there
> exists a snapshot of table DW.DIM_PRODUCT, but its related physical file
> does not exist on HDFS.
>
> You can manually fix the metadata, or, if a data rebuild is easy, delete all
> metadata and start over again.
>
>
>
> On Fri, May 19, 2017 at 11:03 AM, jianhui.yi <jianhui...@zhiyoubao.com>
> wrote:
>
> Is it a build error
>
>
>
> *From:* Billy Liu [mailto:billy...@apache.org]
> *Sent:* May 19, 2017 11:00
> *To:* user <user@kylin.apache.org>
> *Subject:* Re: table_snapshot file does not exist
>
>
>
> Is it a build error? or query error? You mentioned two scenarios, but one
> exception.
>
>
>
> 2017-05-18 14:25 GMT+08:00 jianhui.yi <jianhui...@zhiyoubao.com>:
>
> Hi all:
>
>    When my cube build runs step 4 (Build Dimension Dictionary), the
> following error occurs; how do I solve it?
>
> Whenever I use the dimensions of this table, this error appears.
>
>
>
> java.io.FileNotFoundException: File does not exist: /kylin/kylin_metadata/
> resources/table_snapshot/DW.DIM_PRODUCT/1394db19-c200-
> 46f8-833c-d28878629246.snapshot
>
> at org.apache.hadoop.hdfs.server.
> namenode.INodeFile.valueOf(INodeFile.java:66)
>
> at org.apache.hadoop.hdfs.server.
> namenode.INodeFile.valueOf(INodeFile.java:56)
>
> at org.apache.hadoop.hdfs.server.
> namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2007)
>
> at org.apache.hadoop.hdfs.server.
> namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1977)
>
> at org.apache.hadoop.hdfs.server.
> namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1890)
>
> at org.apache.hadoop.hdfs.server.
> namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
>
> at org.apache.hadoop.hdfs.server.namenode.
> AuthorizationProviderProxyClientProtocol.getBlockLocations(
> AuthorizationProviderProxyClientProtocol.java:89)
>
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(
> ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>
> at org.apache.hadoop.hdfs.protocol.proto.
> ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.java)
>
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>
> at org.apache.hadoop.ipc.RPC$
> Server.call(RPC.java:1073)
>
> at org.apache.hadoop.ipc.Server$
> Handler$1.run(Server.java:2141)
>
> at org.apache.hadoop.ipc.Server$
> Handler$1.run(Server.java:2137)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at org.apache.hadoop.security.
> UserGroupInformation.doAs(UserGroupInformation.java:1783)
>
> at org.apache.hadoop.ipc.Server$
> Handler.run(Server.java:2135)
>
>
>
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> at sun.reflect.NativeConstructorAccessorImpl.
> newInstance(NativeConstructorAccessorImpl.java:57)
>
> at sun.reflect.DelegatingConstructorAccessorI
> mpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.
> newInstance(Constructor.java:526)
>
> at org.apache.hadoop.ipc.RemoteException.
> instantiateException(RemoteException

Re: Re: Any query returns error 'null'

2017-05-27 Thread Li Yang
Meaning Kylin should be more robust on error cases. Let me see what can be
done...

On Sat, May 27, 2017 at 2:19 PM, jianhui.yi <jianhui...@zhiyoubao.com>
wrote:

> The Kylin version is 2.0 on CDH 5.7; there is no other stack trace.
>
> I used metastore reset to fix it, then rebuilt the model and cube.
>
> It may have been caused by a previous metadata error.
>
> Thanks
>
>
>
> *From:* Li Yang [mailto:liy...@apache.org]
> *Sent:* May 27, 2017 14:09
> *To:* user@kylin.apache.org
> *Subject:* Re: Any query returns error 'null'
>
>
>
> What's your Kylin version? Cannot analyze the stack trace without it.
>
>
>
> On Thu, May 25, 2017 at 11:04 AM, jianhui.yi <jianhui...@zhiyoubao.com>
> wrote:
>
> Hi all:
>
> My Kylin cube is ready, but any query I run reports the following errors.
> When I restart Kylin the service is normal again; after a period of time the
> same error comes back.
>
>
>
> 2017-05-25 10:27:41,440 INFO  [Query 
> a00e9afa-b676-4932-a67b-61f9cb91543e-51505]
> service.QueryService:286 :
>
> ==[QUERY]===
>
> Query Id: a00e9afa-b676-4932-a67b-61f9cb91543e
>
> SQL: SELECT count(*) from FACT_ORDER_DETAIL d
>
> User: ADMIN
>
> Success: false
>
> Duration: 0.0
>
> Project: dw_fs
>
> Realization Names: []
>
> Cuboid Ids: []
>
> Total scan count: 0
>
> Total scan bytes: 0
>
> Result row count: 0
>
> Accept Partial: false
>
> Is Partial Result: false
>
> Hit Exception Cache: false
>
> Storage cache used: false
>
> Message: Error while executing SQL "SELECT count(*) from FACT_ORDER_DETAIL
> d": null
>
> ==[QUERY]===
>
>
>
> 2017-05-25 10:27:41,441 ERROR [http-bio-7070-exec-134]
> controller.BasicController:54 :
>
> org.apache.kylin.rest.exception.InternalErrorException: Error while
> executing SQL "SELECT count(*) from FACT_ORDER_DETAIL d": null
>
> at org.apache.kylin.rest.service.QueryService.doQueryWithCache(
> QueryService.java:400)
>
> at org.apache.kylin.rest.controller.QueryController.
> query(QueryController.java:69)
>
> at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at org.springframework.web.method.support.InvocableHandlerMethod.
> doInvoke(InvocableHandlerMethod.java:221)
>
> at org.springframework.web.method.support.InvocableHandlerMethod.
> invokeForRequest(InvocableHandlerMethod.java:136)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> ServletInvocableHandlerMethod.invokeAndHandle(
> ServletInvocableHandlerMethod.java:104)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.invokeHandleMethod(
> RequestMappingHandlerAdapter.java:743)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.
> java:672)
>
> at org.springframework.web.servlet.mvc.method.
> AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
>
> at org.springframework.web.servlet.DispatcherServlet.
> doDispatch(DispatcherServlet.java:933)
>
> at org.springframework.web.servlet.DispatcherServlet.
> doService(DispatcherServlet.java:867)
>
> at org.springframework.web.servlet.FrameworkServlet.
> processRequest(FrameworkServlet.java:951)
>
> at org.springframework.web.servlet.FrameworkServlet.
> doPost(FrameworkServlet.java:853)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
>
> at org.springframework.web.servlet.FrameworkServlet.
> service(FrameworkServlet.java:827)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
>
> at org.apache.catalina.core.ApplicationFilterChain.
> internalDoFilter(ApplicationFilterChain.java:303)
>
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
>
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(
> WsFilter.java:52)
>
> at org.apache.catalina.core.ApplicationFilterChain.
> internalDoFilter(ApplicationFilterChain.java:241)
>
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>
&

Re: Re: Re: table_snapshot file does not exist

2017-05-27 Thread Li Yang
What has been done to fix this issue? Curious to know.

On Sat, May 27, 2017 at 1:37 PM, jianhui.yi <jianhui...@zhiyoubao.com>
wrote:

> Thanks, I fixed it.
>
>
>
> *From:* Li Yang [mailto:liy...@apache.org]
> *Sent:* May 27, 2017 10:29
> *To:* user@kylin.apache.org
> *Subject:* Re: Re: table_snapshot file does not exist
>
>
>
> It seems your Kylin metadata is somewhat corrupted. In the metadata there
> exists a snapshot of table DW.DIM_PRODUCT, but its related physical file
> does not exist on HDFS.
>
> You can manually fix the metadata, or, if a data rebuild is easy, delete all
> metadata and start over again.
>
>
>
> On Fri, May 19, 2017 at 11:03 AM, jianhui.yi <jianhui...@zhiyoubao.com>
> wrote:
>
> Is it a build error
>
>
>
> *From:* Billy Liu [mailto:billy...@apache.org]
> *Sent:* May 19, 2017 11:00
> *To:* user <user@kylin.apache.org>
> *Subject:* Re: table_snapshot file does not exist
>
>
>
> Is it a build error? or query error? You mentioned two scenarios, but one
> exception.
>
>
>
> 2017-05-18 14:25 GMT+08:00 jianhui.yi <jianhui...@zhiyoubao.com>:
>
> Hi all:
>
>    When my cube build runs step 4 (Build Dimension Dictionary), the
> following error occurs; how do I solve it?
>
> Whenever I use the dimensions of this table, this error appears.
>
>
>
> java.io.FileNotFoundException: File does not exist: /kylin/kylin_metadata/
> resources/table_snapshot/DW.DIM_PRODUCT/1394db19-c200-
> 46f8-833c-d28878629246.snapshot
>
> at org.apache.hadoop.hdfs.server.
> namenode.INodeFile.valueOf(INodeFile.java:66)
>
> at org.apache.hadoop.hdfs.server.
> namenode.INodeFile.valueOf(INodeFile.java:56)
>
> at org.apache.hadoop.hdfs.server.
> namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2007)
>
> at org.apache.hadoop.hdfs.server.
> namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1977)
>
> at org.apache.hadoop.hdfs.server.
> namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1890)
>
> at org.apache.hadoop.hdfs.server.
> namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
>
> at org.apache.hadoop.hdfs.server.namenode.
> AuthorizationProviderProxyClientProtocol.getBlockLocations(
> AuthorizationProviderProxyClientProtocol.java:89)
>
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(
> ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>
> at org.apache.hadoop.hdfs.protocol.proto.
> ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.java)
>
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>
> at org.apache.hadoop.ipc.RPC$
> Server.call(RPC.java:1073)
>
> at org.apache.hadoop.ipc.Server$
> Handler$1.run(Server.java:2141)
>
> at org.apache.hadoop.ipc.Server$
> Handler$1.run(Server.java:2137)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at org.apache.hadoop.security.
> UserGroupInformation.doAs(UserGroupInformation.java:1783)
>
> at org.apache.hadoop.ipc.Server$
> Handler.run(Server.java:2135)
>
>
>
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> at sun.reflect.NativeConstructorAccessorImpl.
> newInstance(NativeConstructorAccessorImpl.java:57)
>
> at sun.reflect.DelegatingConstructorAccessorI
> mpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.
> newInstance(Constructor.java:526)
>
> at org.apache.hadoop.ipc.RemoteException.
> instantiateException(RemoteException.java:106)
>
> at org.apache.hadoop.ipc.RemoteException.
> unwrapRemoteException(RemoteException.java:73)
>
> at org.apache.hadoop.hdfs.DFSClient.
> callGetBlockLocations(DFSClient.java:1281)
>
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(
> DFSClient.java:1266)
>
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(
> DFSClient.java:1254)
>
> at org.apache.hadoop.hdfs.DFSInputStream.
> fetchLocatedBlocksAndGe

Re: Any query returns error 'null'

2017-05-27 Thread Li Yang
What's your Kylin version? Cannot analyze the stack trace without it.

On Thu, May 25, 2017 at 11:04 AM, jianhui.yi 
wrote:

> Hi all:
>
> My kylin cube is ready, But I run any queries that report the following
> errors,When I restart kylin service will be normal, after a period of
> time there will be such a mistake
>
>
>
> 2017-05-25 10:27:41,440 INFO  [Query 
> a00e9afa-b676-4932-a67b-61f9cb91543e-51505]
> service.QueryService:286 :
>
> ==[QUERY]===
>
> Query Id: a00e9afa-b676-4932-a67b-61f9cb91543e
>
> SQL: SELECT count(*) from FACT_ORDER_DETAIL d
>
> User: ADMIN
>
> Success: false
>
> Duration: 0.0
>
> Project: dw_fs
>
> Realization Names: []
>
> Cuboid Ids: []
>
> Total scan count: 0
>
> Total scan bytes: 0
>
> Result row count: 0
>
> Accept Partial: false
>
> Is Partial Result: false
>
> Hit Exception Cache: false
>
> Storage cache used: false
>
> Message: Error while executing SQL "SELECT count(*) from FACT_ORDER_DETAIL
> d": null
>
> ==[QUERY]===
>
>
>
> 2017-05-25 10:27:41,441 ERROR [http-bio-7070-exec-134]
> controller.BasicController:54 :
>
> org.apache.kylin.rest.exception.InternalErrorException: Error while
> executing SQL "SELECT count(*) from FACT_ORDER_DETAIL d": null
>
> at org.apache.kylin.rest.service.QueryService.doQueryWithCache(
> QueryService.java:400)
>
> at org.apache.kylin.rest.controller.QueryController.
> query(QueryController.java:69)
>
> at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at org.springframework.web.method.support.InvocableHandlerMethod.
> doInvoke(InvocableHandlerMethod.java:221)
>
> at org.springframework.web.method.support.InvocableHandlerMethod.
> invokeForRequest(InvocableHandlerMethod.java:136)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> ServletInvocableHandlerMethod.invokeAndHandle(
> ServletInvocableHandlerMethod.java:104)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.invokeHandleMethod(
> RequestMappingHandlerAdapter.java:743)
>
> at org.springframework.web.servlet.mvc.method.annotation.
> RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.
> java:672)
>
> at org.springframework.web.servlet.mvc.method.
> AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
>
> at org.springframework.web.servlet.DispatcherServlet.
> doDispatch(DispatcherServlet.java:933)
>
> at org.springframework.web.servlet.DispatcherServlet.
> doService(DispatcherServlet.java:867)
>
> at org.springframework.web.servlet.FrameworkServlet.
> processRequest(FrameworkServlet.java:951)
>
> at org.springframework.web.servlet.FrameworkServlet.
> doPost(FrameworkServlet.java:853)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
>
> at org.springframework.web.servlet.FrameworkServlet.
> service(FrameworkServlet.java:827)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
>
> at org.apache.catalina.core.ApplicationFilterChain.
> internalDoFilter(ApplicationFilterChain.java:303)
>
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
>
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(
> WsFilter.java:52)
>
> at org.apache.catalina.core.ApplicationFilterChain.
> internalDoFilter(ApplicationFilterChain.java:241)
>
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:208)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
>
> at org.springframework.security.web.access.intercept.
> FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:342)
>
> at org.springframework.security.web.access.
> ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:342)
>
> at org.springframework.security.web.session.
> SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
>
> at org.springframework.security.web.FilterChainProxy$
> VirtualFilterChain.doFilter(FilterChainProxy.java:342)
>
> at org.springframework.security.web.authentication.
> 

Re: Re: table_snapshot file does not exist

2017-05-26 Thread Li Yang
It seems your Kylin metadata is somewhat corrupted. In the metadata there
exists a snapshot of table DW.DIM_PRODUCT, but its related physical file
does not exist on HDFS.

You can manually fix the metadata, or, if a data rebuild is easy, delete all
metadata and start over again.

On Fri, May 19, 2017 at 11:03 AM, jianhui.yi 
wrote:

> Is it a build error
>
>
>
> *From:* Billy Liu [mailto:billy...@apache.org]
> *Sent:* May 19, 2017 11:00
> *To:* user
> *Subject:* Re: table_snapshot file does not exist
>
>
>
> Is it a build error? or query error? You mentioned two scenarios, but one
> exception.
>
>
>
> 2017-05-18 14:25 GMT+08:00 jianhui.yi :
>
> Hi all:
>
> When I build the cube and it reaches step 4 (Build Dimension Dictionary), the
> following error occurs. How can I solve it?
>
> The error also appears when I use the dimensions of this table.
>
>
>
> java.io.FileNotFoundException: File does not exist: /kylin/kylin_metadata/resources/table_snapshot/DW.DIM_PRODUCT/1394db19-c200-46f8-833c-d28878629246.snapshot
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2007)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1977)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1890)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1281)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1266)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1254)
> at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:305)
> at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
> at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
> at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:309)
> at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:305)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:305)
>

Re: How to clean up after Kylin 2.0

2017-05-26 Thread Li Yang
A more detailed list of leftover files under the job folder will help.

However IT IS NORMAL for the below folder to exist:
/kylin/kylin_metadata/JOB_ID/CUBE_NAME/cuboid

It holds a copy of cube data. It is needed if you later want to merge the
segments. And if you are sure the segments won't be merged later, it is safe to
delete it.
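
For reference, the cleanup commands discussed in this thread, sketched per the
Kylin 2.0 cleanup docs (the tool's class name moved between versions, so verify
against your release):

  # dry run: list unused HDFS files, HBase tables and intermediate Hive tables
  ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false
  # actually delete them
  ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
  # clean up dangling metadata entries
  ${KYLIN_HOME}/bin/metastore.sh clean --delete true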

On Wed, May 17, 2017 at 7:48 PM, Itay Shwartz 
wrote:

> Thank you very much for your answer, Billy.
>
> We're currently experiencing it on any cube (Even when creating a new one)
> so I imagine this "buggy" state got created at some point in the last 9
> months since we started using Kylin. In order for it to be effective, what
> kind of data would you like me to provide to help you reproduce the issue
> on your end?
>
> Cheers,
> Itay
>
> -
> Itay Shwartz
>
> StructureIt
> 6th Floor
> Aldgate Tower
> 2 Leman Street
> London
> E1 8FA
>
> direct line: +44 (0)20 3286 9902
> mobile: +44 (0)74 1123 6614
> www.structureit.net
>
>
> On 17 May 2017 at 03:50, Billy Liu  wrote:
>
>> Thanks Itay for raising this question.
>>
>> When you rebuild the cube, the old segment will be invalid for query, but
>> available for cleanup. The StorageCleanupJob should clean those files, but
>> if not, that may be an issue. Could you log a JIRA for this issue and
>> describe how to reproduce it? That will help the community fix it a.s.a.p.
>>
>> Kylin saves duplicate cube data on both HDFS and HBase. The HBase copy
>> is used for query, the HDFS one for later segment merges. If no
>> merge is needed, it's safe to delete it manually.
>>
>> 2017-05-17 0:55 GMT+08:00 Itay Shwartz :
>>
>>> Hi,
>>>
>>> I work on a project where we build a cube multiple times a day using
>>> Kylin. We were using Kylin 1.6 and upgraded this week to Kylin 2.0.
>>>
>>> Since the upgrade I noticed that the HDFS usage had increased every time
>>> we rebuild the cube and the space is not cleared up. This is although we
>>> run both the StorageCleanupJob and metastore clean command as described
>>> here and here.
>>>
>>> When looking into HDFS to see where the increase is I see that the
>>> accumulated data is at: /kylin/kylin_metadata/
>>>
>>> It looks like every job is getting a new folder inside that folder and
>>> its size is at least the same as the size of the cube. Seems like some of
>>> these folders were not cleared even for very old jobs but since the upgrade
>>> to V2.0 all the folders for all jobs were not cleared. I deleted some of
>>> the older folders and it didn't affect the cube. I also created a test cube
>>> and then deleted the folder that was created for it and could still query
>>> the cube. Is it safe to delete these folders manually? Is it correct to
>>> assume that after the job is done all the data that needs to be maintained
>>> will be in HBase (Where I can find the cube and the metadata information)?
>>>
>>>
>>> Many thanks,
>>>
>>> Itay
>>>
>>> -
>>> Itay Shwartz
>>>
>>> StructureIt
>>> 6th Floor
>>> Aldgate Tower
>>> 2 Leman Street
>>> London
>>> E1 8FA
>>>
>>> direct line: +44 (0)20 3286 9902
>>> mobile: +44 (0)74 1123 6614
>>> www.structureit.net
>>>
>>>
>>
>


Re: Can i turn off kylin load base cuboid to Hbase ?

2017-05-26 Thread Li Yang
You can open a JIRA to request this feature. It could be implemented if
there are many requesters.  :-)

On Tue, May 16, 2017 at 5:07 PM, ShaoFeng Shi 
wrote:

> Hi bing,
>
> This JIRA may relieve your pain to some extent:
> https://issues.apache.org/jira/browse/KYLIN-2363
> But there is no plan to exclude the base cuboid from storage, as it is the
> base for other cuboids, and also serves "select *" queries and other
> non-matched queries.
>
> 2017-05-16 9:16 GMT+08:00 bing...@iflytek.com :
>
>> Hi there,
>> Can I turn off loading the base cuboid into HBase? That would
>> save a lot of storage resources.
>> In some special scenarios we just need to query specific cuboids
>> (combinations) and don't care about the others.
>> Therefore, it's not necessary to generate the very large base cuboid.
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: 答复: TOPN value, which sum measure column group by dimension, error.

2017-05-26 Thread Li Yang
Mind to create a JIRA?
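
For anyone who hits this before a fix lands, the report below contains its own
workaround: adding COUNT(*) keeps the query from being rewritten to the
approximate TopN measure, so the exact SUM measure answers it instead.

  SELECT LV1, SUM(V) V FROM TEST GROUP BY LV1              -- rewritten to TopN, approximate
  SELECT LV1, SUM(V) V, COUNT(*) C FROM TEST GROUP BY LV1  -- answered by the exact SUM measure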

On Mon, May 15, 2017 at 3:02 PM, 苏 志锋  wrote:

> I changed the measure column "V" data type from integer to double and
> resubmitted the SQLs.
> SQL:   SELECT LV1,SUM(V) V FROM TEST GROUP BY LV1
> Return: LV1    V
>         001    -0.994
>
> The sum(v) value should be -1.0, but Kylin returns -0.994.
>
> SQL:  SELECT LV1,SUM(V) V,COUNT(*) C FROM TEST GROUP BY LV1
> Return: LV1    V     C
>         001    -1.0  1
> The sum(v) is right.
> --
> *发件人:* 苏 志锋 
> *发送时间:* 2017年5月15日 12:06:41
> *收件人:* user@kylin.apache.org
> *主题:* TOPN value, which sum measure column group by dimension, error.
>
>
> TOPN value, which sum measure column group by dimension, error.
>
> *Environment*
>
>  Apache Kylin:1.6.0
>
> *STEPS*
>
>1. Kafka produce one message
>2. kylin consume the message
>3. kylin build cuboid
>4. execute sql
>
> *Kafka JSON data*
>
>  {"T":"2017-05-15","LV1":"001","V":-1}
>
> *MODEL*
>
>  1. Dimensions: T, LV1
>
>  2. Measures: V
>
> *CUBE*
>
>1. Dimensions : T, LV1
>2. Measures: _COUNT_; SUM(V); TOPN SUM COLUMN V GROUP BY LV1;
>
> *SQL and result*
>
> 1.  SELECT LV1,SUM(V) V FROM TEST GROUP BY LV1
>
>     LV1    V
>     001    0
>
> 2.  SELECT LV1,SUM(V) V,COUNT(*) C FROM TEST GROUP BY LV1
>
>     LV1    V     C
>     001    -1    1
>
> *Problem*
>
> The sum(v) value is wrong in SQL 1, and correct in SQL 2.
>
> The probable reasons are:
>
> 1. The org.apache.kylin.measure.topn.TopNCounterSerializer.deserialize
> function deserializes -1 to -0.99.
>
> 2. The org.apache.kylin.metadata.tuple setMeasureValue function converts
> the double (-0.99) to a long (0).
>
>


Re: Questions about SUM behavior when rewritten as TOPN

2017-05-14 Thread Li Yang
Em... this will be interesting to investigate. JIRA created.
https://issues.apache.org/jira/browse/KYLIN-2617

And sure, TOPN is an approximate algorithm and it does not give precise
results. Nevertheless, cardinality 1 is a very special case; I think even an
approximate algorithm should give the correct result in such a case.
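
As Billy notes below, the rewrite should only target queries shaped like the
TopN measure itself. A query that deliberately matches that shape, using the
thread's column names, would be:

  SELECT dim2_id, SUM(measure1)
  FROM table
  GROUP BY dim2_id
  ORDER BY SUM(measure1) DESC
  LIMIT 10

A plain SUM without the ORDER BY ... LIMIT should be served by the exact SUM
measure; whether it actually is, is what the JIRA above will investigate.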



On Sun, May 14, 2017 at 8:21 AM, Billy Liu  wrote:

> Thanks Tingmao for the report.
>
> Could you show us the complete SQL? In your SQL, there is no order by
> statement. If no ORDER BY, the query should not be rewritten into TopN
> measure.
>
> 2017-05-12 23:52 GMT+08:00 Tingmao Lin :
>
>> Hi,
>>
>> We found that SUM() query on a cardinality 1 dimension is not accurate
>> (or "not correct") when automatically  rewritten as TOPN.
>> Is that the expected behavior of kylin or there are any other issue?
>>
>> We built a cube on a table ( measure1: bigint, dim1_id:varchar,
>> dim2_id:varchar, ... ) using kylin 1.6.0 (Kafka streaming source)
>>
>> The cube has two measures: SUM(measure1) and
>> TOPN(10,sum-orderby(measure1),group by dim2_id) . (other measures
>> omitted)
>> and two dimensions  dim1_id, dim2_id   (other dims omitted)
>>
>> About the source table data:
>> The cardinality of dim1_id  is 1 (same dim1_id for all rows in the
>> source table)
>> The cardinality of dim2_id  is 1 (same dim2_id for all rows in the source
>> table)
>> The possible value of measure1 is [1,0,-1]
>>
>> When we query
>> "select SUM(measure1) FROM table GROUP BY dim2_id"
>>  => the result has one row:"sum=7",
>>   from the kylin logs we found that the query has been automatically  
>> rewritten
>> as TOPN(measure1,sum-orderby(measure1),group by dim2_id)
>>
>> When we write another query to prevent TOPN rewrite, for example:
>>
>>"select SUM(measure1),count(*) FROM table GROUP BY dim2_id" =>   one
>> row -- "sum=-2,count=24576"
>>
>>"select SUM(measure1),count(*) FROM table"
>>  =>   one row -- "sum=-2,count=24576"
>>
>>
>> The result is different (7 and -2) when rewritting to TOPN or not.
>>
>>
>> My question is: are the following behavior "works as expected" ,or TOPN
>> algorithm does not support negative counter values very well , or any issue
>> there?
>>
>>
>> 1. SUM() query  automatically rewritten as TOPN and gives approximated
>> result when no TOPN present in the query.
>>
>> 2. When cardinality is 1, TOPN does not give accurate result.
>>
>>
>>
>>
>> Thanks.
>>
>>
>>
>>
>


Re: How to apply historical Updates to existing cube data

2017-05-14 Thread Li Yang
Refreshing a segment of time range in the past is the way to pick up
historic data changes. We don't see this as a common use case though.
History data should not change in most cases.

A new HTable is created to hold the new segment, and the old segment and
its HTable become garbage to be collected.
http://kylin.apache.org/docs20/howto/howto_cleanup_storage.html
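
For reference, a refresh can also be triggered through the REST API. A sketch
against the documented rebuild endpoint (host, cube name and the
epoch-millisecond times here are illustrative):

  curl -X PUT --user ADMIN:KYLIN -H 'Content-Type: application/json' \
    -d '{"startTime": 1494288000000, "endTime": 1494374400000, "buildType": "REFRESH"}' \
    http://localhost:7070/kylin/api/cubes/my_cube/rebuild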

Cheers
Yang

On Fri, May 12, 2017 at 6:31 AM, Nirav Patel  wrote:

> The first link says you can do incremental builds based on a "range of segments". Is
> it a timestamp/date range that we define during cube creation?
> And since data is stored in hbase, will kylin just overwrite the old data with
> the new data that has the same rowkeys?
>
> Thanks
>
> On Thu, May 11, 2017 at 1:26 PM, Alberto Ramón 
> wrote:
>
>> Q1- Check this previous mailList about late data:
>> http://apache-kylin.74782.x6.nabble.com/Reloading-data-td5669.html
>>
>> You only will need recalculate segments involved
>>
>> Q2- Check sharding (https://issues.apache.org/jira/browse/KYLIN-1453)
>>   Partitioning by the time column is not recommended (it will create a hotspot in
>> HBase)
>>
>>
>>
>> On 11 May 2017 at 19:43, Nirav Patel  wrote:
>>
>>> Hi,
>>>
>>> Correct me if I am wrong but currently you can not update existing kylin
>>> cube without refreshing entire cube. Does it mean if I am pulling new data
>>> from hive based on lets say customerId, Timestamp for which I already built
>>> cube before I have to rebuild entire cube from scratch? Or can I say
>>> refresh between startTime and endTime which will update cube data for that
>>> timeframe only.
>>>
>>> Also Hive data can be partitioned by any keys(columns) not just
>>> timestamp. so why not allow kylin cube updates based on any arbitrary
>>> partition strategy that user have defined on their hive table?
>>> e.g. update part of the cube based on timestamp, customerid, batchid etc.
>>>
>>> Thanks,
>>> Nirav
>>>
>>>
>>>
>>
>>
>>
>
>
>
>


Re: Document that explain how Kylin query, indexing, metadata engine works

2017-05-14 Thread Li Yang
Not for now, I believe. You can help by creating JIRAs to track these tasks,
ideally as small topics, and the community will work them out.

On Fri, May 12, 2017 at 6:09 AM, Nirav Patel  wrote:

> I see there are documents that explain how kylin ingests hive tables and
> builds cubes/cuboids using MR and/or spark, but there is nothing about how it
> generates metadata about cubes/segments, how it builds indexes
> (dictionaries, inverted index if any) and how that helps querying against
> cubes stored on hbase. There are some slides/papers but they all date back
> to 2014/15, which is quite old.
>
> Are there any recent documents/articles on these ?
>
> Thanks
>
>
>


Re: 答复: kylin nonsupport Multi-value dimensions?

2017-05-13 Thread Li Yang
> java.lang.IllegalStateException: The table: DIM_XXX Dup key found,
> key=[1446], value1=[1446,29,1,1], value2=[1446,28,0,0]

This error is about dup key in a dimension table. The primary key of
dimension table must be unique on all rows. And in this case, the key
"1446" appears twice.

On Wed, May 10, 2017 at 6:59 PM, Alberto Ramón 
wrote:

> You can convert this dim to a string and check performance using LIKE filters.
>
> Or, with Hive, duplicate the rows in the fact table, one for each dim value.
>
> Another, more complex solution could be an extended dictionary-encoded
> dimension that understands multi-values.
>
> No more ideas :)
>
>
> On 10 May 2017 8:51 a.m., "jianhui.yi"  wrote:
>
> Sorry, I wrote it wrongly; this problem is about multi-value dimensions.
>
> Example: I have a fact table named fact_order and a dimension table named
> dim_sales.
>
> In the fact_order table, an order row contains multiple salespeople.
>
> When I join fact_order with dim_sales, it reports the error: Dup key found.
>
> How can I solve it?
>
>
>
> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Sent:* May 10, 2017 15:29
> *To:* user 
> *Subject:* Re: kylin nonsupport Multi-value dimensions?
>
>
>
> Hi,
>
> Not all hive types are supported
>
> Check these lines:
> https://github.com/apache/kylin/blob/5d4982e247a2172d97d44c85309cef4b3dbfce09/core-metadata/src/main/java/org/apache/kylin/dimension/DimensionEncodingFactory.java#L76
>
>
>
> On 10 May 2017 at 08:10, jianhui.yi  wrote:
>
> I encountered a multi-value dimension problem, and I used a bridge table to
> try to solve it, but when building the cube it reports an error:
>
> java.lang.IllegalStateException: The table: DIM_XXX Dup key found, key=[1446], value1=[1446,29,1,1], value2=[1446,28,0,0]
>  at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:86)
>  at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:69)
>  at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
>  at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:57)
>  at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
>  at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:644)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:98)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:54)
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
>
> result code:2
>
>
>
>
>
>
>
>
>
>
>


Re: multiple column distinct count

2017-05-13 Thread Li Yang
You are right. The GUI still cannot input multiple columns for the
count-distinct measure. A JIRA is created.
https://issues.apache.org/jira/browse/KYLIN-2616
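
While waiting for the GUI fix, a multi-column count-distinct measure may be
expressible directly in the cube desc JSON by chaining columns through
next_parameter, as other measures in this archive do. A sketch only (the exact
schema and return type vary by version; A and B are the columns from the
question below):

  "function": {
    "expression": "COUNT_DISTINCT",
    "parameter": {
      "type": "column",
      "value": "A",
      "next_parameter": {
        "type": "column",
        "value": "B",
        "next_parameter": null
      }
    },
    "returntype": "hllc(12)"
  }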



On Thu, May 4, 2017 at 6:04 PM, 市场中心-ZHANGDA32698  wrote:

> Hi,
>
>
>
> As stated in the release note and JIRA https://issues.apache.org/jira/browse/KYLIN-490,
> multiple column distinct count is supported in
> v2.0. So in order to do 'select count distinct (A,B) from table', I assume
> I need to specify a count-distinct measure that includes both A and B at
> the cube design stage, but I notice that in the 'edit measure' UI, the
> 'Param Value' is a drop-down list where I can't input more than 1 column.
> I'm curious how kylin can do a multi-column count distinct query without
> defining a multi-column count-distinct measure?
>


Re: Fail to overwrite mapred conf at cube level

2017-05-04 Thread Li Yang
As verified, this works in my sandbox.

I'm running a 2.0 binary on a 1.6 metadata. The cube has override
properties:

  "override_kylin_properties" : {
"kylin.job.mr.config.override.test.foo" : "barbar"
  }

And the setting effectively reflects in MR job configuration in YARN web
GUI.

Still don't know why it didn't work in BioLearning's case. I suggest checking
the YARN GUI to verify the MR job configuration. Also, this setting does not
affect Hive, which is as designed.
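
For completeness, the equivalent cube-level override under the 2.0 naming that
Shaofeng mentions below would look like this in the cube desc (a sketch; verify
the exact prefix against your version):

  "override_kylin_properties" : {
    "kylin.engine.mr.config-override.mapreduce.map.cpu.vcores" : "8",
    "kylin.engine.mr.config-override.mapreduce.reduce.cpu.vcores" : "8"
  }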

Cheers
Yang

On Thu, May 4, 2017 at 1:53 PM, Li Yang <liy...@apache.org> wrote:

> From code, it looks like the backward compatibility is already
> implemented. I'm doing a real test now.
>
> On Thu, May 4, 2017 at 1:24 PM, ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
>> Which version are you running? I know v1.6 works okay, but during v2.0
>> development these property names were renamed; the prefix has been changed to
>> "kylin.engine.mr.config-override.". Please give it a try, and we will update
>> the doc soon.
>>
>> 2017-05-03 23:04 GMT+08:00 ? ? <biolearn...@hotmail.com>:
>>
>>> Hey,
>>>
>>>
>>> I tried to override some mapred conf at the cube level (Advanced Setting),
>>> but per the mapred job configuration in the Job UI, they did not take effect.
>>> What is wrong, or does Kylin not support overriding these conf items?
>>>
>>>
>>>"kylin.job.mr.config.override.mapreduce.reduce.cpu.vcores": "8",
>>> "kylin.job.mr.config.override.mapreduce.map.cpu.vcores": "8",
>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


[ANNOUNCE] Apache Kylin 2.0.0 released

2017-05-02 Thread Li Yang
The Apache Kylin team is pleased to announce the immediate availability of
the 2.0.0 release. The release note is here [1]. The source code and binary
package can be downloaded from Kylin's download page [2].


This is the most powerful build ever: it features Spark Cubing and the
Snowflake Data Model, and runs the TPC-H Benchmark.


The Apache Kylin Team would like to hear from you and welcomes your
comments and contributions.



Thanks,

The Apache Kylin Team



[1] https://kylin.apache.org/docs20/release_notes.html

[2] https://kylin.apache.org/download/


Re: [Announce] New Apache Kylin committer Zhixiong Chen

2017-04-29 Thread Li Yang
Welcome Zhixiong!

Yang

On Sat, Apr 29, 2017 at 6:07 PM, Luke Han  wrote:

> On behalf of the Apache Kylin PMC, I am very pleased to announce
> that Zhixiong Chen has accepted the PMC's invitation to become a
> committer on the project.
>
> We appreciate all of Zhixiong's generous contributions: many bug
> fixes, patches, and help for many users. We are so glad to have him as
> our new committer and look forward to his continued involvement.
>
> Congratulations and Welcome, Zhixiong!
>


Re: [Announce] New Apache Kylin committer Roger Shi

2017-04-29 Thread Li Yang
Welcome Roger!

Yang

On Sat, Apr 29, 2017 at 6:07 PM, Luke Han  wrote:

> On behalf of the Apache Kylin PMC, I am very pleased to announce
> that Roger Shi has accepted the PMC's invitation to become a
> committer on the project.
>
> We appreciate all of Roger's generous contributions: many bug
> fixes, patches, and help for many users. We are so glad to have him as
> our new committer and look forward to his continued involvement.
>
> Congratulations and Welcome, Roger!
>


Re: Difference between using ES and Kylin as real-time OLAP engine?

2017-04-07 Thread Li Yang
My personal opinion.

Choosing the right OLAP solution is not easy. There are different options
for different scales.

If it is millions of rows, then RDBMS like MySql / PostgreSQL shall fly.

If it is billions of rows, then Druid / ES / Kylin will all work given you
get the right hardware and software configuration.

If it is trillions of rows (or more), then Kylin has a big advantage thanks
to its precalculation system. Read more about precalculation here.


Cheers
Yang

On Fri, Apr 7, 2017 at 10:39 AM, lufeng  wrote:

> Hi Luke
>
> I found a comparison between Druid, which can provide sub-second OLAP
> queries, and ES [1]. Maybe some of the same points apply when comparing Kylin
> and ES.
>
> Yes, compared with HBase, ES has no standout features that would justify
> investing in this path, and Kylin does a lot of internal optimization on HBase storage.
>
> I will glad to share the comparison to community when got some progress.
>
> Thanks for your reply.
>
>
> [1] http://druid.io/docs/latest/comparisons/druid-vs-elasticsearch.html
>
>
>
> 在 2017年4月6日,下午5:19,Luke Han  写道:
>
> ES is not a target storage for Kylin so far, at least not on the coming
> release plan.
>
> There are already many storage options in the Hadoop ecosystem; I don't think
> there's a strong reason to invest in this path.
>
> And I don't remember there's any benchmark or comparison available today
> for your purpose.
> Please share with us if you have a chance to do it :)
>
> Thanks.
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Wed, Apr 5, 2017 at 6:17 PM, lufeng  wrote:
>
>> Hi All
>>
>> Now if I want to build an OLAP platform to analyze our huge data, the
>> first idea is to use Hive, but it cannot match our real-time query
>> needs. Then I found that lots of companies have used ES to build their OLAP
>> engine before, like Tencent's Hermes. So I want to know what the differences
>> are between ES and Apache Kylin along dimensions like
>> filters, aggregation queries, etc.
>>
>> * Performance
>> * Flexible queries
>> * Data Model
>> * BI integration
>> * …
>>
>> After Kylin released its latest architecture, which can support different
>> storage engines, I found that Kylin may support ES [1] as its backend
>> storage engine. So I think ES is a ROLAP engine (lots of people use ES as a
>> NoSQL DB) while Kylin is a MOLAP engine.
>>
>> I would love to hear some feedback for my confusions.
>>
>> Thanks.
>>
>> [1] https://github.com/apache/kylin/pull/23
>>
>
>
>


Re: Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?

2017-04-06 Thread Li Yang
Shaofeng is right. Check out details --
https://issues.apache.org/jira/browse/KYLIN-1749
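
For background: a cuboid ID is just a bitmask over the rowkey dimensions.
Assuming the bit assignment implied by the numbers quoted further down in this
thread (DAY_TIME = 512, TS_MINUTE = 256, TS_HOUR = 128, GENDER = 64, AGE = 32,
BRAND = 16, MODEL = 8, RESOLUTION = 4, OS_VERSION = 2, NTT = 1), the
combination <day_time, gender> maps to cuboid 512 + 64 = 576, the base cuboid
of all ten dimensions is 1023, and a query that groups only by TS_HOUR needs
cuboid 128; if that cuboid was never built, the query fails, which matches the
report below.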

On Wed, Apr 5, 2017 at 9:45 AM, ShaoFeng Shi <shaofeng...@apache.org> wrote:

> Hi bingli,
>
> I didn't try an agg group with only 1 dimension; please check whether
> removing the three single-dim groups makes it work. Anyway,
> this is a bug, I think.
>
> Regarding "precisely define combinations with agg groups", yes it is doable
> with agg groups; say if you only want to use the combination ABCD, you can
> make them into a group and then mark all these 4 as "mandatory"; then for
> this group, only 1 cuboid will be calculated (otherwise it would be 16). However,
> in older Kylin versions this isn't allowed, so you need to configure
> "kylin.cube.aggrgroup.isMandatoryOnlyValid=true" in kylin.properties.
>
> 2017-04-05 9:24 GMT+08:00 bing...@iflytek.com <bing...@iflytek.com>:
>
>> Hi Li Yang:
>> Why are the cuboids that Kylin finally resolves inconsistent with what I
>> designed through the web UI? Is this a bug in aggregation groups?
>>
>> The inconsistency shows up when the following query fails (error details in
>> the attachment):
>> select ts_hour, sum(request)
>> from view_flow_insight
>> group by ts_hour
>>
>> I had already read the article you referenced. In addition, the book "Apache
>> Kylin 权威指南" points out: "The design of aggregation groups is very flexible
>> and can even be used to describe some extreme designs. Suppose the business
>> requirement is very specific and only certain cuboids are needed; then
>> multiple aggregation groups can be created, each representing one cuboid,
>> ...". Based on this, I designed a cube that fits my business needs (since the
>> presentation layer is superset, which cannot use multiple tables, I had to
>> flatten everything into one table with a view), and in the end some cuboids
>> cannot be queried.
>>
>> --
>> bing...@iflytek.com
>>
>>
>> *From:* Li Yang <liy...@apache.org>
>> *Date:* 2017-04-04 17:26
>> *To:* user <user@kylin.apache.org>
>> *CC:* ShaoFeng?Shi <shaofeng...@apache.org>
>> *Subject:* Re: Re: How Kylin Cuboid Scheduler Work With Aggregation
>> Groups ?
>> Google "Kylin aggregation group" and the first result is:
>> http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
>>
>> On Mon, Apr 3, 2017 at 12:03 PM, bing...@iflytek.com <bing...@iflytek.com
>> > wrote:
>>
>>> Hi Shaofeng:
>>> In a Kylin cube, whether you use aggregation groups or other cube
>>> optimization strategies, the end result is a set of dimension combinations
>>> (e.g. <day_time, gender>), and each combination corresponds to exactly one
>>> cuboid.
>>> When querying with SQL, if the corresponding cuboid does not exist, the
>>> query fails (excluding extended/derived dimension combinations).
>>>
>>> The picture below is the Apache Kylin website's explanation of aggregation
>>> groups. By the same rules, the cube defined in the previous mail should
>>> produce only 10 dimension combinations, namely:
>>>    <day_time, gender> 576
>>>    <day_time, age> 544
>>>    <day_time, brand> 528
>>>    <day_time, model> 520
>>>    <day_time, resolution> 516
>>>    <day_time, os_version> 514
>>>    <day_time, ntt> 513
>>>    <ts_minute> 256
>>>    <ts_hour> 128
>>>    <day_time> 512
>>> The number after each combination is the corresponding cuboid ID. From the
>>> cube_statistics step, only the combinations 1023, 516, 576, 513, 528, 514,
>>> 544, 520 exist in the end (querying the HBase meta table shows the same).
>>>
>>> The book "Apache Kylin 权威指南" describes how to use aggregation groups in
>>> some extreme cases (e.g. to precisely define the cuboids/combinations).
>>> So I assumed that Kylin currently supports this way of defining them.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> bing...@iflyek.com
>>>
>>>
>>> *From:* ShaoFeng Shi <shaofeng...@apache.org>
>>> *Date:* 2017-04-02 22:17
>>> *To:* user <user@kylin.apache.org>
>>> *Subject:* Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?
>>> Hi Bing,
>>>
>>> An aggregation group is a dimension group, or say a sub-cube; it is NOT
>>> a cuboid.
>>>
>>> I guess you want to precisely define the cuboids/combinations; that
>>> isn't supported, as in many cases users couldn't list all the combinations
>>> they use. But you can describe them with the agg group / mandatory / joint
>>> as closely as possible.
>>>
>>> 2017-03-31 15:49 GMT+08:00 bing...@iflytek.com <bing...@iflytek.com>:
>>>
>>>>   Hi,all
>>>>   I have a Cube, the desc is :
>>>>
>>>> {
>>>>   "uuid": "bcf11be2-83e4-497e-9e35-a402460a6446",
>>>>   "last_modified": 1490860973892,
>>>>   "version": "1.6.0",
>>>>   "name": "adx_flow_insight",
>>>>   "model_name": "adx_operator",
>>>>   "description": "",
>>>>   "null_string": null,
>>>>   "dimensions": [

Re: Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?

2017-04-04 Thread Li Yang
Google "Kylin aggregation group" and the first result is:
http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/

On Mon, Apr 3, 2017 at 12:03 PM, bing...@iflytek.com 
wrote:

> Hi Shaofeng:
> In a Kylin cube, whether you use aggregation groups or other cube
> optimization strategies, the end result is a set of dimension combinations
> (e.g. <day_time, gender>), and each combination corresponds to exactly one cuboid.
> When querying with SQL, if the corresponding cuboid does not exist, the
> query fails (excluding extended/derived dimension combinations).
>
> The picture below is the Apache Kylin website's explanation of aggregation
> groups. By the same rules, the cube defined in the previous mail should
> produce only 10 dimension combinations, namely:
>    <day_time, gender> 576
>    <day_time, age> 544
>    <day_time, brand> 528
>    <day_time, model> 520
>    <day_time, resolution> 516
>    <day_time, os_version> 514
>    <day_time, ntt> 513
>    <ts_minute> 256
>    <ts_hour> 128
>    <day_time> 512
> The number after each combination is the corresponding cuboid ID. From the
> cube_statistics step, only the combinations 1023, 516, 576, 513, 528, 514,
> 544, 520 exist in the end (querying the HBase meta table shows the same).
>
> The book "Apache Kylin 权威指南" describes how to use aggregation groups in
> some extreme cases (e.g. to precisely define the cuboids/combinations).
> So I assumed that Kylin currently supports this way of defining them.
>
>
>
>
>
>
>
>
>
>
> --
> bing...@iflyek.com
>
>
> *From:* ShaoFeng Shi 
> *Date:* 2017-04-02 22:17
> *To:* user 
> *Subject:* Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?
> Hi Bing,
>
> An aggregation group is a dimension group, or say a sub-cube; it is NOT a
> cuboid.
>
> I guess you want to precisely define the cuboids/combinations; that isn't
> supported, as in many cases users couldn't list all the combinations they
> use. But you can describe them with the agg group / mandatory / joint as
> closely as possible.
>
> 2017-03-31 15:49 GMT+08:00 bing...@iflytek.com :
>
>>   Hi,all
>>   I have a Cube, the desc is :
>>
>> {
>>   "uuid": "bcf11be2-83e4-497e-9e35-a402460a6446",
>>   "last_modified": 1490860973892,
>>   "version": "1.6.0",
>>   "name": "adx_flow_insight",
>>   "model_name": "adx_operator",
>>   "description": "",
>>   "null_string": null,
>>   "dimensions": [
>> {
>>   "name": "GENDER",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "GENDER",
>>   "derived": null
>> },
>> {
>>   "name": "AGE",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "AGE",
>>   "derived": null
>> },
>> {
>>   "name": "BRAND",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "BRAND",
>>   "derived": null
>> },
>> {
>>   "name": "MODEL",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "MODEL",
>>   "derived": null
>> },
>> {
>>   "name": "RESOLUTION",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "RESOLUTION",
>>   "derived": null
>> },
>> {
>>   "name": "OS_VERSION",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "OS_VERSION",
>>   "derived": null
>> },
>> {
>>   "name": "NTT",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "NTT",
>>   "derived": null
>> },
>> {
>>   "name": "TS_MINUTE",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "TS_MINUTE",
>>   "derived": null
>> },
>> {
>>   "name": "TS_HOUR",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "TS_HOUR",
>>   "derived": null
>> },
>> {
>>   "name": "DAY_TIME",
>>   "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>   "column": "DAY_TIME",
>>   "derived": null
>> }
>>   ],
>>   "measures": [
>> {
>>   "name": "_COUNT_",
>>   "function": {
>> "expression": "COUNT",
>> "parameter": {
>>   "type": "constant",
>>   "value": "1",
>>   "next_parameter": null
>> },
>> "returntype": "bigint"
>>   },
>>   "dependent_measure_ref": null
>> },
>> {
>>   "name": "REQUEST_PV",
>>   "function": {
>> "expression": "SUM",
>> "parameter": {
>>   "type": "column",
>>   "value": "REQUEST",
>>   "next_parameter": null
>> },
>> "returntype": "bigint"
>>   },
>>   "dependent_measure_ref": null
>> },
>> {
>>   "name": "IMPRESS_PV",
>>   "function": {
>> "expression": "SUM",
>> "parameter": {
>>   "type": "column",
>>   "value": "IMPRESS",
>>   "next_parameter": null
>> },
>> "returntype": "bigint"
>>   },
>>   "dependent_measure_ref": null
>> },
>> {
>>   "name": "CLICK_PV",
>>   "function": {
>> "expression": "SUM",
>> "parameter": {
>>   "type": "column",
>>   "value": "CLICK",
>>   "next_parameter": null
>> },
>> "returntype": "bigint"
>>   },
>>   "dependent_measure_ref": null
>> },
>> {
>>   "name": "FILL_PV",
>>   "function": {
>> "expression": "SUM",
>> 

Re: Kylin + SparkSQL integration

2017-03-30 Thread Li Yang
That's right. The big picture is very true.

On Thu, Mar 30, 2017 at 3:24 AM, Nirav Patel <npa...@xactlycorp.com> wrote:

>
> Correct me if I am wrong but coprocessor for predicate pushdown is only
> necessary for custom filters and custom computations, right? Even without
> co-processor queries can be converted to standard Hbase filters and for any
> computation spark-hbase connector (e..g phoenix spark plugin)  can be
> leveraged. This connector will basically do:
> 1. convert sparksql into hbase filters for pushdown
> 2. apply any additional filters that can not be pushdown due to lack of
> support from Hbase Filters
> 3. use spark dataframe ability to do joins, group by, all kinds of
> standard and custom aggregations.
>
> I think overall the sparksql approach can be more scalable than a coprocessor.
> That way you can replace hbase with another database as long as there is a
> spark connector for it.
>
> Thanks,
> Nirav
>
> On Fri, Mar 24, 2017 at 4:31 PM, Li Yang <liy...@apache.org> wrote:
>
>> > taking advantage of underlying datasource capabilities (predicate
>> pushdown, projection etc) is important to improve query performance.
>>
>> That is very true. There was discussion about replacing HBase with
>> Cassandra
>> <http://apache-kylin.74782.x6.nabble.com/Cassandra-instead-of-HBase-in-Kylin-td2688.html>
>> previously. And the worry is lack of coprocessor will prevent predicate &
>> aggregation pushdown. Similar concern exists for Kudu.
>>
>> Cheers
>> Yang
>>
>> On Fri, Mar 24, 2017 at 12:50 AM, Nirav Patel <npa...@xactlycorp.com>
>> wrote:
>>
>>> Thanks for logging those improvements. I think the decision about replacing
>>> Hbase or using any other nosql datastore for storing cubes would be based
>>> on many factors, but one important factor I can think of is the query
>>> engine/optimizer of all of those datasources. I think taking advantage of
>>> underlying datasource capabilities (predicate pushdown, projection etc) is
>>> important to improve query performance.
>>>
>>> Cheers,
>>> Nirav
>>>
>>> On Mon, Mar 20, 2017 at 12:23 PM, Li Yang <liy...@apache.org> wrote:
>>>
>>>> Hi Nirav,
>>>>
>>>> Glad to see you on the mailing list!!
>>>>
>>>> Yes, this is a great idea and it is on the roadmap. (This reminds me, I
>>>> should update the roadmap on kylin website soon.)
>>>>
>>>> However there are many moving parts that affect how we approach it. E.g.
>>>>
>>>> - If coprocessor is retired, do we still need HBase?
>>>> - If HBase is retired, what is the alternative storage? How about
>>>> metadata?
>>>> - There are other ways to integrate SparkSQL (KYLIN-2515), how do they
>>>> fit in...
>>>>
>>>> There is a lot of work in this direction, I would say.
>>>>
>>>> Cheers
>>>> Yang
>>>>
>>>> On Tue, Mar 21, 2017 at 2:05 AM, Nirav Patel <npa...@xactlycorp.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In recent strata conference I raised a question if kylin can support
>>>>> sparkSQL as a query engine or have a kylin query resultset converted into
>>>>> spark DataSet(DataFrame) on which user can perform further distributed
>>>>> computation.
>>>>> Reason are
>>>>> 1) some flavor of Hbase doesnt support co-processor
>>>>> 2) SparkSql UDF  much easier to develop then hbase coprocessor
>>>>> 3) User can write their own spark UDF and run any custom aggregation
>>>>>
>>>>> Is this on roadmap ?
>>>>>
>>>>> Thanks,
>>>>> Nirav
>>>>>
>>>>>
>>>>>


Apache Kylin 2.0.0 beta is ready for download

2017-02-25 Thread Li Yang
Dear all,

Glad to let you know that the Apache Kylin 2.0.0 beta
is ready for download and test. This is
not an official release, but a beta build that aims at previewing and collecting
feedback from the community. We want to ensure the best release quality as
always.

This 2.0.0 beta build features Spark cubing and runs the TPC-H benchmark. Many
other long-wanted improvements are also included. Read more:

Apache Kylin 2.0.0 beta announcement
  (Chinese version)

You are very welcome to give the 2.0.0 beta a try, and please do send
feedback to d...@kylin.apache.org.


Cheers
Yang

