from:"ShaoFeng Shi"

Re: ACID with Hive/Kylin

2023-12-11 Thread ShaoFeng Shi

Hi Nam,

Kylin is meant to store aggregated data, so it should not contain PII.
(Using Kylin to manage person-level data is not a good use case.)

If you do need to delete certain personal data, what you can do is refresh
the whole index or only the affected partitions.
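
To illustrate the "refresh some partitions" option, here is a minimal sketch of how one might work out which segments need a refresh after rows have been deleted from the source table. It assumes the index is partitioned by date and that each segment covers a half-open date range; the function name and the sample data are hypothetical, and no Kylin API is called.

```python
from datetime import date


def segments_to_refresh(segments, deleted_dates):
    """Return the (start, end) segments, end exclusive, that contain
    the partition date of at least one deleted row."""
    affected = []
    for start, end in segments:
        if any(start <= d < end for d in deleted_dates):
            affected.append((start, end))
    return affected


# Hypothetical monthly segments and the dates of the rows that were deleted.
segments = [
    (date(2023, 1, 1), date(2023, 2, 1)),
    (date(2023, 2, 1), date(2023, 3, 1)),
    (date(2023, 3, 1), date(2023, 4, 1)),
]
deleted = [date(2023, 1, 15), date(2023, 3, 1)]
to_refresh = segments_to_refresh(segments, deleted)
```

Only the segments returned here would need to be rebuilt once the rows are gone from Hive/HDFS; if the deleted rows are spread over most partitions, refreshing the whole index is probably simpler.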

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Nam Đỗ Duy wrote on Tue, Dec 12, 2023 at 12:11:

> Dear Xiaoxiang, Sirs/Madams
>
> I face an issue with deleting user data under a GDPR-like policy: when a
> user sends a request to delete their personal data, we need to delete it
> from all systems, which means deleting the data:
>
> 1- from Kylin index (cube)
> 2- from Hive
> 3- from HDFS
>
> Have you had the same use case before? Do you have any suggestions for
> achieving this?
>
> Thank you very much and best regards
>

Re: How to set up a tree-structured hierarchy dimension

2023-10-25 Thread ShaoFeng Shi

I think it is not supported.

Best regards,

Shaofeng Shi 史少锋

雨后初晴 <745579...@qq.com> wrote on Wed, Oct 25, 2023 at 00:49:

>
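> There is an "organization" dimension whose number of levels is not fixed.
> Its structure is similar to: orgId, parentOrgId, orgName, isLeaf. These are
> the organization ID, the parent organization ID, the organization name, and
> whether it is a leaf organization. Each organization has exactly one direct
> parent.
> The fact table only contains data for leaf organization IDs, but queries
> need data for organizations at any level. How should this kind of dimension
> be defined in Kylin?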
>
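
Since the reply above says this is not supported in Kylin, one general data-modeling workaround (not a Kylin feature) is to flatten the parent-child table into (ancestor, descendant) pairs in the source data, so that leaf-level facts can be joined to an organization at any level. The sketch below assumes the hierarchy is a proper tree without cycles; the function name and sample IDs are hypothetical.

```python
def build_closure(parent_of):
    """parent_of maps orgId -> parentOrgId (None for a root).
    Returns sorted (ancestor, descendant) pairs, including each org paired with itself."""
    pairs = []
    for org in parent_of:
        node = org
        # Walk up from the org to the root, recording every ancestor on the way.
        while node is not None:
            pairs.append((node, org))
            node = parent_of.get(node)
    return sorted(pairs)


# Hypothetical hierarchy: 1 is the root, 2 and 4 are its children, 3 is a child of 2.
parent_of = {1: None, 2: 1, 3: 2, 4: 1}
closure = build_closure(parent_of)
```

Joining the fact table's leaf orgId to the descendant column and grouping by the ancestor column would then aggregate data for any level of the tree.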

Re: Problems encountered during the use of kylin 4.0.3

2023-07-21 Thread ShaoFeng Shi

1. Did you connect a BI tool or a monitoring system to Kylin that sends the
"select 1" query every second? Kylin itself won't do that.

2. If you deleted the test0718 project, any request to that project will
get the "Cannot find project" error. This might have the same cause as the
first issue.

Best regards,

Shaofeng Shi 史少锋

杨冠军 <15563907...@163.com> wrote on Fri, Jul 21, 2023 at 14:12:

> Hello
>
> Recently, I encountered the following two issues while using Kylin 4.0.3.
> They are difficult to solve and I would like to ask for your help:
>
> 1. A test project test0718 was created in Kylin, and data was built in
> this project and queried. A - [QUERY] - log entry is printed every second
> in kylin.log, with the SQL in the query being "select 1", executed once
> per second.
>
> 2. After deleting the test project test0718 in Kylin, an error is reported
> every ten minutes in kylin.log. The error is: controller.BasicController:98:
>
> org.apache.kylin.test.exception.BadRequestException: Cannot find
> project 'test0718';
>
> May I ask what triggered the above two situations? Thank you~
>

Re: Cannot sync Hive partitioned table,Cannot get Hive TableMeta,i want to ask if can solve this Bug in Kylin 3.1.3 hadoop version?

2023-07-18 Thread ShaoFeng Shi

I replied in JIRA and am copying the reply here:

If you can provide the error log from Kylin's backend, that would help. I
think Hive 3.1 might be too new, because Kylin 3.0 is compiled with Hive
1.1. Maybe you can try to build Kylin with your Hive version:
https://github.com/apache/kylin/blob/kylin-3.0.1/pom.xml

Best regards,

Shaofeng Shi 史少锋

Zhaoyi Huang (NSB) wrote on Sun, Jun 11, 2023 at 10:36:

> Hi,
>
> I want to ask whether this bug can be fixed in the Kylin 3.1.3 Hadoop version.
>
>
>
> [KYLIN-4883] Cannot sync Hive partitioned table - Ooops.. Cannot get Hive
> TableMeta - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/KYLIN-4883>
>
> *Environment:*
>
> Redhat 7.4
> hadoop 3.1.1
> hbase 2.2.6
> hive 3.1.1
> kylin 3.1.3
> Kafka 2.0
>

Re: Kylin 4.2 - cleanup not working

2023-05-23 Thread ShaoFeng Shi

Hello,

It seems that message should be a warning instead of an error:
https://github.com/apache/kylin/blob/main/server-base/src/main/java/org/apache/kylin/rest/job/StorageCleanupJob.java#L230

For each cube, it checks whether the cube's folder under "/parquet/" (such as
file:/apps/kylin/kylin_metadata/project_a/parquet/cube1 in your log) exists.
If it exists, it goes on to check its subfolders (one per segment). If not,
it just gives a warning.
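
The check described above can be sketched roughly as follows. This is only an illustration on a local file system, not the actual StorageCleanupJob code: the function name, the `cubes` mapping, and the way "segments still in use" is represented are all assumptions made for the example.

```python
import logging
from pathlib import Path

logger = logging.getLogger("storage_cleanup_sketch")


def find_unused_segment_dirs(project_dir, cubes):
    """cubes maps a cube name to the set of segment folder names still in use
    (a hypothetical input). Returns segment folders that are no longer referenced."""
    unused = []
    for cube, live_segments in cubes.items():
        cube_path = Path(project_dir) / "parquet" / cube
        if not cube_path.is_dir():
            # Missing cube folder: warn and move on, nothing to clean here.
            logger.warning("Cube path doesn't exist! The path is %s", cube_path)
            continue
        # Cube folder exists: inspect its subfolders, one per segment.
        for seg_dir in sorted(cube_path.iterdir()):
            if seg_dir.is_dir() and seg_dir.name not in live_segments:
                unused.append(seg_dir)
    return unused
```

In this sketch, a cube that has never been built simply produces the warning and is skipped, which matches the behavior described in the reply.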

Hope it helps.

Best regards,

Shaofeng Shi 史少锋

Singh Sonu wrote on Mon, May 15, 2023 at 14:14:

> Hi Experts,
>
> Any help or suggestions will be appreciated.
>
> I am facing an issue while cleaning up unused parquet files in HDFS.
> command: bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
>
> Error: job.StorageCleanupJob:222: Cube path doesn't exist! The path is
> file:/apps/kylin/kylin_metadata/project_a/parquet/cube1
>
> During a full or incremental cube build, Kylin is not removing unwanted or
> unused segments from the parquet folder in HDFS.
>
>
>
>  You can reach me out at
>  Email- sonusingh.javat...@gmail.com
>
>  with regards,
>  Sonu Kumar Singh
>

Re: MDX interface for EXCEL from Docker image gets login failed

2023-04-19 Thread ShaoFeng Shi

Good to know you solved it by checking the document :-)

Best regards,

Shaofeng Shi 史少锋

Łukasz Stefański SoSimple wrote on Thu, Apr 13, 2023 at 01:29:

> Problem solved with tutorial:
> https://kylin.apache.org/docs/tutorial/quick_start_for_mdx.html
>
>
>
>
>
> *From:* Łukasz Stefański SoSimple 
> *Sent:* Wednesday, April 12, 2023 4:07 PM
> *To:* user@kylin.apache.org
> *Subject:* MDX interface for EXCEL from Docker image gets login failed
>
>
>
> Hi
>
> I am using the Kylin Docker image for now. But after building the cube I
> discovered that it is not possible to connect to the cube from Excel or
> Power BI.
>
> I am getting "login failed" for user ADMIN/KYLIN.
>
> The logs below show a login attempt from Excel:
>
>
>
> File mdx.log :
>
>
>
> 2023-04-07 04:02:59,273 [WARN ] [Query
> f3f4b028-1bde-4c2d-e1a6-e1d0d219d3c1] i.k.m.i.s.f.MdxServiceFilter.doFilter
> - [MDX-04010001] please add auth info in your request.
>
> 2023-04-07 04:02:59,331 [INFO ] [Query
> 5202bd61-6de4-5925-12c9-c43ccb98a3a6]
> i.k.m.w.x.MdxXmlaServlet.prepareMondrianSchema - begin init datasource,
> username=ANALYST, project=learn_kylin
>
> 2023-04-07 04:02:59,331 [ERROR] [Query
> 5202bd61-6de4-5925-12c9-c43ccb98a3a6] i.k.m.i.s.f.MdxServiceFilter.doFilter
> - internal error
>
> io.kylin.mdx.insight.common.SemanticException: The connection user
> information or password maybe empty or has been changed, please contact
> system admin to update in Configuration page under Management.
>
> at
> io.kylin.mdx.insight.core.support.SemanticFacade.getSemanticProjectByUser(SemanticFacade.java:62)
> ~[semantic-core-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.core.service.ModelManager.buildMondrianSchemaFromDataSet(ModelManager.java:70)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.XmlaDatasource.loadMdnSchemas(XmlaDatasource.java:108)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.XmlaDatasource.initDatasource(XmlaDatasource.java:90)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.MdxXmlaServlet.prepareMondrianSchema(MdxXmlaServlet.java:216)
> ~[mdx-1.2.0.jar!/:?]
>
> at
> io.kylin.mdx.web.xmla.MdxXmlaServlet.process(MdxXmlaServlet.java:103)
> ~[mdx-1.2.0.jar!/:?]
>
> at mondrian.xmla.XmlaServlet.doPost(XmlaServlet.java:119)
> ~[olap4j-xmlaserver-1.2.0.jar!/:?]
>
>
>
> Can you please help to fix this docker setup ?
>
>
>
> Łukasz
>

Re: Question about setup Kylin4.0 in CDH

2023-03-23 Thread ShaoFeng Shi

Hi Lea,

Yes, we have users running Kylin 4 on CDP 7, but I'm not sure whether it is
exactly the same version as yours. As I remember, there are some jar
conflicts (servlet or JSP related) that need some manual work. If you can
provide the detailed log messages you got, that would be great.

Best regards,

Shaofeng Shi 史少锋

Chu, Lea wrote on Wed, Mar 8, 2023 at 15:42:

> Hi all users in Kylin,
>
>
>
> I’m Lea from Taiwan Garmin. I’m a beginner with Kylin and would like to
> set up Kylin in our Hadoop cluster.
>
> I saw in the Installation Guide
> (https://kylin.apache.org/docs/install/index.html) that Kylin version 4.0
> passes the tests on Cloudera CDH 6.3.2.
> Unfortunately, our Hadoop cluster is built on CDH 7.1.7, so I would like to
> know whether other users have set up Kylin 4.0 and tested it on CDH
> 7.x. Thank you.
>
>
>
> Regards,
>
> Lea
>

Re: Kylin features

2023-03-23 Thread ShaoFeng Shi

Hello Renjie,

I didn't quite understand the question. If you can provide some more detailed
information, such as a sample, that would help other people to
answer.

Best regards,

Shaofeng Shi 史少锋

王仁杰 wrote on Mon, Feb 13, 2023 at 16:48:

> I would like to ask: does Kylin currently provide a solution for recursive
> queries on parent-child dimensions?
>

Re: Kylin Compatibility issue

2023-02-20 Thread ShaoFeng Shi

Hello Ibrar,

I just replied in the JIRA. For such problems it is better to discuss them
in the mailing list first rather than in JIRA, because JIRA is mainly for
feature/bug/task management. It appears that you have not subscribed to the
mailing list, so your email was blocked. I manually approved it. To proceed,
please finish subscribing.

For how to subscribe, please check this as an example (replace "inlong"
with the name of the project you want to subscribe to):
https://inlong.apache.org/community/how-to-subscribe/

Best regards,

Shaofeng Shi 史少锋

Ibrar Ahmed wrote on Tue, Feb 21, 2023 at 10:34:

> Hi Community,
> please have a look at the following JIRA ticket:
> https://issues.apache.org/jira/browse/KYLIN-5453.
> and update on the ticket.
>
> Regards:
> Ibrar Ahmed
> --
>
> Thanks!!
>
>
>
> Ibrar Ahmed | Staff. Data Engineer
>
> *10Pearls*
>
> Digital Innovation & Acceleration Partner
>
> www.10pearls.com
>
>
> Wash DC | San Fran | London | Karachi | Dubai | Medellin
>
> *EY Entrepreneur of the Year Finalist (CEO)*
>
> *Inc. 5000*
>

[Announce] Apache Kylin 4.0.3 released

2022-12-22 Thread ShaoFeng Shi

The Apache Kylin team is pleased to announce the immediate availability of
the 4.0.3 release.

This is a bug-fix release after 4.0.2, with 4 new features/improvements and
4 bug fixes; All of the changes in this release can be found in:
https://kylin.apache.org/docs/release_notes.html

You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/download/

Apache Kylin is an open source Distributed Analytics Engine designed to
provide SQL interface and multi-dimensional analysis (OLAP) on Apache
Hadoop, supporting extremely large datasets.

Apache Kylin lets you query massive datasets with sub-second latency in 3
steps:
1. Identify a star schema or snowflake schema data set on Hadoop.
2. Build a Cube on Hadoop.
3. Query data with ANSI SQL and get results in sub-second time, via ODBC,
JDBC or RESTful API.

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kylin.apache.org/

Best regards,

Shaofeng Shi 史少锋

Re: Custom aggregate functions in KYLIN

2022-11-24 Thread ShaoFeng Shi

Hi, it needs some development. You can refer to this folder for the measure
aggregators:
https://github.com/apache/kylin/tree/main/core-metadata/src/main/java/org/apache/kylin/measure

To make it appear in the web page, you also need to modify the front-end code.

Best regards,

Shaofeng Shi 史少锋

朱杨坤 wrote on Mon, Nov 21, 2022 at 10:25:

> How can I add a custom aggregate function in KYLIN 3.X and have it appear
> in the list of measure options?
>

Re: [DISCUSS] Move to Spark 3 totally in Kylin 4

2022-08-22 Thread ShaoFeng Shi

Thanks for Yang's comment. We will move to Spark 3 starting with the next
release, which will be Kylin 4.0.2.

Best regards,

Shaofeng Shi 史少锋

Yang Li wrote on Thu, Aug 11, 2022 at 17:36:

> +1
>
> Spark 2 is out of maintenance. Due to security concerns, we should
> encourage all Kylin users to move away from Spark 2 and onboard Spark 3 for
> data safety.
>
> The best signal for this purpose is stopping releases that are known to
> contain security vulnerabilities.
>
> Regards
> Yang
>
> From: ShaoFeng Shi <shaofeng...@apache.org>
> Sent: Wednesday, August 10, 2022 10:37 AM
> To: dev <d...@kylin.apache.org>; user <user@kylin.apache.org>
> Subject: [DISCUSS] Move to Spark 3 totally in Kylin 4
>
> Hello Kylin community,
>
> As you know, Kylin 4.0 has supported both Spark 2.4 and Spark 3.1 from the
> very beginning. Recently, when we tried to fix some security vulnerabilities
> (e.g., CVE-2022-22978 <https://github.com/advisories/GHSA-hh32-7344-cg2f>),
> we found that it is hard to make Spark 2 compatible with the recommended
> versions of Spring-core and Spring Security.
>
> Besides, we noticed that the latest Spark 2 release, v2.4.8, was released on
> May 17, 2021, which is almost 15 months ago. This means it is no longer
> actively maintained.
>
> So, I propose that Kylin 4 move to Spark 3 entirely and no longer release
> packages for Spark 2. Legacy users, please upgrade your
> Spark.
>
> Your comments are welcome.
>
> Best regards,
>
> Shaofeng Shi 史少锋

[DISCUSS] Move to Spark 3 totally in Kylin 4

2022-08-09 Thread ShaoFeng Shi

Hello Kylin community,

As you know, Kylin 4.0 has supported both Spark 2.4 and Spark 3.1 from the
very beginning. Recently, when we tried to fix some security vulnerabilities
(e.g., CVE-2022-22978 <https://github.com/advisories/GHSA-hh32-7344-cg2f>),
we found that it is hard to make Spark 2 compatible with the recommended
versions of Spring-core and Spring Security.

Besides, we noticed that the latest Spark 2 release, v2.4.8, was released on
May 17, 2021, which is almost 15 months ago. This means it is no longer
actively maintained.

So, I propose that Kylin 4 move to Spark 3 entirely and no longer release
packages for Spark 2. Legacy users, please upgrade your
Spark.

Your comments are welcome.

Best regards,

Shaofeng Shi 史少锋

Re: Connecting presto to KYLIN

2022-06-10 Thread ShaoFeng Shi

Hi yangkun,

I'm curious about your scenario; are you trying to use Kylin as a source in
Presto, or use Presto as a data source in Kylin (just like Hive)?

Best regards,

Shaofeng Shi 史少锋

Mukvin wrote on Fri, Jun 10, 2022 at 14:30:

> Hi,
> Kylin currently doesn't have a Presto connector.
>
>
> --
> Best regards.
> Tengting Xu
>
>
> At 2022-06-09 14:36:24, "朱杨坤" wrote:
>
> Is there a driver package for connecting Presto to Kylin, or a development demo?
>
>

[REPORT] Apache Kylin - May 2022

2022-05-10 Thread ShaoFeng Shi

## Description:
The mission of Apache Kylin is the creation and maintenance of
software related to a distributed and scalable OLAP engine.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Kylin was founded on 2015-11-18 (6 years ago)
There are currently 47 committers and 24 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. The last addition was Xiaoxiang Yu on 2020-10-08.
- No new committers. The last addition was Shengjun Zheng on 2021-07-07.

## Project Activity:
4.0.1 and 3.1.3 were released on 2022-01-05. The next release should be
coming in June.

## Community Health:
Overall community health is good but development activity is decreasing.
The good news is that more Kylin users are willing to upgrade from Kylin 2/3
to the latest 4.0 version. We are collecting feedback from users and doing
our best to fix bugs and provide enhancements to make Kylin 4 ready for
production.

23 issues opened in JIRA, past quarter (-41% change)
7 issues closed in JIRA, past quarter (-22% change)
50 commits in the past quarter (-43% change)
9 code contributors in the past quarter (28% increase)
51 PRs opened on GitHub, past quarter (no change)
42 PRs closed on GitHub, past quarter (-4% change)

Best regards,

Shaofeng Shi 史少锋

[REPORT] Apache Kylin - February 2022

2022-02-08 Thread ShaoFeng Shi

## Description:
The mission of Apache Kylin is the creation and maintenance of
software related to a distributed and scalable OLAP engine.

## Issues:
No issue needs the board's attention.

## Membership Data:
We need to invite more developers into our community.

Apache Kylin was founded 2015-11-18 (6 years ago)
There are currently 47 committers and 24 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Xiaoxiang Yu on 2020-10-08.
- No new committers. Last addition was Shengjun Zheng on 2021-07-07.


## Project Activity:
By the end of Jan 2022, Kylin Community released two minor versions
3.1.3 and 4.0.1, which fixed six reported security issues.

3.1.3 was released on 2022-01-05.
4.0.1 was released on 2022-01-05.
4.0.0 was released on 2021-08-31.

## Community Health:
Since Nov 2021, Kylin Community has been designing and developing some
features including Kylin 4 on AWS, new metadata definition,
new semantic layer which supports connecting Kylin via MDX.

d...@kylin.apache.org had a 42% decrease in traffic in the past quarter (101
emails compared to 173)
iss...@kylin.apache.org had a 31% decrease in traffic in the past quarter
(790 emails compared to 1130)
41 issues opened in JIRA, past quarter (-31% change)
37 issues closed in JIRA, past quarter (19% increase)
88 commits in the past quarter (-37% change)
38 code contributors in the past quarter (245% increase)
46 PRs opened on GitHub, past quarter (4% increase)
42 PRs closed on GitHub, past quarter (-12% change)

Best regards,

Shaofeng Shi 史少锋

How to unsubscribe from an Apache mailing list

2022-01-11 Thread ShaoFeng Shi

Hello,

Sometimes people send an email titled "unsubscribe" or "退订" (Chinese for
"unsubscribe") to the mailing list. Please note that this will not
unsubscribe you from the mailing list; nobody will handle that request.

If you do want to unsubscribe, please send an empty email from your email
address to LIST-unsubscribe@PROJECT.apache.org, where LIST is the list name
and PROJECT is the project name (user-unsubscr...@apache.org or
dev-unsubscr...@apache.org in Kylin's case). The Apache mailing list robot
will reply to ask for confirmation; you need to read that email and reply to
it. Only after that confirmation will you be removed from the mailing list.
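
As a small illustration of the address pattern described above, the sketch below builds the unsubscribe address from a list name and a project name. The function name is hypothetical, and the list and project names used in the example are only sample inputs.

```python
def unsubscribe_address(list_name, project):
    """Build the address that an empty unsubscribe email should be sent to."""
    return f"{list_name}-unsubscribe@{project}.apache.org"


# Example inputs: the "user" and "dev" lists of a project named "kylin".
addresses = [unsubscribe_address(name, "kylin") for name in ("user", "dev")]
```

The email still has to be sent from the subscribed address, and the robot's confirmation message still has to be answered.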

You can also find the address in the project page (mouse over the
"unsubscribe" link):
https://kylin.apache.org/community/


Best regards,

Shaofeng Shi 史少锋

Re: [DISCUSS] The future of Apache Kylin

2022-01-11 Thread ShaoFeng Shi

+1

Kylin has been a multi-dimensional OLAP (MOLAP) engine from day one. But
because SQL is its main query language, it is a little confusing for users
to differentiate it from other technologies. Introducing the new semantic
layer will make Kylin a more complete solution.

Best regards,

Shaofeng Shi 史少锋

Yaqian Zhang wrote on Tue, Jan 11, 2022 at 16:07:

> Cool!
> Looking forward to the new features of the next generation Apache Kylin.
>
> On Jan 11, 2022, at 2:30 PM, Xiaoxiang Yu wrote:
>
> Thanks Yang, there are two new features that I am really looking forward to:
>
> 1. The new *SEMANTIC LAYER* will make Kylin accessible from Excel (MDX) and
> more BI tools.
> 2. The new *flexible Model* will let Kylin users modify a Model/Cube (such
> as adding/deleting dimensions/measures) whose status is Ready without
> purging any useful cuboid/segment.
>
> --
> *Best wishes to you!*
> *From: Xiaoxiang Yu*
>
>
> At 2022-01-11 13:59:13, "Li Yang"  wrote:
> >Hi All
> >
> >Apache Kylin has been stable for quite a while and it may be a good time to
> >think about the future of it. Below are thoughts from my team and myself.
> >Love to hear yours as well. Ideas and comments are very welcome.  :-)
> >
> >*APACHE KYLIN TODAY*
> >
> >Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
> >a major version update after Kylin 3.x (HBase Storage). Kylin 4.0 uses
> >Parquet to replace HBase as storage engine, so as to improve file scanning
> >performance. At the same time, Kylin 4.0 reimplements the spark based build
> >engine and query engine, making it possible to separate computing and
> >storage, and better adapt to the technology trend of cloud native. Kylin
> >4.0 comprehensively updated the build and query engine, realized the
> >deployment mode without Hadoop dependency, decreasing the complexity of
> >deployment. However, Kylin also has a lot to improve: the capability
> >of the business semantic layer needs to be strengthened, and the modification of
> >model/cube is not flexible. With these in mind, we are thinking of a few things to do:
> >
> >   - Multi-dimensional query ability friendly to non-technical personnel.
> >   Multi-dimensional model is the key to distinguish Kylin from the general
> >   OLAP engines. The feature is that the model concept based on dimension and
> >   measurement is more friendly to non-technical personnel and closer to the
> >   goal of citizen analyst. The multi-dimensional query capability that
> >   non-technical personnel can use should be the new focus of Kylin
> >   technology.
> >
> >
> >   - Native Engine. The query engine of Kylin still has much room for
> >   improvement in vector acceleration and cpu instruction level optimization.
> >   The Spark community Kylin relies on also has a strong demand for native
> >   engine. It is optimistic that native engine can improve the performance of
> >   Kylin by at least three times, which is worthy of investment.
> >
> >
> >   - More cloud native capabilities. Kylin 4.0 has only completed the
> >   initial cloud deployment and realized the features of rapid deployment and
> >   dynamic resource scaling on the cloud, but there are still many cloud
> >   native capabilities to be developed.
> >
> >More explanations are following.
> >
> >*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
> >
> >The core of Kylin is a multi-dimensional database, which is a special OLAP
> >engine. Although Kylin has always had the ability of a relational database
> >since its birth, and it is often compared with other relational OLAP
> >engines, what really makes Kylin different is multi-dimensional model and
> >multi-dimensional database ability. Considering the essence of Kylin and
> >its wide range of business uses in the future (not only technical uses),
> >positioning Kylin as a multi-dimensional database makes perfect sense. With
> >business semantics and precomputation technology, Apache Kylin helps
> >non-technical people understand and afford big data, and realizes data
> >democratization.
> >
> >*THE SEMANTIC LAYER*
> >
> >The key difference between the multi-dimensional database and the
> >relational database is business expression ability. Although SQL has strong
> >expression ability and is the basic skill of data analysts, SQL and the RDB
> >are still too difficult

Re: [Kylin Security Notice] Impact analysis of Apache Log4j2 Remote Code Execution Vulnerability

2021-12-10 Thread ShaoFeng Shi

Yaqian, thank you for the information!

Best regards,

Shaofeng Shi 史少锋

Yaqian Zhang wrote on Fri, Dec 10, 2021 at 18:58:

> Hi all:
>
> This is a security notice about the impact of the Apache Log4j2
> Remote Code Execution vulnerability on Apache Kylin.
>
> Background
>
> Apache Log4j2 is a Java-based logging tool that is widely used in the
> industry. The recently discovered Remote Code Execution vulnerability in
> Apache Log4j2 makes it possible for an attacker who constructs a special
> request to trigger remote code execution in a program that uses Apache
> Log4j2.
>
> Scope of influence
>
> The version range of Log4j2 with the security vulnerability is: Apache Log4j
> 2.x <= 2.14.1.
> The currently released versions of Apache Kylin (Kylin 2.x, Kylin 3.x,
> Kylin 4.x) use log4j version 1.2.17 by default. However, Kylin's startup
> script loads jars from the Hadoop environment, including Hadoop, Spark,
> HBase, Hive and other components, and the log4j version used in a Hadoop 3
> environment is generally Apache Log4j2. So if your Hadoop is version 3.0 or
> above, it is recommended to upgrade the Log4j2 of the Hadoop cluster to
> avoid the possibility of polluting Kylin services.
>
> Solution
>
> If the Hadoop components in a Kylin user's environment use Log4j2, the
> user needs to comprehensively upgrade Log4j2 to the latest 2.15.0-rc2 to
> prevent Kylin's scripts from loading a Log4j2 jar with security risks into
> Kylin's classpath.
> After the Log4j2 environment is fully upgraded, users can execute jinfo
> `cat pid` under $KYLIN_HOME to check whether jar packages such as
> log4j-core-2.x.x.jar on Kylin's classpath are the latest secure
> Log4j2 versions.
>
>
> Best Regards!
>
> Apache Kylin Team
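
The classpath check suggested in the notice can be partly automated. The sketch below is only an illustration: it scans a colon-separated classpath string (for example, one copied from the jinfo output) for log4j-core jars and flags those in the affected range stated in the notice (2.x <= 2.14.1). The function name and the sample paths are hypothetical, and jars whose file names do not carry a three-part version are not detected.

```python
import re

# Upper bound of the affected range, as stated in the notice.
LAST_VULNERABLE = (2, 14, 1)
JAR_PATTERN = re.compile(r"log4j-core-(\d+)\.(\d+)\.(\d+)[^/:]*\.jar")


def vulnerable_log4j_jars(classpath):
    """Return classpath entries that look like log4j-core 2.x jars <= 2.14.1."""
    hits = []
    for entry in classpath.split(":"):
        match = JAR_PATTERN.search(entry)
        if not match:
            continue
        version = tuple(int(part) for part in match.groups())
        if version[0] == 2 and version <= LAST_VULNERABLE:
            hits.append(entry)
    return hits


# Hypothetical classpath with a log4j 1.x jar and two log4j-core 2.x jars.
sample = (
    "/opt/kylin/lib/log4j-1.2.17.jar"
    ":/opt/hadoop/lib/log4j-core-2.13.3.jar"
    ":/opt/spark/jars/log4j-core-2.15.0.jar"
)
flagged = vulnerable_log4j_jars(sample)
```

An empty result only means that no jar matched the file-name pattern; it does not prove that the classpath is free of affected Log4j2 classes.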

Re: Kylin Cube Build Error

2021-10-18 Thread ShaoFeng Shi

There should be more warning messages in the log about which rule is broken
in HBase. You can double-check the log file, or disable the sanity check by
modifying the HBase configuration.

https://github.com/apache/hbase/blob/rel/1.1.13/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


Best regards,

Shaofeng Shi 史少锋

Yunhui Han wrote on Tue, Oct 19, 2021 at 7:05 AM:

> hi, all
>
> Our Kylin recently started throwing exceptions like this, although it has
> run normally for more than 5 years. Could anyone help me with this?
>
> Sincerely
>
> org.apache.hadoop.hbase.DoNotRetryIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: 
> /tmp/hbase-hbase/local/jars/tmp/.806967ef-2bcb-41a8-b67a-72d9836e7c3a.kylin-coprocessor-2.3.1-0.jar.1634555303810.jar
>  (Read-only file system) Set hbase.table.sanity.checks to false at conf or 
> table descriptor if you want to bypass sanity checks
>   at 
> org.apache.hadoop.hbase.master.HMaster.warnOrThrowExceptionForFailure(HMaster.java:1814)
>   at 
> org.apache.hadoop.hbase.master.HMaster.sanityCheckTableDescriptor(HMaster.java:1682)
>   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1601)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:462)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:57204)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>   at java.lang.Thread.run(Thread.java:745)
>
>   at sun.reflect.GeneratedConstructorAccessor303.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:226)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:240)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:140)
>   at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4381)
>   at 
> org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:742)
>   at 
> org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:663)
>   at 
> org.apache.kylin.storage.hbase.steps.CubeHTableUtil.createHTable(CubeHTableUtil.java:105)
>   at 
> org.apache.kylin.storage.hbase.steps.CreateHTableJob.run(CreateHTableJob.java:111)
>   at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:97)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>  org.apache.hadoop.hbase.DoNotRetryIOException: 
> /tmp/hbase-hbase/local/jars/tmp/.806967ef-2bcb-41a8-b67a-72d9836e7c3a.kylin-coprocessor-2.3.1-0.jar.1634555303810.jar
>  (Read-only file system) Set hbase.table.sanity.checks to false at conf or 
> table descriptor if you want to bypass sanity checks
>   at 
> org.apache.hadoop.hbase.master.HMaster.warnOrThrowExceptionForFailure(HMaster.java:1814)
>   at 
> org.apache.hadoop.hbase.master

Re: MERGE CUBE job always fails

2021-06-02 Thread ShaoFeng Shi

Hi Michael,

Thanks for your information.

Firstly, I want to clarify the path
"hdfs://x:8020/kylin/kylin_metadata/kylin-d0a4b4b5-44ba-cdf3-c6d0-231483835b24/x/cuboid".
Although the path contains the "kylin_metadata" prefix, it holds not only
metadata but also data, especially the cuboid data. We put "kylin_metadata"
in the path because it identifies this Kylin instance, so if you have another
Kylin instance, e.g. "kylin_metadata_qa", its data will be under a different
path.
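
To make the layout concrete, here is a minimal Python sketch that splits such
a path into its parts. The function name and the returned keys are made up for
illustration; it only assumes the layout seen in this thread
(<working dir>/<instance prefix>/kylin-<job id>/<cube>/cuboid) and is not part
of Kylin.

```python
from urllib.parse import urlparse


def describe_job_path(path: str) -> dict:
    """Split a Kylin job output path into its components (illustrative helper)."""
    parsed = urlparse(path)
    parts = [p for p in parsed.path.split("/") if p]
    # Expected tail: <instance prefix>/kylin-<job id>/<cube>/<kind>
    if len(parts) < 4 or not parts[-3].startswith("kylin-"):
        raise ValueError(f"unexpected path layout: {path}")
    return {
        "filesystem": f"{parsed.scheme}://{parsed.netloc}",
        "working_dir": "/" + "/".join(parts[:-4]),
        "instance_prefix": parts[-4],
        "job_id": parts[-3][len("kylin-"):],
        "cube": parts[-2],
        "kind": parts[-1],
    }


info = describe_job_path(
    "hdfs://x:8020/kylin/kylin_metadata/"
    "kylin-d0a4b4b5-44ba-cdf3-c6d0-231483835b24/x/cuboid"
)
print(info["instance_prefix"], info["cube"], info["kind"])
```

The "instance_prefix" part is what changes between two Kylin instances that
share one working directory.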

Secondly, if you see a "Path not exist" error and have confirmed that the
path really is missing, it may be caused by stopping/starting the EMR
cluster, as EMR HDFS data is lost during a restart.

Putting all data on S3 avoids that problem, but the build performance might
be slower than with local HDFS. Depending on how much data you have in the
cluster, there are different approaches to optimize it.
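
As a quick sanity check before restarting an EMR cluster, something like the
sketch below can tell whether the working directory is on S3. The helper
names are hypothetical; only the property name kylin.env.hdfs-working-dir and
the example values come from this thread, and the parsing is a simplified
key=value reader, not Kylin's own config loader.

```python
def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def working_dir_on_s3(props: dict) -> bool:
    """True if the working dir is an s3:// location (assumed to outlive the cluster)."""
    # Assumption: fall back to /kylin on the default FS, as in the EMR doc quoted below.
    working_dir = props.get("kylin.env.hdfs-working-dir", "/kylin")
    return working_dir.lower().startswith("s3://")


sample = """
kylin.env.hdfs-working-dir=s3://yourbucket/kylin
kylin.storage.hbase.cluster-fs=s3://yourbucket
"""
print(working_dir_on_s3(read_properties(sample)))
```

If this reports False, the cuboid files live on the cluster's HDFS and need a
backup before shutdown.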

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org



Michael, Gabe wrote on Fri, Apr 30, 2021 at 11:49 PM:

> I think I have solved my problem.
>
>
>
> I misunderstood the purpose of the kylin.env.hdfs-working-dir property.
>
> In the documentation on Kylin on EMR (
> http://kylin.apache.org/docs/install/kylin_aws_emr.html) it says:
>
>
>
> Kylin’s ‘hdfs-working-dir’ is for putting the intermediate
> data for Cube building, cuboid files and also some metadata files (like
> dictionary and table snapshots which are not good in HBase);
>
> so it is best to configure HDFS for this.
>
> If using HDFS as Kylin working directory, you just leave
> configurations unchanged as EMR’s default FS is HDFS:
>
> kylin.env.hdfs-working-dir=/kylin
>
> Before you shutdown/restart the cluster, you must backup
> the “/kylin” data on HDFS to S3 with S3DistCp, or you may lost data and
> couldn’t recover the cluster later.
>
>
>
> Use S3 as kylin.env.hdfs-working-dir
>
>
>
> If you want to use S3 as storage (assume HBase is also on
> S3), you need configure the following parameters:
>
>
>
>
> kylin.env.hdfs-working-dir=s3://yourbucket/kylin
>
>
> kylin.storage.hbase.cluster-fs=s3://yourbucket
>
>
> kylin.source.hive.redistribute-flat-table=false
>
>
>
> The intermediate file and the HFile will all be written to
> S3.
>
>
>
> I misunderstood the documentation and assumed that when kylin.metadata.url
> was configured to point to a MySQL database, all Kylin metadata would be
> written to MySQL.
>
> But now I understand some Kylin metadata is always written to HDFS or S3
> regardless of whether kylin.metadata.url points at MySQL or Hbase.
> Because my kylin.env.hdfs-working-dir was pointing to an HDFS location
> that did not persist across EMR clusters, the cuboid metadata was missing
> and the MERGE CUBE job was failing.
>
> I changed kylin.env.hdfs-working-dir to point to an S3 location,
> purged my cube, built two segments, and successfully ran a MERGE CUBE job.
>
>
>
> Thank you,
>
>
>
> Gabe
>
>
>
> *From: *Michael, Gabe
> *Date: *Thursday, 29 April 2021 at 12:32
> *To: *user@kylin.apache.org
> *Subject: *MERGE CUBE job always fails
>
> Hello,
>
>
>
> I am running Kylin 3.1.1 on AWS EMR 5.30.1
>
> (Hadoop 2.8.5, Hive 2.3.6, HBase 1.4.13, ZooKeeper 3.4.14).
>
>
>
> Hbase is configured to store data on S3, and I use AWS Aurora MySQL
>
> for Kylin metadata.
>
>
>
> Whenever I attempt to run a MERGE CUBE job, the job fails at #4 Step
>
> Name: Merge Cuboid Data
>
>
>
> Step Parameters:
>
>
>
> -conf /usr/local/kylin/conf/kylin_job_conf.xml -cubename
> x -segmentid f6eadc72-e5e4-bdd2-db1d-24ea19fbf9c4 -input
> hdfs://x:8020/kylin/kylin_metadata/kylin-a13671fd-ef51-fa05-89d1-033f0c6e3423/x/cuboid/*,hdfs://x:8020/kylin/kylin_metadata/kylin-24ef5daf-38e2-56d3-2b0a-792ffb37a0bf/x/cuboid/*,hdfs://x:8020/kylin/kylin_met
> adata/kylin-d0a4b4b5-44ba-cdf3-c6d0-231483835b24/x/cuboid/* -output
> hdfs://x:8020/kylin/kylin_metadata/kylin-6c86522d-2a6e-2820-1af6-193f7c19b0d0/x/cuboid/
> -jobname Kylin_Merge_Cuboid_x_Step
>
> Step Logs:
>
>
>
> java.io.IOException: No input paths specified in job
>
>at
> org.apache.hado

Re: cube build failing in step 3 -memory heap issue

2021-01-13 Thread ShaoFeng Shi

Hi,

You can check the "Extract Fact Table Distinct Columns" section in
https://kylin.apache.org/docs/howto/howto_optimize_build.html

Usually it may be caused by: 1) the cube has too many dimensions; 2) there
is an ultra-high-cardinality column in the dimension list (e.g. a UUID column
or a timestamp column); 3) the Hadoop map/reduce memory configuration is too
small.
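
For cause 2), a rough way to spot such columns is to compare the number of
distinct values with the number of rows on a sample. Below is a minimal
Python sketch; the function name, the 0.9 threshold and the sample rows are
all made up for illustration, and in practice you would run the equivalent
count(distinct ...) query in Hive on the real table.

```python
from typing import Dict, List


def high_cardinality_columns(rows: List[Dict[str, object]],
                             threshold: float = 0.9) -> List[str]:
    """Return columns whose distinct-value ratio on the sample reaches the threshold."""
    if not rows:
        return []
    flagged = []
    for column in rows[0]:
        distinct = len({row[column] for row in rows})
        if distinct / len(rows) >= threshold:
            flagged.append(column)
    return flagged


sample = [
    {"order_id": "a1", "country": "US"},
    {"order_id": "b2", "country": "US"},
    {"order_id": "c3", "country": "CN"},
    {"order_id": "d4", "country": "US"},
]
print(high_cardinality_columns(sample))
```

Columns flagged this way are candidates for removal from the dimension list
or for a different encoding.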

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Ahmad Hammad wrote on Thu, Jan 14, 2021 at 11:22 AM:

> Dear ,
>
> Hope all is well.
>
> We are looking to use Apache Kylin instead of SSAS for our business
> analysis/dashboard product. We are facing a problem building the cube.
> It contains two Hive tables: one fact table and one dimension table.
>
> The fact table has 47271784 rows and a total size of 5326550430, as
> shown by the show tblproperties query in the Hive CLI.
>
> The dimension table has 5261766 rows and a total size of 1174440814, as
> shown by the show tblproperties query in the Hive CLI.
>
>
>
>
> the build process failed in step 3 //
>  #3 Step Name: Extract Fact Table Distinct Columns
> Data Size: 16.19 KB
> Duration: 11.78 mins Waiting: 13 seconds
>
>
> The logs show a Java heap space error, as follows:
>
> org.apache.kylin.engine.mr.exception.MapReduceException: Counters: 55
> File System Counters
> FILE: Number of bytes read=323698
> FILE: Number of bytes written=29783830
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=252673677
> HDFS: Number of bytes written=16576
> HDFS: Number of read operations=195
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=3
> Job Counters
> Failed reduce tasks=4
> Launched map tasks=47
> Launched reduce tasks=5
> Data-local map tasks=47
> Total time spent by all maps in occupied slots (ms)=4363352
> Total time spent by all reduces in occupied slots (ms)=2032100
> Total time spent by all map tasks (ms)=1090838
> Total time spent by all reduce tasks (ms)=508025
> Total vcore-milliseconds taken by all map tasks=1090838
> Total vcore-milliseconds taken by all reduce tasks=508025
> Total megabyte-milliseconds taken by all map tasks=1117018112
> Total megabyte-milliseconds taken by all reduce tasks=520217600
> Map-Reduce Framework
> Map input records=47271784
> Map output records=5261813
> Map output bytes=57539075
> Map output materialized bytes=15536194
> Input split bytes=138932
> Combine input records=5261813
> Combine output records=5261813
> Reduce input groups=1
> Reduce shuffle bytes=340412
> Reduce input records=47
> Reduce output records=0
> Spilled Records=5261860
> Shuffled Maps =47
> Failed Shuffles=0
> Merged Map outputs=47
> GC time elapsed (ms)=68095
> CPU time spent (ms)=1246430
> Physical memory (bytes) snapshot=44485660672
> Virtual memory (bytes) snapshot=137661587456
> Total committed heap usage (bytes)=41749577728
> Peak Map Physical memory (bytes)=960831488
> Peak Map Virtual memory (bytes)=2891886592
> Peak Reduce Physical memory (bytes)=305377280
> Peak Reduce Virtual memory (bytes)=2667810816
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=0
> File Output Format Counters
> Bytes Written=0
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter
> BYTES=1563833108
> Job Diagnostics:Task failed task_1610370996803_0012_r_00
> Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0
> killedReduces: 0
>
> Failure task Diagnostics:
> Error: Java heap space
>
> at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:234)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> I tried to increase the memory allocated to Kylin to 17 GB in the setenv.sh
> file, as recommended
>
>  a

Re:

2020-12-17 Thread ShaoFeng Shi

Please send an email to user-unsubscr...@kylin.apache.org; you will then
receive a confirmation email, and you need to reply to it to confirm.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




再见了～ <441157...@qq.com> wrote on Sat, Nov 28, 2020 at 11:11 PM:

> Please help me unsubscribe, thank you.
> Email: [hidden email]
> <http://apache-kylin.74782.x6.nabble.com/user/SendEmail.jtp?type=node=12756=0>
>

Re: Data refresh

2020-11-05 Thread ShaoFeng Shi

Hi Chaymaa, you're correct. I think the resources are limited; you need to
add more resources to improve the build performance.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




chaymaa lyoubiidrissi wrote on Thu, Nov 5, 2020 at 3:45 PM:

> Hi,
> My name is Lyoubi Idrissi Chaymaa. I want to thank you first for this
> opportunity.
> I am using Apache Kylin 3.1.0 to generate my first OLAP cube (MapReduce
> engine). It took about 25 minutes to build the cube with only 1M rows
> of data. I tried to rebuild it at specific intervals using crontab and it
> took almost the same time. Is this due to my machine's performance or the
> lack of nodes? Can you please help me understand this issue?
>
> I am using Hadoop on a single node with 8 GB of RAM, and Intel(R) Xeon(R)
> CPU D-1541 @ 2.10GHz as processor
>
> Thanks in advance.
>
>

Re: stay at 20 Step Name: Load HFile to HBase Table

2020-10-26 Thread ShaoFeng Shi

That's good. Thanks for the update.
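
For anyone who hits the same "Wrong FS" error quoted below: the exception
means the HFile path and the HBase cluster are on file systems with different
names. Here is a minimal Python sketch of that comparison, using the values
from the quoted mail; the function is illustrative only and is neither
Hadoop's nor Kylin's actual check.

```python
from urllib.parse import urlparse


def same_filesystem(uri_a: str, uri_b: str) -> bool:
    """Compare scheme, host and port of two file system URIs."""
    a, b = urlparse(uri_a), urlparse(uri_b)
    # urlparse lower-cases the hostname, so Master2 and master2 compare equal.
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)


hfile_dir = "hdfs://Master2/kylin/kylin_metadata"
print(same_filesystem(hfile_dir, "hdfs://mycluster/hbase"))  # False: the failing setup
print(same_filesystem(hfile_dir, "hdfs://master2/hbase"))    # True: after the fix
```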

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




li_cong521 wrote on Mon, Oct 26, 2020 at 3:40 PM:

> Hello,
> The error has been solved.
> Fix: in hbase-site.xml, set hbase.rootdir=hdfs://master2/hbase
> Thanks~
>
>
>
>
>
>
>
> At 2020-10-26 14:45:18, "li_cong521"  wrote:
>
> Hello,
>
> Has anybody met this error? The cube build stays at step 20.
> The value I set in kylin.properties is
> kylin.storage.hbase.cluster-fs=hdfs://mycluster/hbase
> The Hadoop value of fs.defaultFS is hdfs://master2
> The log follows:
> org.apache.kylin.engine.mr.exception.HadoopShellException:
> java.io.IOException: BulkLoad encountered an unrecoverable problem
>  at
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:534)
>  at
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:465)
>  at
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:343)
>  at
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1069)
>  at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>  at
> org.apache.kylin.storage.hbase.steps.BulkLoadJob.run(BulkLoadJob.java:102)
>  at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>  at
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>  at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>  at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>  at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=35, exceptions:
> Mon Oct 26 11:15:55 CST 2020,
> RpcRetryingCaller{globalStartTime=1603682155923, pause=100, retries=35},
> java.io.IOException: java.io.IOException: Wrong FS:
> hdfs://Master2/kylin/kylin_metadata/kylin-39e914b5-b9f5-3d14-83e8-45da3eb54657/kylin_sales_cube/hfile/F2/4be8dd587c7b4ddebac0d4c30eeaf260,
> expected: hdfs://mycluster
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2239)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>  at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: Wrong FS:
> hdfs://Master2/kylin/kylin_metadata/kylin-39e914b5-b9f5-3d14-83e8-45da3eb54657/kylin_sales_cube/hfile/F2/4be8dd587c7b4ddebac0d4c30eeaf260,
> expected: hdfs://mycluster
>  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
>  at
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:184)
>  at
> org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
>  at
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1068)
>  at
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
>  at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
>  at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
>  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
>  at
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitStoreFile(HRegionFileSystem.java:387)
>  at
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.bulkLoadStoreFile(HRegionFileSystem.java:466)
>  at
> org.apache.hadoop.hbase.regionserver.HStore.bulkLoadHFile(HStore.java:780)
>  at
> org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5404)
>  at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:1970)
>  at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33650)
>  at org.apache.hadoop.hbase.ipc

Re: [Announce] Apache Kylin 3.1.1 released

2020-10-18 Thread ShaoFeng Shi

This is great! Thank you Xiaoxiang!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Xiaoxiang Yu wrote on Sun, Oct 18, 2020 at 9:18 PM:

> The Apache Kylin team is pleased to announce the immediate availability of
>
> the 3.1.1 release.
>
>
> This is a bugfix release after 3.1.0, with 21 bug fixes and 37
> enhancements.
>
> All of the changes in this release can be found in:
>
> https://kylin.apache.org/docs/release_notes.html
>
>
> You can download the source release and binary packages from Apache Kylin's
>
> download page: https://kylin.apache.org/download/
>
>
> Apache Kylin is an open-source Distributed Analytical Data Warehouse for
>
> Big Data; it was designed to provide OLAP (Online Analytical Processing)
>
> capability in the big data era. By renovating the multi-dimensional cube
>
> and precalculation technology on Hadoop and Spark, Kylin is able to achieve
>
> near-constant query speed regardless of the ever-growing data volume.
>
> Reducing query latency from minutes to sub-second, Kylin brings online
>
> analytics back to big data.
>
>
> Apache Kylin lets you query billions of rows at sub-second latency in 3
>
> steps:
>
> 1. Identify a Star/Snowflake Schema on Hadoop.
>
> 2. Build Cube from the identified tables.
>
> 3. Query using ANSI-SQL and get results in sub-second, via ODBC, JDBC or
>
> RESTful API.
>
>
> Thanks to everyone who has contributed to this release.
>
>
> We welcome your help and feedback. For more information on how to report
>
> problems, and to get involved, visit the project website at
>
> https://kylin.apache.org/
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

2020-07-25 Thread ShaoFeng Shi

Hi Xiao,

The 3.x line will continue to be released, especially for bug fixes and
security issues. For new features and enhancements, it depends. The main
consideration is the testing and release effort: each 3.x release now needs
to be built and tested against 4 HBase API versions; even so, many users
still encounter environment problems on newer Hadoop platforms such as CDH
6.3, CDP 7, etc. So we will slow down the 3.x release frequency in order to
put more effort into the Parquet storage, which has much better
compatibility across different platforms.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




chuxiao wrote on Sat, Jul 25, 2020 at 12:12 PM:

> Will 3.x continue to be released? For example, to support HBase rsgroup.
>
>
>
>
>
> At 2020-07-24 19:23:11, "ShaoFeng Shi"  wrote:
>
> Hello, Kylin users,
>
> Regarding the Kylin Parquet storage, we hope to update the progress here.
> At present, we have completed the main development work[1], design
> document[2], and the benchmark. With the new architecture, Kylin is going
> to be more efficient and be more cloud-friendly: fully on Spark, less
> dependency on Hadoop stack, which made the DevOps easier.
>
> Here we discuss the future plan, which includes the two aspects.
>
> 1. The plan for Kylin 4.0
>
> In Kylin 3.x, we have released some important functions/features, such as
> real-time analysis, Flink building engine, global dictionary with Hive,
> etc. In the next phase, we hope to concentrate on the Parquet storage
> engine and to release it in Kylin v4.0 within this year. In this period,
> 3.x will be keeping maintained for bug fix and security vulnerability, but
> won't introduce big change or major features.
>
> 2. Backward compatibility for HBase storage.
>
> When we develop the Parquet storage engine, we find it is very difficult
> to make the Parquet and HBase engines co-exist. The codebase becomes very
> complicated and ugly, inevitably bring big challenges to the maintenance
> and release. Besides, as HBase has different APIs (v1.1, v2.0, besides, the
> CDHs' are different from the community's'), which makes the testing and
> release effort be doubled or tripled in the past years.
>
> So, we plan to remove the HBase storage engine in Kylin 4.0; The Kylin
> metadata will also migrate to MySQL. For existing users, if you want to use
> the HBase engine, then keep in the Kylin 3.x; If you want to upgrade to the
> Parquet storage, a migration tool can be provided later (another discuss
> thread).
>
> Welcome to tell us your concerns and suggestions! Thank you for your
> participation.
>
> ## Reference
> [1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
> [2]
> https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

2020-07-24 Thread ShaoFeng Shi

Hi Cinto,

Currently it uses native Parquet with no additional indexing; in the
future, if Parquet enhances its indexing, Kylin can benefit from it.

== "are we using any metastore (like Hive) along with this ?"
I'm not sure whether I understand the question properly. The cube Parquet
files are persisted directly on HDFS or object storage, with no dependency
on the Hive metastore.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Cinto Sunny wrote on Fri, Jul 24, 2020 at 10:47 PM:

> Is there any documentation on the additional indexing (if any) we are
> doing on parquet. Also, are we using any metastore (like Hive) along with
> this ?
>
> - Cinto
>
>
> On Fri, Jul 24, 2020 at 4:23 AM ShaoFeng Shi 
> wrote:
>
>> Hello, Kylin users,
>>
>> Regarding the Kylin Parquet storage, we hope to update the progress here.
>> At present, we have completed the main development work[1], design
>> document[2], and the benchmark. With the new architecture, Kylin is going
>> to be more efficient and be more cloud-friendly: fully on Spark, less
>> dependency on Hadoop stack, which made the DevOps easier.
>>
>> Here we discuss the future plan, which includes the two aspects.
>>
>> 1. The plan for Kylin 4.0
>>
>> In Kylin 3.x, we have released some important functions/features, such as
>> real-time analysis, Flink building engine, global dictionary with Hive,
>> etc. In the next phase, we hope to concentrate on the Parquet storage
>> engine and to release it in Kylin v4.0 within this year. In this period,
>> 3.x will be keeping maintained for bug fix and security vulnerability, but
>> won't introduce big change or major features.
>>
>> 2. Backward compatibility for HBase storage.
>>
>> When we develop the Parquet storage engine, we find it is very difficult
>> to make the Parquet and HBase engines co-exist. The codebase becomes very
>> complicated and ugly, inevitably bring big challenges to the maintenance
>> and release. Besides, as HBase has different APIs (v1.1, v2.0, besides, the
>> CDHs' are different from the community's'), which makes the testing and
>> release effort be doubled or tripled in the past years.
>>
>> So, we plan to remove the HBase storage engine in Kylin 4.0; The Kylin
>> metadata will also migrate to MySQL. For existing users, if you want to use
>> the HBase engine, then keep in the Kylin 3.x; If you want to upgrade to the
>> Parquet storage, a migration tool can be provided later (another discuss
>> thread).
>>
>> Welcome to tell us your concerns and suggestions! Thank you for your
>> participation.
>>
>> ## Reference
>> [1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
>> [2]
>> https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofeng...@apache.org
>>
>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>
>>
>>

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

2020-07-24 Thread ShaoFeng Shi

Hi Kang, it will still be KV; changing to relational would be too much
work.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Zhou Kang wrote on Fri, Jul 24, 2020 at 10:21 PM:

> I have a question:
> Metadata based on MySQL,  data in MySQL is KV or relational ?
>
> Thank you!
>
>
> > On Jul 24, 2020, at 7:23 PM, ShaoFeng Shi wrote:
> >
> > Hello, Kylin users,
> >
> > Regarding the Kylin Parquet storage, we hope to update the progress
> here. At present, we have completed the main development work[1], design
> document[2], and the benchmark. With the new architecture, Kylin is going
> to be more efficient and be more cloud-friendly: fully on Spark, less
> dependency on Hadoop stack, which made the DevOps easier.
> >
> > Here we discuss the future plan, which includes the two aspects.
> >
> > 1. The plan for Kylin 4.0
> >
> > In Kylin 3.x, we have released some important functions/features, such
> as real-time analysis, Flink building engine, global dictionary with Hive,
> etc. In the next phase, we hope to concentrate on the Parquet storage
> engine and to release it in Kylin v4.0 within this year. In this period,
> 3.x will be keeping maintained for bug fix and security vulnerability, but
> won't introduce big change or major features.
> >
> > 2. Backward compatibility for HBase storage.
> >
> > When we develop the Parquet storage engine, we find it is very difficult
> to make the Parquet and HBase engines co-exist. The codebase becomes very
> complicated and ugly, inevitably bring big challenges to the maintenance
> and release. Besides, as HBase has different APIs (v1.1, v2.0, besides, the
> CDHs' are different from the community's'), which makes the testing and
> release effort be doubled or tripled in the past years.
> >
> > So, we plan to remove the HBase storage engine in Kylin 4.0; The Kylin
> metadata will also migrate to MySQL. For existing users, if you want to use
> the HBase engine, then keep in the Kylin 3.x; If you want to upgrade to the
> Parquet storage, a migration tool can be provided later (another discuss
> thread).
> >
> > Welcome to tell us your concerns and suggestions! Thank you for your
> participation.
> >
> > ## Reference
> > [1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
> > [2]
> https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage
> >
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> > Apache Kylin PMC
> > Email: shaofeng...@apache.org
> >
> > Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> > Join Kylin user mail group: user-subscr...@kylin.apache.org
> > Join Kylin dev mail group: dev-subscr...@kylin.apache.org
> >
> >
>
>

[DISCUSS] Kylin Parquet storage and 4.0 plan

2020-07-24 Thread ShaoFeng Shi

Hello, Kylin users,

Regarding the Kylin Parquet storage, we would like to share a progress
update here. At present, we have completed the main development work[1], the
design document[2], and the benchmark. With the new architecture, Kylin is
going to be more efficient and more cloud-friendly: it runs fully on Spark
with less dependency on the Hadoop stack, which makes DevOps easier.

Here we discuss the future plan, which includes two aspects.

1. The plan for Kylin 4.0

In Kylin 3.x, we have released some important functions/features, such as
real-time analysis, the Flink building engine, the global dictionary with
Hive, etc. In the next phase, we hope to concentrate on the Parquet storage
engine and to release it in Kylin v4.0 within this year. In this period,
3.x will continue to be maintained for bug fixes and security
vulnerabilities, but won't introduce big changes or major features.

2. Backward compatibility for HBase storage.

While developing the Parquet storage engine, we found it very difficult to
make the Parquet and HBase engines co-exist. The codebase becomes very
complicated and ugly, which inevitably brings big challenges to maintenance
and release. Besides, HBase has different APIs (v1.1 and v2.0, and CDH's
differ from the community's), which has doubled or tripled the testing and
release effort in the past years.

So, we plan to remove the HBase storage engine in Kylin 4.0; the Kylin
metadata will also migrate to MySQL. Existing users who want to keep using
the HBase engine should stay on Kylin 3.x; for those who want to upgrade to
the Parquet storage, a migration tool can be provided later (to be discussed
in another thread).

Welcome to tell us your concerns and suggestions! Thank you for your
participation.

## Reference
[1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
[2]
https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

[SECURITY][CVE-2020-13926] Apache Kylin SQL injection vulnerability

2020-07-13 Thread ShaoFeng Shi

Versions Affected: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.4.1,
2.5.0, 2.5.1, 2.5.2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.6.6,
3.0.0-alpha, 3.0.0-alpha2, 3.0.0-beta, 3.0.0, 3.0.1, 3.0.2

Description:

Kylin concatenates and executes some Hive SQL statements in Hive CLI or
beeline when building new segments. Some parts of the SQL come from system
configurations, and these configurations can be overwritten through certain
REST APIs, which makes a SQL injection attack possible.
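
To illustrate the class of problem (this is a generic sketch, not Kylin's
code and not the actual fix): if a configuration value is pasted into a Hive
statement as-is, anything after a ';' becomes a second statement. An
allow-list check on the value before concatenation closes that hole. The
function name, the pattern and the statement template below are hypothetical.

```python
import re

# Hypothetical allow-list: plain identifiers only, no quotes, spaces or ';'.
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_use_statement(database: str) -> str:
    """Build a 'USE <db>;' statement, rejecting anything that is not an identifier."""
    if not _SAFE_IDENTIFIER.fullmatch(database):
        raise ValueError(f"refusing unsafe database name: {database!r}")
    return f"USE {database};"


print(build_use_statement("default"))
try:
    build_use_statement("default; DROP TABLE t")
except ValueError as err:
    print("rejected:", err)
```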

Mitigation:
Users of all previous versions after 2.0 should upgrade to 3.1.0.

Credit:
We would like to thank Rupeng Wang from Kyligence for reporting and fixing
this issue.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

[SECURITY][CVE-2020-13925] Apache Kylin command injection vulnerability

2020-07-13 Thread ShaoFeng Shi

Versions Affected: 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2,
2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.6.6, 3.0.0-alpha, 3.0.0-alpha2,
3.0.0-beta, 3.0.0, 3.0.1, 3.0.2

Description:

Similar to CVE-2020-1956, Kylin has one more RESTful API which concatenates
the API inputs into OS commands and then executes them on the server; the
reported API lacks the necessary input validation, which makes it possible
for attackers to execute OS commands remotely.
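
The pattern described above can be sketched in Python as follows. This is an
illustration only, not the affected Kylin API: the `job_id` parameter, the
function names and the allow-list are hypothetical. The safer variant
validates the input and passes it as a separate argv element, so no shell
ever parses it.

```python
import re
import subprocess
import sys

# Hypothetical allow-list for an API input such as a job identifier.
_SAFE_JOB_ID = re.compile(r"[A-Za-z0-9_-]+")


def unsafe_command_line(job_id: str) -> str:
    """Vulnerable pattern: the input becomes part of a shell command string."""
    return "echo " + job_id


def build_argv(job_id: str) -> list:
    """Validate the input, then keep it as its own argv element."""
    if not _SAFE_JOB_ID.fullmatch(job_id):
        raise ValueError(f"unsafe job id: {job_id!r}")
    # A tiny child Python process stands in for the real OS command.
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", job_id]


def run_checked(job_id: str) -> str:
    """Run the command without a shell and return its standard output."""
    result = subprocess.run(build_argv(job_id), capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(unsafe_command_line("job_42; rm -rf /tmp/x"))  # a shell would run both commands
    print(run_checked("job_42").strip())
```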

Mitigation:
Users of all previous versions after 2.3 should upgrade to 3.1.0.

Credit:
We would like to thank Clancey  for reporting
this issue.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

Re: Re: Why does the cube build fail at step 4 with "too many open files"?

2020-07-09 Thread ShaoFeng Shi

https://askubuntu.com/questions/181215/too-many-open-files-how-to-find-the-culprit

You can investigate which process opens so many files; please refer to the
link above. I did encounter such a problem several years ago but haven't
seen it for some time.
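
As a starting point for that investigation, a minimal Python sketch (assuming
Linux with a mounted /proc; the function names are made up for this example)
that reads the open-file limit of the current process, similar to
`ulimit -n`, and counts the descriptors a process currently holds:

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the open descriptors of a process via /proc; None if /proc is unavailable."""
    fd_dir = f"/proc/{pid}/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"descriptors open in this process: {open_fd_count()}")
```

Passing the PID of the Kylin server process to `open_fd_count` (with
sufficient permissions) shows how close that process is to its limit.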

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




恩爸 <441586...@qq.com> wrote on Thu, Jul 9, 2020 at 10:31 PM:

> Hi, you can use the command 'ulimit -a' to see the limit on open
> files, and search for how to raise this limit.
>
> --
>
>
> Best regards,
> Zhichao Zhang
>
>
>
>
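> -- Original Message --
> *From:* "user" <3281438...@qq.com>;
> *Sent:* Thursday, July 9, 2020, 5:43 PM
> *To:* "user";
> *Cc:* "crowgns";
> *Subject:* Reply: Re: Why does the cube build fail at step 4 with "too many
> open files"?
>
> Sorry, I don't know Kylin very well yet, and I don't know what a "handle"
> means. Which handle limit should be changed, and what are the exact steps?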
>
>
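> -- Original Message --
> *From:* "初晓";
> *Sent:* Thursday, July 9, 2020, 3:53 PM
> *To:* "user";
> *Subject:* Re:Re: Why does the cube build fail at step 4 with "too many
> open files"?
>
> The file handles are probably used up; raise the file handle limit.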
>
>
>
>
>
>
> At 2020-07-09 10:39:53, "Yaqian Zhang"  wrote:
>
> Hi:
>
> Does it help to restart Kylin or rebuild the job?
>
> On Jul 8, 2020, at 11:16, 梅秋莹 <3281438...@qq.com> wrote:
>
>
>
>
> -- Original Message --
> *From:* "梅秋莹"<3281438...@qq.com>;
> *Sent:* Friday, July 3, 2020, 4:55 PM
> *To:* "user";
> *Subject:* When build cube in kylin, it failed at the 4th step "Build
> Dimension Dictionary"
>
> Hello everyone:
>
> Recently, we have encountered a new problem: when building a cube in
> Kylin, it failed at the 4th step, "Build Dimension Dictionary".
>
> The error message is shown below. Before that, we had never met such a
> problem, and our model and cube had functioned properly for at least one
> month. After reading some articles and blogs, we found that some people
> think the error could be related to cubes that have measures like "TOP-N"
> and "COUNT_DISTINCT". Indeed, I set measures like "TOP-N" and
> "COUNT_DISTINCT" when creating the cube. What should we do to solve the
> problem? Please tell me if you know the reason. Thank you very much.
>
> Our Kylin version is 2.6.3.
>
> org.apache.kylin.engine.mr.exception.HadoopShellException: 
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /etc/hadoop/3.1.0.0-78/0/core-site.xml (打开的文件过多)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3000)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2926)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2806)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.kylin.common.util.HadoopUtil.healSickConfig(HadoopUtil.java:69)
>   at 
> org.apache.kylin.common.util.HadoopUtil.getCurrentConfiguration(HadoopUtil.java:59)
>   at 
> org.apache.kylin.common.util.HadoopUtil.getFileSystem(HadoopUtil.java:103)
>   at 
> org.apache.kylin.common.util.HadoopUtil.getFileSystem(HadoopUtil.java:95)
>   at 
> org.apache.kylin.common.util.HadoopUtil.getWorkingFileSystem(HadoopUtil.java:82)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob$2.getDictionary(CreateDictionaryJob.java:92)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:95)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>   at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExec

Re: Abnormal query results in Kylin 3.0

2020-07-03 Thread ShaoFeng Shi

Hi xiqiang,

I checked the log and didn't find any exception information. Did you update
the HBase coprocessor after upgrading from 2.6 to 3.0? Or you can try the
latest 3.0.2. I don't remember a bug related to this, but it is worth a try.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




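苑希强 wrote on Fri, Jul 3, 2020 at 5:50 PM:

> Symptom:
> We build a monthly cube over 2.3 billion rows. Most query results match
> those of direct queries, but a small portion do not.
> We verified that the results are still consistent when the query hits the
> base cuboid; the results become abnormal after reducing dimensions.
>
> Other information:
> 1. Kylin was upgraded from 2.6.2 to 3.0.0; the data was correct before the
> upgrade.
> 2. The daily and weekly cubes built on the same table have no problem.
>
> The cube build and query logs are in the attached text file.
> The related query results are in the attached images.
>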

Re: [Announce] Apache Kylin 3.1.0 released

2020-07-02 Thread ShaoFeng Shi

Great! Thanks to everyone who contributed to this release!

We encourage Kylin 2.x users to upgrade to Kylin 3, which has been verified
by several early and big users. In the future, we will focus more on 3.1
and 4.0 development.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




George Ni wrote on Thu, Jul 2, 2020 at 9:44 PM:

> The Apache Kylin team is pleased to announce the immediate availability of
> the 3.1.0 release.
>
> This is a major release after 3.0.0, with 10 new features and 142 bug
> fixes and enhancements. All of the changes in this release can be found in:
> https://kylin.apache.org/docs/release_notes.html
>
> You can download the source release and binary packages from Apache
> Kylin's download page: https://kylin.apache.org/download/
>
> Apache Kylin is an open-source Distributed Analytical Data Warehouse for
> Big Data; it was designed to provide OLAP (Online Analytical Processing)
> capability in the big data era. By renovating the multi-dimensional cube
> and precalculation technology on Hadoop and Spark, Kylin is able to achieve
> near-constant query speed regardless of the ever-growing data volume.
> Reducing query latency from minutes to sub-second, Kylin brings online
> analytics back to big data.
>
> Apache Kylin lets you query billions of rows at sub-second latency in 3
> steps:
> 1. Identify a Star/Snowflake Schema on Hadoop.
> 2. Build Cube from the identified tables.
> 3. Query using ANSI-SQL and get results in sub-second, via ODBC, JDBC or
> RESTful API.
>
> Thanks to everyone who has contributed to this release.
>
> We welcome your help and feedback. For more information on how to report
> problems, and to get involved, visit the project website at
> https://kylin.apache.org/
>
> --
>
> -
>
> Best regards,
>
>
>
> Ni Chunen / George
>

Re: Kylin startup problem

2020-06-22 Thread ShaoFeng Shi

It should be; you can check the bin/check-hive-usability.sh script, which has
a loop that waits for 60 seconds.
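
To see why a Hive call that needs about 90 seconds cannot pass a 60-second
check, here is a small Python sketch of a timeout-bounded check. It does not
reproduce the logic of check-hive-usability.sh; the function name and the
stand-in commands are assumptions made for illustration.

```python
import subprocess
import sys


def finishes_within(argv, timeout_seconds):
    """Return True only if the command exits with status 0 before the timeout."""
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run once the timeout expires.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-ins for a fast and a slow "hive -e 'select 1'" call.
    fast = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(3)"]
    print(finishes_within(fast, 60))
    print(finishes_within(slow, 1))
```

On a real node one could pass `["hive", "-e", "select 1"]` with a timeout of
60 to check whether Hive answers in time.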

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




天外飞星 <254578...@qq.com> wrote on Tue, Jun 23, 2020 at 10:11 AM:

> Running hive -e 'select 1' on my machine works, but starting Kylin still
> reports the same error. Could it be related to the Hive command taking too
> long on my machine, about 90 seconds?
>
>
> -- Original Message --
> *From:* "ShaoFeng Shi";
> *Sent:* Tuesday, June 23, 2020, 9:49 AM
> *To:* "user";
> *Subject:* Re: Kylin startup problem
>
> This message means that the Hive command does not seem to work on the
> machine. You need to check whether Hive has been properly installed and
> configured. You can try the "hive" command and then do some operations like
> showing tables, querying data, etc.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
>
> 天外飞星 <254578...@qq.com> wrote on Tue, Jun 23, 2020 at 9:35 AM:
>
>> When starting Kylin I get "ERROR: Check hive's usability failed, please
>> check the status of your cluster". What causes this error and how should
>> I resolve it? Thanks.
>>
>

Re: Kylin startup problem

2020-06-22 Thread ShaoFeng Shi

This message means that the Hive command does not seem to work on the
machine. You need to check whether Hive has been properly installed and
configured. You can try the "hive" command and then do some operations like
showing tables, querying data, etc.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




天外飞星 <254578...@qq.com> wrote on Tue, Jun 23, 2020 at 9:35 AM:

> When starting Kylin I get "ERROR: Check hive's usability failed, please
> check the status of your cluster". What causes this error and how should
> I resolve it? Thanks.
>

Fwd: See you at: Apache Kylin on Parquet: Introduction to the New Storage Engine

2020-06-16 Thread ShaoFeng Shi

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Meetup//Meetup Events v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Events - Big Data Bellevue (BDB)
X-MS-OLK-FORCEINSPECTOROPEN:TRUE
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
TZURL:http://tzurl.org/zoneinfo-outlook/America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T02
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T02
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200615T074150Z
DTSTART;TZID=America/Los_Angeles:20200617T183000
DTEND;TZID=America/Los_Angeles:20200617T20
STATUS:CONFIRMED
SUMMARY:Apache Kylin on Parquet: Introduction to the New Storage Engine
DESCRIPTION:Big Data Bellevue (BDB)\nWednesday\, June 17 at 6:30 PM\n\nAp
 ache Kylin is an open source distributed analytical data warehouse for b
 ig data. It was designed to provide OLAP (Online Analytical Processing) 
 capa...\n\nhttps://www.meetup.com/Big-Data-Bellevue-BDB/events/270779777
 /
ORGANIZER;CN=Meetup Reminder:MAILTO:i...@meetup.com
CLASS:PUBLIC
CREATED:20151015T202443Z
GEO:47.61;-122.20
LOCATION:Online event
URL:https://www.meetup.com/Big-Data-Bellevue-BDB/events/270779777/
SEQUENCE:2
LAST-MODIFIED:20200603T193630Z
UID:event_fxbnllybcj...@meetup.com
END:VEVENT
END:VCALENDAR


meetup.ics
Description: application/ics

Re: Kylin with Parquet

2020-06-14 Thread ShaoFeng Shi

Hi Manish,

As Parquet supports encoding, Kylin doesn't need to do that anymore. That
means it is possible to read the original values directly from the cube
files.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Manish Jain wrote on Sun, Jun 14, 2020 at 11:09 AM:

> It makes sense. Are we also planning to change the storage encoding, or will
> it remain the same? Will we be able to read the data using normal
> Hive/Presto queries, or will it require the Kylin reader only?
>
> On Sun, 14 Jun 2020 at 7:42 AM, ShaoFeng Shi 
> wrote:
>
>> This is a good question;
>>
>> One of the purposes of developing the Parquet storage is to overcome the
>> limitations of HBase, which also means replacing HBase. If the new
>> storage is successful, we may stop maintaining the HBase engine.
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofeng...@apache.org
>>
>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>
>>
>>
>>
>> Manish Jain wrote on Sat, Jun 13, 2020 at 11:47 PM:
>>
>>> Ok, thanks.
>>>
>>> On Sat, 13 Jun 2020 at 8:15 PM, Liukaige  wrote:
>>>
>>>> Hey Manish,
>>>>
>>>> The new Parquet storage is almost ready but has not been released yet.
>>>> In one installation you can only choose one of them, not both together.
>>>> I guess the community will continue to maintain HBase version for a
>>>> while. But it will be deprecated in the future.
>>>>
>>>> Best Regards,
>>>> Kai
>>>>
>>>> Manish Jain wrote on Fri, Jun 12, 2020 at 10:46 PM:
>>>>
>>>>> Does Kylin now support storage in Parquet instead of HBase? Or does it
>>>>> support both Parquet and HBase?
>>>>> --
>>>>> Best Regards,
>>>>> Manish Jain
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Kaige Liu(刘凯歌)
>>>>
>>> --
>>> Best Regards,
>>> Manish Jain
>>>
>> --
> Best Regards,
> Manish Jain
>

Re: Kylin with Parquet

2020-06-13 Thread ShaoFeng Shi

This is a good question;

One of the purposes of developing the Parquet storage is to overcome the
limitations of HBase, which also means replacing HBase. If the new
storage is successful, we may stop maintaining the HBase engine.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Manish Jain wrote on Sat, Jun 13, 2020 at 11:47 PM:

> Ok, thanks.
>
> On Sat, 13 Jun 2020 at 8:15 PM, Liukaige  wrote:
>
>> Hey Manish,
>>
>> The new Parquet storage is almost ready but has not been released yet. In
>> one installation you can only choose one of them, not both together.
>> I guess the community will continue to maintain HBase version for a
>> while. But it will be deprecated in the future.
>>
>> Best Regards,
>> Kai
>>
>> Manish Jain wrote on Fri, Jun 12, 2020 at 10:46 PM:
>>
>>> Does Kylin now support storage in Parquet instead of HBase? Or does it
>>> support both Parquet and HBase?
>>> --
>>> Best Regards,
>>> Manish Jain
>>>
>>
>>
>> --
>> Best regards,
>>
>> Kaige Liu(刘凯歌)
>>
> --
> Best Regards,
> Manish Jain
>

Re: Start kylin3.0.2 failed due to Guava version mismatch with HBase 2.0.2

2020-05-24 Thread ShaoFeng Shi

I'm not sure about that; I don't have the environment :-). You can give it a
try, or build a snapshot version from the master branch.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Discovery wrote on Fri, May 22, 2020 at 6:29 PM:

> Hi Shaofeng,
> Thanks for your quick reply. If I switch to version 2.6.6, will there be a
> similar issue with HBase 2.0.x?
>
> Best Regards
> Discovery
>
>
>
> -- Original Message --
> *From:* "ShaoFeng Shi";
> *Sent:* Friday, May 22, 2020, 6:16 PM
> *To:* "user";
> *Subject:* Re: Start kylin3.0.2 failed due to Guava version mismatch with
> HBase 2.0.2
>
> Hello, this Guava conflict issue will be fixed in:
> https://issues.apache.org/jira/projects/KYLIN/issues/KYLIN-4394
>
> The target release is 3.1; There is a PR on this, I'm merging it; You can
> cherry-pick the patch if urgent.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
>
> Discovery wrote on Fri, May 22, 2020 at 3:57 PM:
>
>> Hi Kylin experts,
>>
>> I am using the latest Kylin 3.0.2 with Hadoop 3.1.1, HBase 2.0.2,
>> Spark 2.3.2, and Hive 3.1.0, but I run into the issue below when running
>> "kylin.sh start". I checked the Guava versions: Kylin uses 14.0 and HBase
>> uses 28.0-jre. How can I get rid of this issue? Thanks in advance!
>>
>> 2020-05-22 15:34:16,520 INFO  [main] common.KylinConfig:150 : Initialized
>> a new KylinConfig from getInstanceFromEnv : 1496355635
>> 2020-05-22 15:34:16,591 INFO  [main] persistence.ResourceStore:90 : Using
>> metadata url kylin_metadata@hbase for resource store
>> Exception in thread "main" java.lang.IllegalArgumentException: Failed to
>> find metadata store by url: kylin_metadata@hbase
>> at
>> org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:101)
>> at
>> org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:113)
>> at
>> org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:99)
>> at
>> org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:43)
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>> at
>> org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:94)
>> ... 3 more
>> Caused by: java.lang.NoSuchMethodError:
>> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1358)
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1339)
>> at
>> org.apache.kylin.common.util.HadoopUtil.healSickConfig(HadoopUtil.java:74)
>> at
>> org.apache.kylin.common.util.HadoopUtil.getCurrentConfiguration(HadoopUtil.java:60)
>> at
>> org.apache.kylin.storage.hbase.HBaseConnection.newHBaseConfiguration(HBaseConnection.java:170)
>> at
>> org.apache.kylin.storage.hbase.HBaseConnection.get(HBaseConnection.java:259)
>> at
>> org.apache.kylin.storage.hbase.HBaseResourceStore.getConnection(HBaseResourceStore.java:95)
>> at
>> org.apache.kylin.storage.hbase.HBaseResourceStore.createHTableIfNeeded(HBaseResourceStore.java:114)
>> at
>> org.apache.kylin.storage.hbase.HBaseResourceStore.<init>(HBaseResourceStore.java:88)
>> ... 8 more
>> 2020-05-22 15:34:16,903 INFO  [close-hbase-conn]
>> hbase.HBaseConnection:138 : Closing HBase connections...
>> ERROR: Unknown error. Please check full log.
>>
>> Best Regards
>> Discovery
>>
>

Re: Start kylin3.0.2 failed due to Guava version mismatch with HBase 2.0.2

2020-05-22 Thread ShaoFeng Shi

Hello, this Guava conflict issue will be fixed in:
https://issues.apache.org/jira/projects/KYLIN/issues/KYLIN-4394

The target release is 3.1; There is a PR on this, I'm merging it; You can
cherry-pick the patch if urgent.
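
While waiting for the fix, it can help to see which Guava jars sit in the
library directories that end up on the classpath. A minimal Python sketch of
such a scan follows; it is a hypothetical helper, not part of Kylin, and the
directories to scan depend on your installation.

```python
import re
from pathlib import Path

# Matches jar names such as guava-14.0.jar or guava-28.0-jre.jar.
_GUAVA_JAR = re.compile(r"guava-(\d[\w.\-]*)\.jar")


def find_guava_versions(lib_dirs):
    """Map each Guava version found to the list of jar paths that carry it."""
    found = {}
    for lib_dir in lib_dirs:
        for jar in sorted(Path(lib_dir).rglob("guava-*.jar")):
            match = _GUAVA_JAR.fullmatch(jar.name)
            if match:
                found.setdefault(match.group(1), []).append(str(jar))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_guava.py <lib dir> [<lib dir> ...]
    for version, jars in sorted(find_guava_versions(sys.argv[1:]).items()):
        print(version)
        for jar in jars:
            print("   ", jar)
```

More than one version in the output, for example 14.0 next to 28.0-jre,
points to the kind of mismatch behind the NoSuchMethodError in the quoted
stack trace.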

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Discovery wrote on Fri, May 22, 2020 at 3:57 PM:

> Hi Kylin experts,
>
> I am using the latest Kylin 3.0.2 with Hadoop 3.1.1, HBase 2.0.2,
> Spark 2.3.2, and Hive 3.1.0, but I run into the issue below when running
> "kylin.sh start". I checked the Guava versions: Kylin uses 14.0 and HBase
> uses 28.0-jre. How can I get rid of this issue? Thanks in advance!
>
> 2020-05-22 15:34:16,520 INFO  [main] common.KylinConfig:150 : Initialized
> a new KylinConfig from getInstanceFromEnv : 1496355635
> 2020-05-22 15:34:16,591 INFO  [main] persistence.ResourceStore:90 : Using
> metadata url kylin_metadata@hbase for resource store
> Exception in thread "main" java.lang.IllegalArgumentException: Failed to
> find metadata store by url: kylin_metadata@hbase
> at
> org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:101)
> at
> org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:113)
> at
> org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:99)
> at
> org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:43)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:94)
> ... 3 more
> Caused by: java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1358)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1339)
> at
> org.apache.kylin.common.util.HadoopUtil.healSickConfig(HadoopUtil.java:74)
> at
> org.apache.kylin.common.util.HadoopUtil.getCurrentConfiguration(HadoopUtil.java:60)
> at
> org.apache.kylin.storage.hbase.HBaseConnection.newHBaseConfiguration(HBaseConnection.java:170)
> at
> org.apache.kylin.storage.hbase.HBaseConnection.get(HBaseConnection.java:259)
> at
> org.apache.kylin.storage.hbase.HBaseResourceStore.getConnection(HBaseResourceStore.java:95)
> at
> org.apache.kylin.storage.hbase.HBaseResourceStore.createHTableIfNeeded(HBaseResourceStore.java:114)
> at
> org.apache.kylin.storage.hbase.HBaseResourceStore.<init>(HBaseResourceStore.java:88)
> ... 8 more
> 2020-05-22 15:34:16,903 INFO  [close-hbase-conn] hbase.HBaseConnection:138
> : Closing HBase connections...
> ERROR: Unknown error. Please check full log.
>
> Best Regards
> Discovery
>

Re: Field values become null after UNION ALL

2020-05-15 Thread ShaoFeng Shi

Thanks Yaqian, please go ahead.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Yaqian Zhang wrote on Thu, May 7, 2020 at 5:25 PM:

> Hi sir:
>
> I have reproduced the problem you described in my environment. It may be a
> bug. I will try to find out the root cause.
>
> You can open an issue in JIRA to track this.
>
>
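> On May 7, 2020, at 15:44, 欧秋斌 wrote:
>
> Hello!
>
> In my work, the criteria for the order-type dimension are not fixed, so
> they have to be defined in the SQL statement (see the first common table
> expression below).
> In the statement below, when the two parts are combined with UNION ALL,
> the sales amount of the second dimension (val_day) comes back as null.
> However, running either the order-type part or the date part on its own
> returns results. The problem only appears when the two are combined with
> UNION ALL.
> I can't figure it out, and would like to ask everyone where the problem is.
>
> with cte as( -- total sales per day and per order type
> select part_dt,
>    case when lstg_format_name='Auction' then 'Type 1 order'
>    when lstg_format_name='FP-non GTC' then 'Type 2 order'
>    when lstg_format_name='ABIN' then 'Type 3 order'
>    when lstg_format_name='FP-GTC' then 'Type 4 order'
>    else 'Other order' end style,
>    sum(price)  val
> from kylin_sales_ts2
> group by part_dt, case when lstg_format_name='Auction' then 'Type 1 order' when
> lstg_format_name='FP-non GTC' then 'Type 2 order' when lstg_format_name='ABIN' then
> 'Type 3 order' when lstg_format_name='FP-GTC' then 'Type 4 order' else 'Other order' end
> )
> ,cte2 as(   -- by order type and by date, respectively
> select part_dt, sum(val)over(partition by style) val_style,
> sum(val)over(partition by part_dt) val_day  from cte
> where part_dt>='201310'
> )
> -- deduplicate the results of the two dimensions, then merge them
> select part_dt, val_style  from cte2   -- order-type dimension
> group by part_dt, val_style
>
> union all
> select part_dt, val_day   from cte2  -- date dimension
> group by part_dt, val_day
>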

Re: A question

2020-05-15 Thread ShaoFeng Shi

Hi Xiang,

I'm not sure whether Kylin can help. Can Hive/Spark SQL fulfill the
requirement? If you can provide a couple of SQL queries, that would help us
to see whether Kylin can help.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




寒香 <1014631...@qq.com> wrote on Fri, May 15, 2020 at 1:18 PM:

> Hello, everyone:
> We have a business requirement: to filter out, from a large amount of data,
> sub-datasets that satisfy multiple rules at the same time.
> Different scenarios have different and fairly complex sets of rules. For
> example, the proportion of a single city in the data source cannot exceed
> 15% (of course, the 15% can be adjusted on demand by users), the proportion
> of various calculated business values must not exceed a specific value, and
> so on. Can this requirement be solved with Apache Kylin? If so, what
> approach should be adopted? Is there any material or demo for reference?
> Does it need to be done with other tools? Thanks a lot.
>

Re: jvm monitor

2020-05-15 Thread ShaoFeng Shi

Glad to know you have solved the problem and shared it with us. Thanks!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




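听风看雨 wrote on Tue, May 12, 2020 at 9:15 AM:

> I found the cause: the related metrics are only collected once a client
> starts querying; right after startup the metrics are not visible in the
> attribute list.
>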
> ---Original---
> *From:* "user-return-5171-jianyong_fu=qq.com
> "
> *Date:* Mon, May 11, 2020 22:08 PM
> *To:* "user";
> *Subject:* jvm monitor
>
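> Hello,
> I would like to ask about monitoring JVM metrics.
> Question: after enabling metrics and exposing them to the JVM (JMX), I
> cannot find the query-related metrics.
> Setup:
>    1. setenv.sh: open the JMX port
>    2. kylin.properties: kylin.server.query-metrics-enbled=true
> Steps:
>    I connected jconsole remotely to the Kylin JMX port and looked at the
> JVM metrics in the MBeans tab, but could not find query-related metrics
> under /hadoop/kylin/*.
>
> I would appreciate your help, thanks!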
>

Fwd: See you at: Using Kylin for Exact COUNT DISTINCT Queries with Sub-Second Latency on Big Data

2020-05-07 Thread ShaoFeng Shi

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Meetup//Meetup Events v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Events - Apache Kylin Bay Area Group
X-MS-OLK-FORCEINSPECTOROPEN:TRUE
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
TZURL:http://tzurl.org/zoneinfo-outlook/America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T02
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T02
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200504T020559Z
DTSTART;TZID=America/Los_Angeles:20200507T10
DTEND;TZID=America/Los_Angeles:20200507T11
STATUS:CONFIRMED
SUMMARY:Using Kylin for Exact COUNT DISTINCT Queries with Sub-Second Late
 ncy on Big Data
DESCRIPTION:Apache Kylin Bay Area Group\nThursday\, May 7 at 10:00 AM\n\n
 NOTE: This is a virtual workshop. You must register with the provided li
 nk to receive access to the session. Get the secret to consistently deli
 vering...\n\nhttps://www.meetup.com/Apache-Kylin/events/270051390/
ORGANIZER;CN=Meetup Reminder:MAILTO:i...@meetup.com
CLASS:PUBLIC
CREATED:20200415T001821Z
GEO:37.33;-121.88
LOCATION:Online event
URL:https://www.meetup.com/Apache-Kylin/events/270051390/
SEQUENCE:3
LAST-MODIFIED:20200415T001933Z
UID:event_270051...@meetup.com
END:VEVENT
END:VCALENDAR


meetup.ics
Description: application/ics

New blogs and video about Apache Kylin in English

2020-04-28 Thread ShaoFeng Shi

Hello,


Here are some new blogs and videos about Kylin. I hope they can help new
users to learn about Apache Kylin.


*New Blogs*

   - What’s New with Apache Kylin 3.0?
     <https://kyligence.io/blog/whats-new-with-apache-kylin-3-0/>
      - Samantha and Kaige collaborated to produce this really useful look
        at Kylin 3.0’s new features, why they matter, and better explain
        the value Kylin provides to those new to the project.
   - Achieve Precision with Count Distinct
     <https://kyligence.io/blog/how-does-apache-kylin-achieve-precision-with-count-distinct/>
      - Part 3 of Shaofeng’s Count Distinct blog series. A useful topic on
        Count Distinct with Kylin.
   - Why Kylin Is the Only OLAP for Count Distinct
     <https://kyligence.io/blog/why-kylin-is-the-only-olap-engine-for-sub-second-count-distinct-queries/>
      - Part 4 of Shaofeng’s Count Distinct blog series.


*New Videos*

   - Apache Kylin 101 Webinar <https://youtu.be/AfpXas1yr08>
  - A fantastic introductory workshop for Apache Kylin led by Kaige.


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

Re: Kylin read/write separation problem

2020-04-24 Thread ShaoFeng Shi

Does the HDFS of your HBase cluster have HA enabled?

If your HDFS has NameNode HA enabled, that name service needs to be
configured on both sides (the build cluster and the query cluster).
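
One way to check this is to confirm that each cluster's hdfs-site.xml lists
the other side's name service under `dfs.nameservices`. The Python sketch
below does that for XML text you pass in. The helper names and the sample
name services `build-ns` and `query-ns` are made up for illustration, and
further HA properties for each name service are typically needed as well.

```python
import xml.etree.ElementTree as ET


def hadoop_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def declares_nameservice(xml_text, nameservice):
    """True if dfs.nameservices (a comma-separated list) contains the name service."""
    declared = hadoop_properties(xml_text).get("dfs.nameservices", "")
    return nameservice in [item.strip() for item in declared.split(",") if item.strip()]


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>build-ns,query-ns</value>
      </property>
    </configuration>
    """
    print(declares_nameservice(sample, "query-ns"))
    print(declares_nameservice(sample, "other-ns"))
```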

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Rupeng Wang wrote on Tue, Apr 21, 2020 at 2:34 PM:

> Hi, the images cannot be loaded. Based on your description, may I ask
> whether these two clusters use different name services? Using the same name
> service may cause some problems. I hope the following article is helpful.
> http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>
>
>
>
>
> ---
>
> Best wishes,
>
> Rupeng Wang
>
>
>
>
>
>
>
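> *From: *Liu Ya Meng 
> *Reply-To: *
> *Date: *Sunday, April 19, 2020, 13:03
> *To: *, 
> *Subject: *Kylin read/write separation problem
>
> Hi,
>
> Recently, while deploying a read/write separation cluster with Kylin 3.0.1,
> we ran into a problem. After deploying Kylin in read/write separation mode
> (compute cluster + HBase cluster), building a cube fails because the
> nameservice cannot be found.
>
> The error is shown in the image below:
>
> --
>
> Deployment diagram:
>
> We have looked through a lot of material and suggested fixes online, but
> none of them resolved the problem. We hope to get some guidance from the
> experts. Many thanks!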
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Liu Ya Meng (刘一天)
> Mobile：+86.17858859527
> E-mail：liuyameng1...@126.com OR   yamo.dr...@gmail.com
>
>
>
>
>
>
>
>

Re: kylin error

2020-04-24 Thread ShaoFeng Shi

This looks like an ordinary OOM issue; please try giving more Java heap to
Sqoop.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

liaifan wrote on Fri, Apr 24, 2020 at 3:28 PM:

>
> SLF4J: Found binding in 
> [jar:file:/opt/hadoopclient/HDFS/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in 
> [jar:file:/opt/hadoopclient/HBase/hbase/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in 
> [jar:file:/opt/hadoopclient/Hive/HCatalog/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in 
> [jar:file:/opt/hadoopclient/HBase/hbase/lib/jdbc/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings
>  for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> 2020-04-24 10:37:52,111 WARN tool.BaseSqoopTool: Setting your password on the 
> command-line is insecure. Consider using -P instead.
>
> 2020-04-24 10:37:52,154 WARN sqoop.ConnFactory: Parameter --driver is set to 
> an explicit driver however appropriate connection manager is not being set 
> (via --connection-manager). Sqoop is going to fall back to 
> org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which 
> connection manager should be used next time.
>
> Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver 
> class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered 
> via the SPI and manual loading of the driver class is generally unnecessary.
> #
> # java.lang.OutOfMemoryError: GC overhead limit exceeded
> # -XX:OnOutOfMemoryError=""kill -9 %p""
> #   Executing /bin/sh -c ""kill -9 12032""...
> sh: kill -9 12032: δ???
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>  at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
>  at java.lang.StringCoding.decode(StringCoding.java:193)
>  at java.lang.String.<init>(String.java:426)
>
>  at 
> com.sun.tools.javac.file.ZipFileIndex$ZipDirectory.readEntry(ZipFileIndex.java:665)
>
>  at 
> com.sun.tools.javac.file.ZipFileIndex$ZipDirectory.buildIndex(ZipFileIndex.java:576)
>
>  at 
> com.sun.tools.javac.file.ZipFileIndex$ZipDirectory.access$000(ZipFileIndex.java:483)
>  at com.sun.tools.javac.file.ZipFileIndex.checkIndex(ZipFileIndex.java:191)
>  at com.sun.tools.javac.file.ZipFileIndex.<init>(ZipFileIndex.java:136)
>
>  at 
> com.sun.tools.javac.file.ZipFileIndexCache.getZipFileIndex(ZipFileIndexCache.java:100)
>
>  at 
> com.sun.tools.javac.file.JavacFileManager.openArchive(JavacFileManager.java:529)
>
>  at 
> com.sun.tools.javac.file.JavacFileManager.openArchive(JavacFileManager.java:462)
>
>  at 
> com.sun.tools.javac.file.JavacFileManager.listContainer(JavacFileManager.java:348)
>
>  at com.sun.tools.javac.file.JavacFileManager.list(JavacFileManager.java:624)
>  at com.sun.tools.javac.jvm.ClassReader.fillIn(ClassReader.java:2803)
>  at com.sun.tools.javac.jvm.ClassReader.complete(ClassReader.java:2446)
>  at com.sun.tools.javac.jvm.ClassReader.access$000(ClassReader.java:76)
>  at com.sun.tools.javac.jvm.ClassReader$1.complete(ClassReader.java:240)
>  at com.sun.tools.javac.code.Symbol.complete(Symbol.java:574)
>  at com.sun.tools.javac.comp.Enter.visitTopLevel(Enter.java:300)
>
>  at com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:518)
>  at com.sun.tools.javac.comp.Enter.classEnter(Enter.java:258)
>  at com.sun.tools.javac.comp.Enter.classEnter(Enter.java:272)
>  at com.sun.tools.javac.comp.Enter.complete(Enter.java:486)
>  at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
>  at com.sun.tools.javac.main.JavaCompiler.enterTrees(JavaCompiler.java:982)
>  at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:857)
>  at com.sun.tools.javac.main.Main.compile(Main.java:523)
>  at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
>  at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
>
>  at 
> org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:224)
>  at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
>  at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
>
> 2020-04-24 10:37:57,393 ERROR tool.ImportTool: Import failed: 
> java.io.IOException: Error r

Re: Kylin with Presto

2020-04-24 Thread ShaoFeng Shi

Hi Manish,

Are you going to use Kylin as a data source for Presto?

Some users already use Kylin together with Presto, but they do it in a
different way:
1) Use Kylin as the acceleration layer: send the query to Kylin first and, if
Kylin cannot answer it (no cube matched), route it to Presto;
2) Push a Kylin query down to Presto when no cube matches.
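
The first pattern can be sketched roughly as below. This is only an illustration in Python, not Kylin or Presto code: `route_query`, `NoCubeMatched`, `fake_kylin` and `fake_presto` are hypothetical names standing in for whatever clients and error signalling a real deployment uses.

```python
from typing import Callable, List, Tuple

Rows = List[Tuple]


class NoCubeMatched(Exception):
    """Hypothetical error meaning Kylin has no cube that can answer the query."""


def route_query(sql: str,
                run_on_kylin: Callable[[str], Rows],
                run_on_presto: Callable[[str], Rows]) -> Tuple[str, Rows]:
    """Try Kylin first; fall back to Presto only when no cube matches."""
    try:
        return "kylin", run_on_kylin(sql)
    except NoCubeMatched:
        return "presto", run_on_presto(sql)


# Stand-in engines for demonstration only; they do not talk to any server.
def fake_kylin(sql: str) -> Rows:
    if "sales_cube_table" in sql:
        return [(42,)]
    raise NoCubeMatched(sql)


def fake_presto(sql: str) -> Rows:
    return [(7,)]
```

Only the "no cube matched" case is routed to Presto here; other failures are left to propagate so that real errors are not hidden by the fallback.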

Implementing a Kylin adapter in Presto is also a good idea, so that users can
query Kylin together with other data sources in Presto. We can discuss that
if you are interested.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Manish Jain wrote on Thu, Apr 23, 2020 at 8:29 PM:

> Dear kylin users,
> I understand that Kylin uses Calcite to query data, and that the same support
> is available with BI tools like Tableau.
>
> I am standardising Presto as SQL layer on our data lake. How can I access
> Kylin data using Presto SQL ?
>
> Best Regards,
> Manish Jain
>

Re: BufferOverflow!Please use one higher cardinality column for dimension column when build RAW cube!

2020-04-17 Thread ShaoFeng Shi

The class is also packaged in lib/kylin-job-<version>.jar, which is
submitted with the MR/Spark job.

Anyway, the RAW measure is deprecated and no longer supported.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

我 wrote on Tue, Apr 7, 2020 at 4:13 PM:

> When I build a cube with the RAW type, there is an error saying the raw cuboid
> can't be larger than 1 MB.
>
> So I modified the BufferedMeasureCodec class and set the field to a larger
> value: "DEFAULT_BUFFER_SIZE = 1024 * 1024 * 10".
>
> I ran mvn package and uploaded kylin-core-metadata.jar to the path:
> ${kylin_home}/tomcate/webapps/kylin/WEB-INFO/lib
>
> I also replaced BufferedMeasureCodec.class in the jar:
> ${kylin_home}/lib/kylin-coprocessor-3.0.1.jar
>
> and in the jar: ${kylin_home}/tool/kylin-tool-3.0.1.jar
>
> But it doesn't work!
>
> So I want to know which path I should upload to in order to change the default
> 1 MB size for RAW-type cube builds. Thanks.
>
>
>
>

Online event: "Apache Kylin 101: Get Sub-Second Analytics on Massive Datasets"

2020-04-05 Thread ShaoFeng Shi

Hello Kylin users,

There will be an online Kylin webinar next week, on Apr 9, 2020, at 10:00 AM
Pacific Time (US and Canada). We posted the information on meetup.com; as
this is an online event, there is no attendance limit, so everyone is welcome:

Subject: "Apache Kylin 101: Get Sub-Second Analytics on Massive Datasets"

Register: https://www.meetup.com/Apache-Kylin/events/269830085/
or
https://kyligence.zoom.us/webinar/register/WN_vEXiC_KrQPWdB8fHpf2UhQ

As COVID-19 has had a terrible impact on health, and to avoid gatherings of
people, we moved one meetup online last month, and we hope to hold more
online in the future.


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Re: question about kylin query engine's data accuracy

2020-03-27 Thread ShaoFeng Shi

Did you check the answers in the FAQ? It lists several such situations.
https://kylin.apache.org/docs/gettingstarted/faq.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

nichunen wrote on Thu, Jan 9, 2020 at 9:45 AM:

>
> Hi Kang-sen,
>
> Can you reproduce it with Kylin’s sample cube?
>
> Best regards,
>
>
>
> Ni Chunen / George
>
>
> On 01/9/2020 00:54, Lu, Kang-Sen wrote:
>
> I am running Kylin 2.6. I just noticed that when I run the following SQL
> statement against Hive and Kylin, the results are different. Has anyone
> seen the same problem?
>
>
>
> SELECT concat(concat(A_VL_HOURLY_V.THEDATE, A_VL_HOURLY_V.THEHOUR), '00')
> TIME_KEY, COUNT(*) vl_aggs_model___SUM_EVENT_COUNT FROM A_VL_HOURLY_V WHERE
> (A_VL_HOURLY_V.THEDATE = '20190511' ) AND (A_VL_HOURLY_V.THEHOUR = '01')
> GROUP BY A_VL_HOURLY_V.THEDATE, A_VL_HOURLY_V.THEHOUR ;
>
>
>
> Hive query returns:
>
>
>
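> +---------------+----------------------------------+
> |   time_key    | vl_aggs_model___sum_event_count  |
> +---------------+----------------------------------+
> | 201905110100  | 8968815                          |
> +---------------+----------------------------------+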
>
>
>
> The kylin insight returned:
>
>
>
> 201905110100 | 8968800
>
>
>
> The row count in hive is 15 more than in kylin.
>
>
>
> Thanks.
>
>
>
> Kang-sen
>
>
>
>
>
>
> --
> Notice: This e-mail together with any attachments may contain information
> of Ribbon Communications Inc. that is confidential and/or proprietary for
> the sole use of the intended recipient. Any review, disclosure, reliance or
> distribution by others or forwarding without express permission is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately and then delete all copies, including any attachments.
> --
>
>

Re: [Announce] Welcome new Apache Kylin committer: Kaige Liu

2020-03-27 Thread ShaoFeng Shi

Welcome Kaige; Looking forward to seeing more contributions from you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Dong Li wrote on Fri, Mar 27, 2020 at 8:32 AM:

> Welcome Kaige!
>
> Thanks,
> Dong Li
>
>
> On Thu, Mar 26, 2020 at 9:22 PM George Ni  wrote:
>
>> I am very pleased to announce that the Project Management Committee (PMC)
>> of Apache Kylin has asked Kaige Liu to become an Apache Kylin committer, and
>> he has already accepted.
>>
>> Kaige joined the Apache Kylin community in 2016. He has been actively
>> involved in helping build the community. Kaige keeps contributing
>> code to the project, fixing bugs and developing new features. In addition to
>> contributing patches, he actively expands the influence of Apache
>> Kylin and helps more people know and adopt Apache Kylin. We are so glad to
>> have him as our new committer.
>>
>> Please join me to welcome Kaige.
>>
>> -
>>
>> Best regards,
>>
>>
>>
>> Ni Chunen / George
>>
>

Re: Kylin query crash on an ultra-high-cardinality dimension

2020-03-08 Thread ShaoFeng Shi

Sorting on such a high cardinality dimension in memory is very hard to
finish in seconds. Please try Top-N pre-calculation:

https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
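
To show why pre-calculated Top-N avoids sorting the whole dimension, here is a small Python sketch of a Space-Saving style counter, which keeps only a bounded number of candidates. It is an assumption-laden illustration, not Kylin's actual implementation; `space_saving`, `top_n` and the sample data are made up for this example.

```python
from typing import Dict, Iterable, List, Tuple


def space_saving(stream: Iterable[Tuple[str, float]],
                 capacity: int) -> Dict[str, float]:
    """Keep at most `capacity` counters; totals are approximate (over-estimates)."""
    counters: Dict[str, float] = {}
    for key, weight in stream:
        if key in counters:
            counters[key] += weight
        elif len(counters) < capacity:
            counters[key] = weight
        else:
            # Evict the smallest counter; the newcomer inherits its total.
            victim = min(counters, key=counters.get)
            floor = counters.pop(victim)
            counters[key] = floor + weight
    return counters


def top_n(counters: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n largest counters, biggest first."""
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical (path, filesize) records.
records = [("a", 5), ("b", 1), ("c", 4), ("a", 5), ("d", 1)]
print(top_n(space_saving(records, capacity=3), 2))
```

Memory is bounded by `capacity` rather than by the dimension's cardinality, at the cost of approximate totals for the less frequent keys.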

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Johnson wrote on Sun, Mar 8, 2020 at 6:21 PM:

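> I recently built a cube that contains an ultra-high-cardinality dimension, and
> queries against it time out in the HBase coprocessor. The SQL is: select
> sum(filesize)/1024/1024 fs ,path6 from impala_monitor.V_MONITOR_HDFS_INFO
> where par_dt = '2020-03-06' group by path6 order by fs desc limit 100.
> The path6 dimension has a cardinality of over 1 billion. Does anyone have
> optimization tips for handling such ultra-high-cardinality dimensions?
>
> Error: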
> org.apache.hadoop.hbase.DoNotRetryIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: Coprocessor passed deadline!
> Maybe server is overloaded at
> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.checkDeadline(CubeVisitService.java:226)
> at
> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:261)
> at
> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7996)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1986)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1968)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191) at
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
> while executing SQL: "select sum(filesize)/1024/1024 fs ,path6 from
> impala_monitor.V_MONITOR_HDFS_INFO where par_dt = '2020-03-06' group by
> path6 order by fs desc limit 100"
>
>
>
>

Re: docker run error

2020-02-28 Thread ShaoFeng Shi

I also tested it recently and it works. In any case, it has also been updated
for the new 3.0.1. Please check it here:

https://github.com/apache/kylin/tree/master/docker

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

猫猫 <16770...@qq.com> wrote on Fri, Feb 28, 2020 at 8:18 AM:

> http://kylin.apache.org/docs/install/kylin_docker.html
>
> I used this Docker image, but ZooKeeper doesn't run.
>
> docker pull apachekylin/apache-kylin-standalone:3.0.0-alpha2
>
> The entrypoint.sh file has no command to start ZooKeeper.
>
> Is this image disabled?
>

Re: [DISCUSS] Upgrade Kylin's dependency to Hadoop 3 / HBase 2

2020-02-27 Thread ShaoFeng Shi

Hi Yang,

The main difference between 2.6 and 3.0 is the new real-time OLAP feature.
Hadoop 2 users can select either of them, depending on whether they need the
real-time feature.

After 3.0, the next major features would be the Flink cube engine (planned
for v3.1) and the Parquet storage (early stage, maybe in v4.0).

When the Parquet storage is released, the dependency on HBase can be
dropped, so we assume the API issue will be easier to handle than today. We
can then re-evaluate the possibility of supporting Hadoop 2.

So I think the impact on today's Hadoop 2 users is acceptable, not to mention
that they can still compile it manually.


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Li Yang wrote on Thu, Feb 27, 2020 at 7:37 AM:

> The proposal means Kylin 3.0 will be the last major version that supports
> Hadoop 2.
>
> What will be the recommended version for Hadoop 2 users after this? I feel the
> latest stable version of 2.6 is better than 3.0.
>
> Anyway, I'm fine with moving the focus to Hadoop 3. That is the direction.
> However, we should also think about what it means for Hadoop 2 users.
> Questions like the ones below should also be answered.
>
> - What is the recommended version/branch for Hadoop 2? (Btw, 3.0 does not
> sound right here.)
> - How will that version/branch be maintained?
>
> +1 in general
>
> Regards
> -Yang
>
>
> On Wed, Feb 26, 2020 at 5:36 PM Zhou Kang  wrote:
>
> > +1
> >
> >
> > > On Feb 26, 2020, at 3:48 PM, ShaoFeng Shi wrote:
> > >
> > > Hello, Kylin users and developers,
> > >
> > > As we know, Hadoop 3 and HBase 2 have been released for some time. Kylin
> > > has supported Hadoop 3 since v2.5.0 in Sep 2018. As the APIs of HBase 1
> > > and 2 are incompatible, we need to keep different branches for them, and
> > > in each release we need to build separate packages and run a separate
> > > round of testing for each. Furthermore, Cloudera's API differences from
> > > the Apache release make the situation worse; we need to build 4 binary
> > > packages for each release. That has cost us much manual effort and
> > > computing resources.
> > >
> > > Today, Hadoop 3 + HBase 2 is mature and stable enough for production
> > > use, and we see more and more users starting to use the new versions.
> > > We think it is time for Kylin to fully upgrade to the new versions, so
> > > that we can focus more on Kylin itself instead of on environments.
> > >
> > > Here is my proposal:
> > > 1) From Kylin 3.1, the Hadoop/HBase version upgrades to 3.1/2.1 (or a
> > > close version);
> > > 2) Hadoop 2 and HBase 1 users can use Kylin 3.0 and previous releases;
> > > 3) We will re-evaluate the need for building binary packages for the
> > > Cloudera release (we may raise another discussion).
> > >
> > > Please let us know your comments. And please also understand that with
> > > limited resources we cannot support multiple Hadoop versions...
> > >
> > > Thanks!
> > >
> > > Best regards,
> > >
> > > Shaofeng Shi 史少锋
> > > Apache Kylin PMC
> > > Email: shaofeng...@apache.org
> >
> >
>

[DISCUSS] Upgrade Kylin's dependency to Hadoop 3 / HBase 2

2020-02-25 Thread ShaoFeng Shi

Hello, Kylin users and developers,

As we know, Hadoop 3 and HBase 2 have been released for some time. Kylin has
supported Hadoop 3 since v2.5.0 in Sep 2018. As the APIs of HBase 1 and
2 are incompatible, we need to keep different branches for them, and in
each release we need to build separate packages and run a separate round of
testing for each. Furthermore, Cloudera's API differences from the Apache
release make the situation worse; we need to build 4 binary packages for
each release. That has cost us much manual effort and computing
resources.

Today, Hadoop 3 + HBase 2 is mature and stable enough for production
use, and we see more and more users starting to use the new versions.
We think it is time for Kylin to fully upgrade to the new versions, so
that we can focus more on Kylin itself instead of on environments.

Here is my proposal:
1) From Kylin 3.1, the Hadoop/HBase version upgrades to 3.1/2.1 (or a close
version);
2) Hadoop 2 and HBase 1 users can use Kylin 3.0 and previous releases;
3) We will re-evaluate the need for building binary packages for the Cloudera
release (we may raise another discussion).

Please let us know your comments. And please also understand that with
limited resources we cannot support multiple Hadoop versions...

Thanks!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

[DISCUSS] Collect Kylin best practices with Apache Wiki

2020-02-18 Thread ShaoFeng Shi

Hello Kylin users,

I'm proposing to collect Kylin best practices on the Apache Wiki. I have
created an entry page and started to write some there. If you want to
share or contribute, please email the group, and we will review and add
it. A practice should be brief and easy to understand; if it needs to
go into detail, a reference link can be provided with it. Let's try,
thank you!

Here is the wiki link:
https://cwiki.apache.org/confluence/display/KYLIN/Best+practices

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Re: [Discuss] Add webhook to Kylin

2020-02-03 Thread ShaoFeng Shi

Good proposal. Is there a standard, popular framework for this? We
could integrate with the best solution available.
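
For discussion, a webhook on job completion could be as small as the Python sketch below. Everything in it is an assumption for illustration: Kylin has no such hook today, and the event name, payload fields and helper names (`build_job_event`, `make_webhook_request`) are hypothetical.

```python
import json
import urllib.request


def build_job_event(job_id: str, cube: str, status: str) -> bytes:
    """Serialize a hypothetical 'cube job finished' event as JSON bytes."""
    payload = {
        "event": "cube_job_finished",  # hypothetical event name
        "job_id": job_id,
        "cube": cube,
        "status": status,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def make_webhook_request(url: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST a webhook would make."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_job_event("job-1", "sales_cube", "FINISHED")
request = make_webhook_request("http://example.invalid/hook", body)
# urllib.request.urlopen(request) would deliver it to a real receiver.
```

Chat tools, SMS gateways, or schedulers would then only need an HTTP endpoint that accepts this JSON.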

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Liukaige wrote on Sat, Jan 25, 2020 at 1:01 AM:

> Totally agreed. And this feature can be integrated with ETL scheduling,
> data governance, approval flow etc. Brilliant idea. If you need any help,
> count me in.
>
> 朱卫斌 wrote on Mon, Jan 20, 2020 at 10:10 PM:
>
>> I think Kylin should have an event and metrics mechanism: event-driven, with
>> event and metrics interfaces, so that specific listeners can be implemented
>> as plugins. For example, we could implement a DingTalk plugin, an SMS plugin,
>> or any other plugin, which gives high flexibility (not only for
>> notifications, but for many other things, such as unifying and even
>> interfering with tasks). We can refer to the design and implementation
>> of Spark's events and metrics.
>> I think this is very valuable; we can do it together.
>>
>> weibin0516
>> codingfor...@126.com
>> Best wishes !
>>
>> On 01/21/2020 10:57, Xiaoxiang Yu wrote:
>>
>> It looks good, and it should be useful for IT team, please go ahead!
>>
>>
>> --
>> *Best wishes to you ! *
>> *From ：**Xiaoxiang Yu*
>>
>> At 2020-01-20 19:58:40, "Zhou Kang"  wrote:
>>
>> Hi Kylin users & developers:
>>
>>
>>
>> Many apps support webhooks, which are one way for apps to send
>> automated messages or information to other apps.
>>
>> With webhooks, I think we could send messages to chat tools (Slack,
>> DingTalk), send SMS, or trigger another workflow.
>>
>> Do we need to add webhooks to Kylin? For example, firing when a cubing job
>> finishes.
>>
>> What do you think about this, and which way would work better in your
>> environment?
>>
>>
>
> --
> Best regards,
>
> Kaige Liu(刘凯歌)
>
> *"Do small things with great love." *
>

Re: [Discuss] Reposition Kylin as "Analytical Data warehouse for big data"

2020-01-21 Thread ShaoFeng Shi

Thanks to those who gave comments. This thread is still open for wider
discussion. I plan to update the home page in Feb, after the Chinese New
Year holiday.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Luke Han wrote on Sun, Jan 19, 2020 at 2:02 PM:

> +1,
>
> Kylin is helping many companies manage their Golden Data for Big Data,
> and most of the use cases are for analytics purposes.
> Moving from OLAP to an Analytics DW is the destination of Kylin.
>
> Looking forward to the legend evolving to the next stage.
>
> Cool!
>
> Best Regards!
> -
>
> Luke Han
>
>
> On Mon, Jan 13, 2020 at 3:12 PM codingfor...@126.com 
> wrote:
>
>> +1.
>> Maybe kylin can support materialized views someday.
>>
>>
>> On Jan 13, 2020, at 14:58, Xiaoxiang Yu wrote:
>>
>> +1
>> Great suggestion. I hope that in the future Kylin can support more and
>> more data sources and provide better performance when building segments.
>>
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From ：**Xiaoxiang Yu*
>>
>> At 2020-01-12 20:32:12, "ShaoFeng Shi"  wrote:
>>
>> Hello, Kylin developers and users, HAPPY NEW YEAR 2020!
>>
>> Last month, we released Kylin 3.0, with the new real-time streaming
>> feature and a Lambda architecture. This allows our users to host only one
>> system for both batch and real-time analytics, and to query batch and
>> streaming data together.
>>
>> If you look at Kylin's home page, its slogan is still "OLAP Engine
>> for Big data", which was coined 5 years ago when the project was born.
>> Today, Kylin's capability has been proven to go beyond an "OLAP engine". I
>> visited many Kylin users in China, the US, and Europe last year, and came
>> across many different scenarios:
>>
>> 1. eBay initiated the Kylin project to offload analytical workloads from
>> Teradata to Hadoop; Kylin serves the online queries with high performance
>> and high availability. Today, Kylin serves millions of queries every
>> day, most in under 1 second;
>> 2. China UnionPay and CPIC use Kylin to replace IBM Cognos cubes. One
>> Kylin cube replaced more than 100 Cognos cubes, with better build
>> performance and query performance.
>> 3. China Construction Bank uses Hadoop + Kylin to offload Greenplum.
>> Some systems have been migrated to Kylin successfully.
>> 4. Yum (KFC) and several other users are using Kylin to replace Microsoft
>> SSAS.
>> 5. Meituan, Ctrip, JD, Didi, Xiaomi, Huawei, OLX group, autohome.com.cn,
>> Xactly, and many others are using Kylin as the platform of their DaaS (Data
>> as a Service), providing data service to their thousands of internal
>> analysts and tens of thousands of external tenants.
>>
>> Now let's look at the definition of Data warehouse [1]:
>>
>> "*A data warehouse is a subject-oriented, integrated, time-variant and
>> non-volatile collection of data in support of management's decision-making
>> process.*"
>>
>> In Kylin, each model/cube is created for a certain subject; Kylin
>> integrates well with Hive, Hadoop, Spark, Kafka, and other systems; Kylin
>> incrementally loads the data by time, builds the cube, and saves it as
>> segments (partitions), which are non-volatile unless you refresh them.
>> During analysis (roll-up, drill-down, etc.), the data is always
>> consistent. Kylin provides a SQL interface and JDBC/ODBC/HTTP APIs so that
>> you can easily connect from BI/visualization tools like Tableau and others.
>>
>> All in all, you can see that users are using Kylin not just as a SQL
>> engine, but also as an Analytical Data Warehouse for very large-scale data
>> (PB scale). In the world of big data, Kylin is unique. Its design is
>> elegant, and its architecture is scalable and pluggable. To give
>> Kylin more visibility so that more people can discover it, I propose
>> changing Kylin's position/slogan from "OLAP engine for big data" to
>> "Analytical Data warehouse for big data".
>>
>> Please feel free to share your comments.
>>
>> [1]
>> https://www.1keydata.com/datawarehousing/data-warehouse-definition.html
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofeng...@apache.org

Re: intersect_value raise error

2020-01-13 Thread ShaoFeng Shi

Hi ZF,

The intersect_value function hasn't been released yet.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Xiaoxiang Yu wrote on Mon, Jan 13, 2020 at 9:46 PM:

> Thank you for your suggestion; I think it is very valuable.
> Currently, the docs maintainers use "/docs" as the URL for the LATEST
> documentation, and "/docsxx" (such as /docs24) as the URL for a specific
> version.
>
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
> At 2020-01-13 18:27:51, "ZF" <310866...@qq.com> wrote:
>
> Is there a SQL manual that matches each Kylin version?
> Otherwise, many people will run into this kind of problem,
>
> since this doc does not have version information:
> http://kylin.apache.org/docs/tutorial/sql_reference.html
>
>
> -- Original Message --
> *From:* "Xiaoxiang Yu";
> *Sent:* Monday, Jan 13, 2020, 2:47 PM
> *To:* "user";"ZF"<310866...@qq.com>;
> *Subject:* Re:intersect_value raise error
>
> Hi friend,
> As far as I know, intersect_value is not available at the moment. I
> guess it is not very easy to implement because this feature needs a
> bidirectional global dictionary.
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
> At 2020-01-13 13:47:21, "ZF" <310866...@qq.com> wrote:
>
> hi,
> I applied the intersect_value function to query user_ids matching some
> conditions, but I got an error:
> From line 2, column 1 to line 2, column 58: No match found for function
> signature INTERSECT_VALUE(, , ) while
> executing SQL: "select * from (select intersect_value(phone_no, label_name,
> array['age', 'sex']) from TEST_USER_ATTR_VERTICAL where ( label_name =
> 'age' and label_value in ('1', '2') ) or ( label_name = 'sex' and
> label_value in ('1') )) limit 5"
>
> Here is my Kylin version: apache-kylin-3.0.0-beta-bin-hadoop3
> The query SQL is:
>
>    select
>    intersect_value(phone_no, label_name, array['age', 'sex'])
>    from TEST_USER_ATTR_VERTICAL
>    where (
>    label_name = 'age' and label_value in ('1', '2')
>    )
>    or (
>    label_name = 'sex' and label_value in ('1')
>    );
>
>
> btw:  intersect_count is ok
>
>

[Discuss] Reposition Kylin as "Analytical Data warehouse for big data"

2020-01-12 Thread ShaoFeng Shi

Hello, Kylin developers and users, HAPPY NEW YEAR 2020!

Last month, we released Kylin 3.0, with the new real-time streaming
feature and a Lambda architecture. This allows our users to host only one
system for both batch and real-time analytics, and to query batch and
streaming data together.

If you look at Kylin's home page, its slogan is still "OLAP Engine for
Big data", which was coined 5 years ago when the project was born. Today,
Kylin's capability has been proven to go beyond an "OLAP engine". I visited
many Kylin users in China, the US, and Europe last year, and came across many
different scenarios:

1. eBay initiated the Kylin project to offload analytical workloads from
Teradata to Hadoop; Kylin serves the online queries with high performance
and high availability. Today, Kylin serves millions of queries every
day, most in under 1 second;
2. China UnionPay and CPIC use Kylin to replace IBM Cognos cubes. One Kylin
cube replaced more than 100 Cognos cubes, with better build performance
and query performance.
3. China Construction Bank uses Hadoop + Kylin to offload Greenplum.
Some systems have been migrated to Kylin successfully.
4. Yum (KFC) and several other users are using Kylin to replace Microsoft
SSAS.
5. Meituan, Ctrip, JD, Didi, Xiaomi, Huawei, OLX group, autohome.com.cn,
Xactly, and many others are using Kylin as the platform of their DaaS (Data
as a Service), providing data service to their thousands of internal
analysts and tens of thousands of external tenants.

Now let's look at the definition of Data warehouse [1]:

"*A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision-making
process.*"

In Kylin, each model/cube is created for a certain subject; Kylin
integrates well with Hive, Hadoop, Spark, Kafka, and other systems; Kylin
incrementally loads the data by time, builds the cube, and saves it as
segments (partitions), which are non-volatile unless you refresh them.
During analysis (roll-up, drill-down, etc.), the data is always
consistent. Kylin provides a SQL interface and JDBC/ODBC/HTTP APIs so that
you can easily connect from BI/visualization tools like Tableau and others.

All in all, you can see that users are using Kylin not just as a SQL
engine, but also as an Analytical Data Warehouse for very large-scale data
(PB scale). In the world of big data, Kylin is unique. Its design is
elegant, and its architecture is scalable and pluggable. To give
Kylin more visibility so that more people can discover it, I propose
changing Kylin's position/slogan from "OLAP engine for big data" to
"Analytical Data warehouse for big data".

Please feel free to share your comments.

[1] https://www.1keydata.com/datawarehousing/data-warehouse-definition.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

[Announce] Apache Kylin 3.0.0 released

2019-12-20 Thread ShaoFeng Shi

The Apache Kylin team is pleased to announce the immediate availability of
the 3.0.0 release.

This is the GA release of Kylin’s next generation after 2.x. With the new
real-time OLAP feature, Kylin can query streaming data with sub-second
latency. All of the changes in this release can be found at:
https://kylin.apache.org/docs/release_notes.html


You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/download/


Apache Kylin is an open-source Distributed Analytics Engine designed to
provide SQL interface and multi-dimensional analysis (OLAP) on Apache
Hadoop, supporting extremely large datasets.


Apache Kylin lets you query massive datasets at sub-second latency in 3
steps:
1. Identify a star schema or snowflake schema data set on Hadoop.
2. Build Cube on Hadoop.
3. Query data with ANSI-SQL and get results in sub-second time, via ODBC,
JDBC or RESTful API.


Thanks to everyone who has contributed to this release.


We welcome your help and feedback. For more information on how to report
problems, and to get involved, visit the project website at
https://kylin.apache.org/

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org


Re: Is refresh segments currently supported?

2019-12-10 Thread ShaoFeng Shi

Awesome, thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Tue, Dec 10, 2019 at 6:23 PM, liang wrote:

> Hi ShaoFeng,
>
> I'll create an issue in JIRA later. Thanks for your reply.
>
> Regards,
> Liang
>

Re: Is refresh segments currently supported?

2019-12-10 Thread ShaoFeng Shi

Yes, Kylin supports building segments in parallel. This seems to be a bug
introduced by KYLIN-3977 in Kylin 2.6.3. Would you like to file a JIRA for
it, so that you get notified when a hot-fix is available? You can report it
at https://issues.apache.org/jira/secure/Dashboard.jspa by selecting KYLIN
as the project. Thank you!
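
To illustrate why parallel jobs can run into this kind of error, here is a
minimal Python sketch of a timestamp-based optimistic write check. It is
not Kylin's implementation: the class and function names are hypothetical,
and only the wording of the error message is borrowed from the log in this
thread. Two jobs read the same timestamp for a dict; whichever updates
second finds that the stored timestamp no longer matches what it expected.

```python
class WriteConflictError(Exception):
    """Raised when the stored timestamp differs from the expected one."""


class InMemoryResourceStore:
    """Hypothetical stand-in for a metadata store keyed by resource path."""

    def __init__(self):
        self._timestamps = {}

    def put(self, path, timestamp):
        self._timestamps[path] = timestamp

    def get_timestamp(self, path):
        return self._timestamps[path]

    def update_timestamp(self, path, expected_old_ts, new_ts):
        # Compare-and-set: only write if nobody changed the entry meanwhile.
        actual = self._timestamps[path]
        if actual != expected_old_ts:
            raise WriteConflictError(
                f"Overwriting conflict {path}, expect old TS "
                f"{expected_old_ts}, but it is {actual}"
            )
        self._timestamps[path] = new_ts


def touch_with_reread(store, path, new_ts, attempts=3):
    """Retry by re-reading the current timestamp before every attempt."""
    for _ in range(attempts):
        seen = store.get_timestamp(path)
        try:
            store.update_timestamp(path, seen, new_ts)
            return True
        except WriteConflictError:
            continue
    return False


if __name__ == "__main__":
    store = InMemoryResourceStore()
    path = "/dict/CUBE/COLUMN/example.dict"  # made-up path
    store.put(path, 1575916011421)

    # Job A and job B both read the timestamp before either one writes.
    seen_by_a = store.get_timestamp(path)
    seen_by_b = store.get_timestamp(path)

    store.update_timestamp(path, seen_by_a, 1575916035573)  # job A wins
    try:
        store.update_timestamp(path, seen_by_b, 1575916040000)  # job B is stale
    except WriteConflictError as err:
        print(err)
```

A retry only helps if the expected timestamp is re-read before each
attempt, as `touch_with_reread` does; retrying with the stale value would
fail every time.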


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Tue, Dec 10, 2019 at 5:43 PM, liang wrote:

> Hi there,
>   We noticed a segment refresh job in ERROR status at "*2019-12-10
> 02:27:15*". After a deep dive into kylin.log, we found that a
> WriteConflictException was raised when trying to update the last modified
> time of the dict.
>
> (Sensitive content is censored)
>
> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 : error execute HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03, name=Build Dimension Dictionary, state=RUNNING}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict, expect old TS 1575916011421, but it is 1575916035573
>     at org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>     at org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
>     at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
>     at org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>     at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>     at org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
>     at org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
>     at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
>     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
>     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
>     at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>     at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>     at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> We had a look at the codebase (Kylin-2.6.4) and found that the last modified
> time of a dict that has already been built is updated at the step "Build
> Dimension Dictionary". The log below shows that the same dict is processed
> many times.
>
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built, save it"|grep "THIS_IS_COLUMN_NAME"
> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.Dictionary

Re: how does cube retention range work

2019-12-10 Thread ShaoFeng Shi

I checked the source code and there is no detailed log for this, so at the
moment I have no idea. Many users already use the auto-merge feature, and I
can't guess what is blocking it in your case. You may need to debug it.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Tue, Dec 10, 2019 at 2:56 AM, Lu, Kang-Sen wrote:

> Hi, Shaofeng:
>
> Just to be sure about this sentence: “it will be dropped from the segment
> list first”. Does it mean that if we examine the cube storage, the old
> segment will not show? My experience does not match this description. I had
> several cube segments built for, say, 20180209. Now we are in 2019, and
> those segments stay in Kylin; I can even query their data.
>
> Kang-sen
>
>
>

Re: Need advice and best practices for Kylin query tuning (not cube tuning)

2019-12-07 Thread ShaoFeng Shi

This is what we call a "read-write separated" deployment. Many Kylin users
actually run in this mode: they deploy a dedicated HBase cluster only for
queries, and another Hadoop cluster for cube building. You can refer to
this blog:

https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Sat, Dec 7, 2019 at 12:42 AM, Andras Nagy wrote:

> Hi All,
>
> Thanks a lot, guys! With your generous help, Xiaoxiang, we have been able to
> track down the problem: it was mostly related to segments that were too
> large and had too few regions, which meant a low level of parallelism on the
> HBase side.
> Now the p95 of query latency is acceptable for the time being, although we
> will still look into improving it.
>
> However, there is still a small number of queries that take excessively
> long to execute. Upon analyzing these, it turned out that there is a
> regular pattern to when these occur, and they coincide with the cubing job
> executions on the same cluster (hourly stream cube building, daily cube
> refresh to handle very late events).
>
> As a consequence, I'd like to move all of these yarn-based workloads
> (Hive, MapReduce) to a separate EMR cluster (our environment is in AWS) and
> keep another EMR only for HBase. This would also help with better resource
> usage of these clusters, as we wouldn't need a static split of node memory
> between HBase and Yarn.
>
> Does anyone have experience with this setup? Do you see any potential
> issues with it? Any feedback on this is welcome.
>
> Thanks a lot,
> Andras
>
>
> On Mon, Dec 2, 2019 at 3:56 AM Xiaoxiang Yu 
> wrote:
>
>> Hi andras,
>>
>> I would like to share what I found.
>>
>> There are several steps in query execution. You can find log entries that
>> mark the start and end of each step, which will help you find the step
>> that costs the most time.
>>
>> *Step 1. Check the SQL ACL and grammar, then use Calcite to parse the SQL
>> into an AST and finally into an execution plan.*
>>
>>   *Start of step 1:*
>>
>> service.QueryService:414 : The original query:
>>
>>   *End of step 1:*
>>
>> enumerator.OLAPEnumerator:105 : query storage...
>>
>> *Step 2. Send RPC request to region server or streaming receiver.*
>>
>>
>>
>>*Start of send RPC request to Region Server.*
>>
>>   2019-11-28 20:01:53,273 INFO  [Query
>> 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseEndpointRPC:165 : The
>> scan *30572c87* for segment UAC[20191127151000_20191128012000] is as
>> below with 1 separate raw scans, shard part of start/end key is set to 0
>>
>> 2019-11-28 20:01:53,275 INFO  [Query
>> 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseRPC:288 : Visiting
>> hbase table *lacus:LACUS_AGEAMBMTM1*: cuboid require post aggregation,
>> from 89 to 127 Start:
>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00n\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00)
>> Stop:
>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFF\x6E\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00
>> (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFFn\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00)
>> Fuzzy key counts: 1. Fuzzy keys :
>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> \x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01;
>>
>>
>>
>>*End of send RPC request to Region Server.*
>>
>> 2019-11-28 20:02:10,751 INFO  [kylin-coproc--pool2-t4]
>> v2.CubeHBaseEndpointRPC:343 : > 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a GTScanRequest *30572c87*>Endpoint
>> RPC returned from HTable lacus:LACUS_AGEAMBMTM1 Shard
>> \x6C\x61\x63\x75\x73\x3A\x4C\x41\x43\x55\x53\x5F\x41\x47\x45\x41\x4D\x42\x4D\x54\x4D\x31\x2C\x2C\x31\x35\x37\x34\x39\x30\x36\x31\x35\x35\x35\x39\x33\x2E\x39\x64\x39\x30\x39\x36\x31\x64\x66\x61\x38\x37\x31\x37\x35\x38\x36\x37\x33\x37\x65\x33\x36\x30\x36\x31\x35\x34\x34\x36\x36\x33\x2E
>> on host: cdh-worker-2.*Total scanned row: 770235*. Total scanned bytes:
>> 47587866. Total filtered row: 0. Total aggred row: 764215. *Time elapsed
>> in EP: 17113(ms)*. Server CPU

Re: how does cube retention range work

2019-12-06 Thread ShaoFeng Shi

Hi kangsen,

It will be triggered when a new segment is built; see
CubeService.updateOnNewSegmentReady(), line 637.

If a cube segment's whole date range is older than the retention period
(say, everything is more than 30 days old; a segment that is only partially
older will not be selected), the segment is first dropped from the segment
list. The cleanup of the underlying data (HDFS, HBase) is deferred until
the StorageCleanupJob runs.
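
The selection rule can be sketched in a few lines of Python. This is only
an illustration of the rule as described in this message, not Kylin's code:
the `Segment` type and `select_expired_segments` function are hypothetical,
and the reference date that the cutoff is measured from is passed in by the
caller because this thread does not state which one Kylin uses.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical segment covering the half-open range [start, end)."""
    name: str
    start: date
    end: date


def select_expired_segments(
    segments: List[Segment], reference: date, retention_days: int
) -> List[Segment]:
    """Return segments whose whole date range lies before the cutoff.

    A segment that only partially overlaps the expired period is kept.
    """
    cutoff = reference - timedelta(days=retention_days)
    return [seg for seg in segments if seg.end <= cutoff]


if __name__ == "__main__":
    segments = [
        Segment("fully_old", date(2018, 2, 9), date(2018, 2, 10)),
        Segment("partially_old", date(2019, 11, 1), date(2019, 11, 20)),
        Segment("recent", date(2019, 12, 1), date(2019, 12, 6)),
    ]
    expired = select_expired_segments(segments, date(2019, 12, 6), 30)
    print([seg.name for seg in expired])
```

In this sketch, selecting a segment corresponds only to dropping it from
the segment list; removing the underlying storage would be a separate,
later step.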

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Fri, Dec 6, 2019 at 12:01 AM, Lu, Kang-Sen wrote:

> I am running Kylin 2.6.3. In the Kylin GUI, when configuring a cube, at the
> step “Refresh Setting” we can specify a “Retention Threshold” of, say, 30
> (days).
>
> How would Kylin automatically remove cube segments that are older than 30
> days?
>
> I searched the Kylin source code. It seems that Kylin does save
> “retentionRange” with each CubeDesc, but no other source code refers to
> that retentionRange.
>
> Thanks.
>
> Kang-sen

Re: Error on EMR

2019-12-06 Thread ShaoFeng Shi

Hi Tanmay,

Could you share how you fixed the problem? We can update the document if
it is missing something. Thanks!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Fri, Dec 6, 2019 at 10:28 AM, Xiaoxiang Yu wrote:

> Hi Tanmay,
>
>Thank you for your update, and I am glad to hear that you have finally
> fixed your issue.
>
>
>
> 
>
> Best wishes,
>
> Xiaoxiang Yu
>
>
>
>
>
> *From:* Tanmay Movva
> *Date:* Friday, December 6, 2019 01:51
> *To:* Xiaoxiang Yu
> *Subject:* Re: Error on EMR
>
>
>
> Hey Xiaoxiang,
>
>
>
> Thank you so much. This worked for me. Also there is one mistake in your
> export hive_dependency, spelling error, it should be HBASE instead of HBSE.
> Probably while putting it on github. Haha. Thanks
>
>
>
> On Wed, Dec 4, 2019 at 10:27 AM Xiaoxiang Yu 
> wrote:
>
> Here is my installation with some additional steps; please check it:
> https://github.com/hit-lacus/hit-lacus.github.io/issues/76#issuecomment-548255402
> I did not run into the same problem as you, so I never copied any Hive jar
> into Kylin from the EMR environment.
>
>
>
> Besides, I am using emr-5.27 in region cn-northwest-1. If you use a
> different version, the problem you faced may not be fixed by my steps.
>
>
>
> 
>
> Best wishes,
>
> Xiaoxiang Yu
>
>
>
>
>
> *From:* Tanmay Movva
> *Date:* Wednesday, December 4, 2019 12:25
> *To:* Xiaoxiang Yu
> *Subject:* Re: Error on EMR
>
>
>
> Hi,
>
>
>
> Can you share your hive conf and kylin conf changes that you have made. I
> was able to install and setup kylin and run some .sh files. But I get the
> class not found error at stage 2 while building sample cube. I probably am
> missing some hive jar in classpath, but then I haven't made any significant
> changes to conf. So not able to debug.
>
>
>
> On Tue, Dec 3, 2019 at 8:11 PM Xiaoxiang Yu 
> wrote:
>
> Hi,
>    I have successfully deployed the latest version of Kylin (3.0.beta) on
> AWS EMR 5.27 and built a few cubes; maybe you can give it a try?
>    The cluster was created with a CLI command that looks like this, and I
> deployed Kylin on the MASTER node:
>
> aws emr create-cluster \
>   --applications Name=Hadoop Name=Hive Name=Pig Name=Spark Name=Sqoop Name=Tez Name=Zeppelin Name=ZooKeeper Name=Ganglia \
>   --release-label emr-5.27.0 \
>   --instance-groups '[{"InstanceCount":4,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":200,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.2xlarge","Name":"Worker Cluster"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"c4.4xlarge","Name":"MasterQuery"}]' \
>   --configurations '[{"Classification":"hdfs-site","Properties":{"dfs.replication":"2"}}]' \
>   --ebs-root-volume-size 100 \
>   --enable-debugging \
>   --name 'BenchmarkCluster' \
>   --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
>   --region cn-northwest-1
>
> 
> Best wishes,
> Xiaoxiang Yu
>
>
> On 2019/12/2 20:38, "Tanmay Movva" wrote:
>
> Hello,
>
> We have installed Kylin on our EMR master along with HBase, Hadoop and
> Hive. I installed Spark using download-spark.sh from KYLIN_HOME/bin. As
> described in the "Install KYLIN on AWS EMR" guide, we followed the steps
> to configure the Kylin working dir and HBase storage on S3, and also made
> the necessary zkquorum changes.
>
> When we run sample.sh or check-env.sh we don't get any errors. But when
> we run the cube build job from the UI, the job fails at stage 2,
> "Redistribute Flat Hive Tables". As the "Create Intermediate Hive tables"
> step completed successfully, I don't think there is any error with Hive.
>
> Can anyone help us with this? Thank you.
>
>
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/H

Re: Releasing Apache Kylin v3.0-GA

2019-12-06 Thread ShaoFeng Shi

Looking forward to the 3.0 GA release. Many users have already asked me
about it in private; they want to try it as soon as possible.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Sat, Dec 7, 2019 at 12:54 AM, Andras Nagy wrote:

> Great news, and congratulations to the realtime OLAP team on achieving GA
> status for the feature!
>
> And of course to everyone else working on the release, but I'm really
> grateful to Xiaoxiang for the help he offered us with realtime OLAP.
>
>
>
> On Fri, Dec 6, 2019 at 4:11 AM Xiaoxiang Yu 
> wrote:
>
>> Good news, I cannot wait for the next generation of Kylin.
>>
>>
>>
>> 
>>
>> Best wishes,
>>
>> Xiaoxiang Yu
>>
>>
>>
>>
>>
>> *From:* George Ni
>> *Reply-To:* "user@kylin.apache.org"
>> *Date:* Friday, December 6, 2019 10:25
>> *To:* "user@kylin.apache.org", "d...@kylin.apache.org"
>> *Subject:* Releasing Apache Kylin v3.0-GA
>>
>>
>>
>> Hi Community,
>>
>>
>>
>> As we have released v3.0-alpha, v3.0-alpha2, and v3.0-beta, we have enough
>> confidence to release the GA version of v3.0 next week, and I’m planning to
>> create a branch for its release.
>>
>>
>>
>> Detailed features, improvements and bug fixes will come later; the main
>> features are:
>>
>> 1. Realtime OLAP
>>
>> 2. Job scheduler with Apache Curator
>>
>> 3. User and user group management
>>
>>
>>
>> Please feel free to leave your comments here.
>>
>>
>>
>> -
>>
>> Best regards,
>>
>>
>>
>> Ni Chunen / George
>>
>

Re: Re: kylin

2019-12-02 Thread ShaoFeng Shi

Hi,

I think TIME is a keyword in Calcite SQL. Please write it as "TIME" (in
uppercase, enclosed in double quotes) to escape it.
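
If the SQL is generated by a program, the escaping can be applied
automatically. The Python sketch below is only an illustration: the helper
names are hypothetical, and the keyword set is a small subset taken from
the tokens listed in the parser error quoted in this thread, not Calcite's
full reserved-word list.

```python
# Small illustrative subset of keywords, taken from the parser's
# "Was expecting one of" token list; this is NOT the complete list.
RESERVED_WORDS = {
    "TIME", "DATE", "TIMESTAMP", "YEAR", "MONTH",
    "HOUR", "MINUTE", "SECOND", "USER",
}


def quote_if_reserved(identifier: str) -> str:
    """Return the identifier in uppercase double quotes if it is a keyword."""
    upper = identifier.upper()
    if upper in RESERVED_WORDS:
        # Double any embedded quote character, as standard SQL requires.
        return '"' + upper.replace('"', '""') + '"'
    return identifier


def build_group_count_sql(table: str, column: str) -> str:
    """Build a simple GROUP BY count query with the column safely quoted."""
    quoted = quote_if_reserved(column)
    return f"SELECT {quoted}, COUNT(*) FROM {table} GROUP BY {quoted}"


if __name__ == "__main__":
    print(build_group_count_sql("WX_CLICKS", "time"))
```

With the table and column from the failing query, the generated statement
quotes the column in both the SELECT list and the GROUP BY clause.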


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Mon, Dec 2, 2019 at 5:18 PM, 肖培栋 wrote:

> When I run the query, I hit this problem: other columns can be queried,
> but the time column (whose type is timestamp) cannot.
> SQL: SELECT TIME,COUNT(*) FROM WX_CLICKS GROUP BY TIME
> User: ADMIN
> Success: false
> Duration: 0.009
> Project: kylin_qrcode
> Realization Names: []
> Cuboid Ids: []
> Total scan count: 0
> Total scan bytes: 0
> Result row count: 0
> Accept Partial: true
> Is Partial Result: false
> Hit Exception Cache: false
> Storage cache used: false
> Is Query Push-Down: false
> Is Prepare: false
> Trace URL: null
> Message: Encountered "TIME ," at line 1, column 8. Was expecting one of:
>   "UNION" ... "INTERSECT" ... "EXCEPT" ... "MINUS" ...
> "ORDER" ... "LIMIT" ... "OFFSET" ... "FETCH" ... "STREAM"
> ... "DISTINCT" ... "ALL" ... "*" ... "+" ... "-" ...
>   "NOT" ... "EXISTS" ...  ...
>  ...  ...
>  ...  ...
>  ...  ... "TRUE" ...
> "FALSE" ... "UNKNOWN" ... "NULL" ...  ...
>  ...  ... "DATE" ... "TIME"
>  ... "TIMESTAMP" ... "INTERVAL" ... "?" ...
> "CAST" ... "EXTRACT" ... "POSITION" ... "CONVERT" ...
> "TRANSLATE" ... "OVERLAY" ... "FLOOR" ... "CEIL" ...
> "CEILING" ... "SUBSTRING" ... "TRIM" ... "CLASSIFIER" ...
> "MATCH_NUMBER" ... "RUNNING" ... "PREV" ... "NEXT" ...
>  ... "MULTISET" ... "ARRAY" ... "PERIOD" ...
> "SPECIFIC" ...  ...  ...
>  ...  ...
>  ... "ABS" ... "AVG" ...
> "CARDINALITY" ... "CHAR_LENGTH" ... "CHARACTER_LENGTH" ...
> "COALESCE" ... "COLLECT" ... "COVAR_POP" ... "COVAR_SAMP" ...
>   "CUME_DIST" ... "COUNT" ... "CURRENT_DATE" ... "CURRENT_TIME"
> ... "CURRENT_TIMESTAMP" ... "DENSE_RANK" ... "ELEMENT" ...
> "EXP" ... "FIRST_VALUE" ... "FUSION" ... "GROUPING" ...
> "HOUR" ... "LAG" ... "LEAD" ... "LAST_VALUE" ... "LN" ...
>   "LOCALTIME" ... "LOCALTIMESTAMP" ... "LOWER" ... "MAX" ...
>   "MIN" ... "MINUTE" ... "MOD" ... "MONTH" ... "NTILE" ...
> "NULLIF" ... "OCTET_LENGTH" ... "PERCENT_RANK" ... "POWER"
> ... "RANK" ... "REGR_SXX" ... "REGR_SYY" ... "ROW_NUMBER"
> ... "SECOND" ... "SQRT" ... "STDDEV_POP" ... "STDDEV_SAMP"
> ... "SUM" ... "UPPER" ... "TRUNCATE" ... "USER" ...
> "VAR_POP" ... "VAR_SAMP" ... "YEAR" ... "CURRENT_CATALOG" ...
>   "CURRENT_DEFAULT_TRANSFORM_GROUP" ... "CURRENT_PATH" ...
> "CURRENT_ROLE" ... "CURRENT_SCHEMA" ... "CURRENT_USER" ...
> "SESSION_USER" ... "SYSTEM_USER" ... "NEW" ... "CASE" ...
> "CURRENT" ... "CURSOR" ... "ROW" ... "(" ...
> ==[QUERY]===
>
> 2019-12-02 17:18:47,293 ERROR [http-bio-7070-exec-8]
> controller.BasicController:63 :
> org.apache.kylin.rest.exception.InternalErrorException: Encountered "TIME
> ," at line 1, column 8. Was expecting one of: "UNION" ...
> "INTERSECT" ... "EXCEPT" ... "MINUS" ... "ORDER" ...
> "LIMIT" ... "OFFSET" ... "FETCH" ... "STREAM" ...
> "DISTINCT" ..

Re: kylin

2019-12-02 Thread ShaoFeng Shi

Hi xiaodong,

In the FAQ page (https://kylin.apache.org/docs/gettingstarted/faq.html),
there is a similar question:

"The query result is not exactly matched with that in Hive, what’s the
possible reason?"

You can search for it there and check whether it applies. Hope it helps.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





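On Mon, Dec 2, 2019 at 4:42 PM, 肖培栋 wrote:

>
> After the Kylin cube is built, the number of rows it produces does not match
> the number of rows in my source Hive table. How can I find the cause?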
>

[ANNOUNCE] Please welcome Chunen Ni to the Apache Kylin PMC

2019-11-30 Thread ShaoFeng Shi

On behalf of the Apache Kylin PMC, I am pleased to announce that Chunen Ni
has accepted our invitation to become a PMC member on the Kylin project. We
appreciate Chunen stepping up to take more responsibility in the Kylin
project.

Please join me in welcoming Chunen to the Kylin PMC!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org


Re: Unable to create a grouping

2019-11-30 Thread ShaoFeng Shi

I see you already found the reason from the StackOverflow post. Some
additional info from my side: when you define the data model, Kylin
automatically adds the PK/FK columns to the dimension list. That explains
why STORESALES.ID still appears as a dimension even though you explicitly
removed it. When adding a dimension from the lookup table, please notice
that there is a "normal" or "derived" option. If it is "derived", only the
"hosting" dimension (the FK) will be grouped in the cube. If you want the
cube to materialize the grouping by that dimension, please set it as a
"normal" dimension.

Enjoy Kylin!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org





On Tue, Nov 12, 2019 at 9:30 AM, Lev Bronshtein wrote:

> Hello Kylin users please let me know if you have any information for this
> issue
>
> I am running Apache Kylin apachekylin/apache-kylin-standalone:3.0.0-alpha2
> Docker image.  I started out by creating two Hive tables one to record
> store sales and one consisting of store metadata
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *CREATE TABLE IF NOT EXISTS STORESALES (id INT,food FLOAT,drugs
> FLOAT,cosmetic FLOAT,baby FLOAT,reportdate DATE);CREATE TABLE IF NOT EXISTS
> STOREMETA (id INT,address STRING,brand STRING,owner STRING);*I then
> created a model in which I declared *STORESALES* as my fact table and
> *STOREMETA* as a lookup table with left join *STORESALES.ID
> <http://STORESALES.ID> = STOREMETA.ID <http://STOREMETA.ID>* I then
> declare
>
>
> - STOREMETA.ID
> - STOREMETA.ADDRESS
> - STOREMET.OWNER
> - STOREMETA.BRAND
>
>
> as dimensions. I explicitly deleted *STORESALES.ID*.
> I also specify measures
>
>
> - STORESALES.DRUGS
> - STORESALES.BABY
> - STORESALES.COMSETICS
> - STORESALES.FOOD
>
>
> and also specified *STORESALES.REPORTDATE* as my partition
>
> So then I go on to set up my cube. Again I add *STOREMETA[ID, BRAND,
> OWNER, NAME]* as dimensions, but for some reason *STOREDATA.ID* shows up
> as a choice for a dimension as well. I add measures as *MAX_FOOD,
> MAX_DRUGS, MAX_COSMETICS, MAX_BABY*. The issue is that once I get to
> Advanced Settings, the only option available for grouping is
> *STORESALES.ID*. If I manually enter anything else it disappears from the
> list. I went back to edit the model and noticed that *STORESALES.ID* is
> now in the list of measures as well.
>
> Not sure if this is what is breaking things for me, or if my general lack
> of experience here is hindering my progress. Please assist.
>
> P.S. This question has also been posted to stack overflow, feel free to
> respond and engage there.  It contains screenshots I felt may be relevant
> https://stackoverflow.com/questions/58649747/unable-to-create-a-grouping-in-apache-kylin
>

Re: why kylin.engine.mr.config-override not work

2019-11-30 Thread ShaoFeng Shi

Are the two cubes very similar to each other? You mentioned that the bigger
cube has one more step, "extract dictionary"; does that cube have a "count
distinct (precisely)" measure that the other one doesn't have? In which job
step did the "mapreduce.input.fileinputformat.split.minsize" parameter fail
to take effect? Please provide more specific and detailed info, so that the
developers can quickly identify the problem. Thank you!
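
For reference when comparing the steps, here is a small Python sketch of how
Hadoop's FileInputFormat derives the split size from the minimum split size.
The max/min rule mirrors FileInputFormat's computeSplitSize; the 128 MiB
block size, the 10 GiB input and the helper names are assumptions for
illustration only, and the split count ignores Hadoop's slop factor. Whether
a particular Kylin step reads its input through FileInputFormat is not
something this sketch can tell.

```python
import math

MIB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split size rule used by FileInputFormat: max(min, min(max, block))."""
    return max(min_size, min(max_size, block_size))


def approx_num_splits(file_size, split_size):
    """Rough number of splits for one file (ignores the slop factor)."""
    return max(1, math.ceil(file_size / split_size))


min_size = 1073741824      # value from the question (1 GiB)
block_size = 128 * MIB     # assumed HDFS block size
max_size = 2**63 - 1       # assumed: no explicit maximum configured
file_size = 10 * 1024 * MIB  # assumed 10 GiB input file

split_with_override = compute_split_size(block_size, min_size, max_size)
split_without_override = compute_split_size(block_size, 1, max_size)

print(split_with_override // MIB)                             # 1024
print(approx_num_splits(file_size, split_with_override))      # 10
print(approx_num_splits(file_size, split_without_override))   # 80
```

If the override is applied, the number of map tasks of that step should drop
roughly as in the last two lines; comparing the mapper counts of each step
is one way to see where the parameter did not take effect.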

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

lk_hadoop wrote on Tue, Nov 12, 2019 at 12:20 PM:

> can any body give me some clue?
>
> 2019-11-12
>
> lk_hadoop
>
>
>
> From: "lk_hadoop"
> Sent: 2019-11-11 21:52
> Subject: why kylin.engine.mr.config-override not work
> To: "user","dev"
> Cc:
>
> hi, all:
> I have two cubes, both configured with
> kylin.engine.mr.config-override.mapreduce.input.fileinputformat.split.minsize
> = 1073741824. For the cube with less data this property works; for the
> other cube, with more data, it does not. I found that the cube with more
> data has one more step than the cube with less data; the step name is
> "Extract Dictionary from Global Dictionary". I want to know why the
> property I override does not work.
> 2019-11-11
>
>
> lk_hadoop

Re: Reply: The results return some fields empty. But when I tried in hive with the same query, I got what I wanted to get.

2019-11-30 Thread ShaoFeng Shi

Hi Qiuyin,

I see this problem was also reported by another user in another thread, and
Xiaoxiang has replied that there is currently a limitation in Kylin on
Hadoop 3 (as well as HDP 3.1): if the lookup table in Hive has the ACID
feature enabled, Kylin cannot load that lookup table, which causes the
query problem. You may need to wait a while for the resolution, or you can
try to disable ACID for that table.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

梅秋莹 <3281438...@qq.com> wrote on Thu, Nov 14, 2019 at 3:24 PM:

> Thank you for your reply!
>
> Our installation environment is HDP 3.1.0.0, and the dimension tables
> can also be updated. Is it this model and environment that cause the
> problem? I would appreciate it very much if you know the solution.
>
>
> Best wishes,
>
> Qiuying Mei
>
> -- Original Message --
> *From:* "Xiaoxiang Yu";
> *Sent:* Thursday, Nov 14, 2019, 3:02 PM
> *To:* "user@kylin.apache.org";"梅秋莹"<3281438...@qq.com>;
> *Subject:* Re: Reply: The results return some fields empty. But when I tried in
> hive with the same query, I got what I wanted to get.
>
> Hi friend,
>
> Do you deploy your Kylin on a Hadoop 3 environment like HDP 3? And is your
> dimension table an ACID transactional table?
>
>
>
> 
>
> Best wishes,
>
> Xiaoxiang Yu
>
>
>
>
>
> *From:* 梅秋莹 <3281438...@qq.com>
> *Reply-To:* "user@kylin.apache.org"
> *Date:* Wednesday, Nov 13, 2019, 19:19
> *To:* user
> *Subject:* Reply: The results return some fields empty. But when I tried in
> hive with the same query, I got what I wanted to get.
>
>
>
> Firstly, thank you for your reply!
>
>
>
> In my cube, I have set SHOP_NAME as a derived dimension. And I also found
> that all normal dimensions can return correctly, but not all  derived
> dimensions returned results.
>
>
>
>
>
>
>
> -- Original Message --
>
> *From:* "Yaqian Zhang";
>
> *Sent:* Wednesday, Nov 13, 2019, 6:39 PM
>
> *To:* "user";
>
> *Subject:* Re: The results return some fields empty. But when I tried in hive
> with the same query, I got what I wanted to get.
>
>
>
> Hi:
>
> Did you add SHOP_NAME to the dimension when you create cube? If not, kylin
> will not save the details of the original table when building the cube, and
> will return null when querying the column.
>
> > On Nov 13, 2019, at 14:01, 梅秋莹 <3281438...@qq.com> wrote:
> >
> > 1. Environment:
> > kylin 2.6.3
> >
> > 2. Question:
> >
> > In the Kylin Insight interface, I input a query like this:
> >
> > select DATE_DIM."DATE", SHOP_DIM.SHOP_NAME, A.DISTRICT_ID, SUM(TOTALFEE)
> > from DW_ERP_BY_SHOPSALE_FACT
> > inner join DATE_DIM
> >   on DW_ERP_BY_SHOPSALE_FACT.DATE_KEY = DATE_DIM.DATE_KEY
> > inner join TIME_DIM
> >   on DW_ERP_BY_SHOPSALE_FACT.TIME_KEY = TIME_DIM.TIME_KEY
> > inner join SHOP_DIM
> >   on DW_ERP_BY_SHOPSALE_FACT.SHOP_KEY = SHOP_DIM.SHOP_KEY
> > inner join ADMINISTRATIVE_DISTRICT A
> >   on SHOP_DIM.AREAID = A.DISTRICT_ID
> > inner join ERP_BY_DEPARTMENT
> >   on SHOP_DIM.REGION = ERP_BY_DEPARTMENT.ID
> > GROUP BY DATE_DIM."DATE", SHOP_DIM.SHOP_NAME, A.DISTRICT_ID
> >
> > The results return some fields empty. But when I tried the same query in
> > Hive, I got what I wanted.
> >
> > 3. Please help me solve the question if you have a solution, thank you!
> >
> > <2cffc...@37eb5831.2f9ccb5d.jpg>
> >
> >
> > <2e01c...@ad604224.2f9ccb5d.jpg>
> >
> >
>

Re: Nesting multiple sub-queries in Kylin makes queries much slower (tens of seconds)

2019-11-30 Thread ShaoFeng Shi

Hi Johnson,

This is a good observation. For a query on a single cube, Kylin can dispatch
it to HBase for parallel computing, which usually takes less than a second.
When there are multiple sub-queries together, Kylin executes these
sub-queries in sequence. After getting the results back to the Kylin query
node (in memory), it executes the final round of computing, such as joining
and filtering. Depending on the data size, the time this takes varies. That
is what you observed.

A workaround is to define a new model/cube with the tables together. Then
there is no need to use sub-queries.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Johnson wrote on Fri, Nov 15, 2019 at 6:14 PM:

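> Because the front end needs to display several metrics together, a single
> SQL statement may nest multiple sub-queries when querying Kylin. In our
> production environment we found that sub-queries seriously slow down the
> queries. Does anyone know how to optimize this?
> The tests are as follows:
> 1. Count active devices
> select count(distinct deviceid) dad from
> KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_DEVICE_ACTIVE
> where par_dt>='2019-06-01' and par_dt<='2019-11-15'
> group by par_dt
> 2. Count active users
> select g.par_dt,count(distinct g.userid) "activeAccount" from
> KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_USER_MULTIDIM_ACTIVE g
> where g.par_dt>='2019-06-01' and g.par_dt<='2019-11-01'
> group by g.par_dt
> Executed separately, each of the two SQL statements takes about 0.5s.
>
> 3. Using a sub-query:
> select g.par_dt,count(distinct g.userid) "activeAccount",a.dad
> "activeDevice"
> from KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_USER_MULTIDIM_ACTIVE g
> left join(
> select par_dt,count(distinct deviceid) dad from
> KYLIN_VIEW.KYLIN_VIEW_T_DWA_ACT_XXX_DEVICE_ACTIVE
> where par_dt>='2019-06-01' and par_dt<='2019-11-01'
> group by par_dt
> ) a
> on g.par_dt = a.par_dt
> where g.par_dt>='2019-06-01' and g.par_dt<='2019-11-01'
> group by g.par_dt,a.dad
> The query takes as long as 1.5s, and *when there are more sub-queries, the
> query takes tens of seconds*. As shown in the figure below, I do not quite
> understand this result: the sub-queries could be executed completely in
> parallel, and their results (the data volume is already tiny) could then be
> aggregated on the Kylin server side, which should be fast. Why is it so slow?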
>
>
>
>

Re: Need advice and best practices for Kylin query tuning (not cube tuning)

2019-11-30 Thread ShaoFeng Shi

Hi Andras,

Is this specific to real-time streaming, or is it also related to normal
batch cubes? I want to narrow down the scope. Besides, how complex are the
queries? I think we may need more input. If you can share some sample
queries, that would help to some extent.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Andras Nagy wrote on Thu, Nov 28, 2019 at 9:06 PM:

> Dear All,
>
> We are troubleshooting slow queries in our Kylin deployment, and we
> suspect that the issue is not with the cube definitions, but with our
> queries. At least we have some quite complex queries with a lot of range
> checks on dimension values, and we have observed different response times
> by changing the queries to alternative, but functionally equivalent ones.
>
> Although it's hard to come to conclusions because we see a large variance
> in query response times (for the same query in the same environment, at
> roughly the same time).
> We have disabled query caching in kylin.properties
> (kylin.query.cache-enabled=false) to be able to have more conclusive
> results on what effect certain changes have on query execution time, but we
> still observe variance in query results on an environment that otherwise
> has no load. Perhaps this is due to caching within HBase or within the
> streaming receiver.
>
> Do you have any guidelines, best practices, documentation on how to tune
> queries for Kylin? (I'm aware of some cube tuning guidelines from the Kylin
> documentation, but now I'm looking for advice specifically about query
> optimization.)
>
> Many thanks,
> Andras
>

Re: Fw: kylin reload hive table

2019-11-30 Thread ShaoFeng Shi

It is not a missing jar; the jar is incompatible. What are your Hadoop
distribution and version? Please check whether there are different versions
of hive-metastore-.jar on your Kylin node.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

肖培栋 wrote on Fri, Nov 29, 2019 at 5:39 PM:

>
>
> 肖培栋
> Email: xiaopeidong1...@163.com
>
> - Forwarded Message -
> From: 肖培栋 
> Date: 11/29/2019 17:38
> To: 15192081379 <15192081...@163.com>
> Subject: kylin reload hive table
> When reloading the table:
>
>
>
> Is this because a jar is missing?
>

Re: Kylin always shuts down automatically when building a cube

2019-11-30 Thread ShaoFeng Shi

Please check the error log in the backend. Besides, please also run the
metadata cleanup job periodically if there are many historical jobs.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

codingfor...@126.com wrote on Sat, Nov 30, 2019 at 9:52 AM:

> Please check the error/exception information in the log file
> *$KYLIN_HOME/logs/kylin.log*
>
>
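> On Nov 29, 2019, at 20:48, wangweilin wrote:
>
> This error always occurs when building a cube; after refreshing, the page
> cannot be opened and Kylin has to be restarted.
> Kylin version 2.6.3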
> hadoop 2.7.1
> hbase 1.2.5
> hive 1.2.1
> 
>
> wangweilin
> sdnuwangwei...@163.com
>
>
>
>

Re: Problem importing Kafka data into Kylin when computing with the Spark engine

2019-11-28 Thread ShaoFeng Shi

Pic 1 shows that the Spark job is running; there is no exception there. If
you noticed that a Spark job failed, please start the Spark history server
to check more logs there. Please remember to use the same log folder
(/kylin/spark-history) as in kylin.properties.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Yaqian Zhang wrote on Thu, Nov 21, 2019 at 9:54 AM:

> Hi:
>
> Figure 1 does not seem to show any error information; could you
> provide more of the error log?
>
> At which step does this error occur?
>
> On Nov 21, 2019, at 00:27, gaofeng5...@capinfo.com.cn wrote:
>
> 
>
>
>
> 
>
>
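> With the Spark configuration above, building the cube from the official Kafka-to-Kylin data always runs into the problem in figure 1. How can this be solved? Thanks.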
> --
> gaofeng5...@capinfo.com.cn
>
>
>

Re: metastore clean OutOfMemoryError

2019-11-26 Thread ShaoFeng Shi

Thanks to Temple for the sharing!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

MrWell wrote on Fri, Nov 22, 2019 at 5:03 PM:

> Hi Temple Zhou,
>
> Thanks!!!   i have solved it.  I think u r right.
>
>
> -- Original Message --
> *From:* "Temple Zhou";
> *Sent:* Friday, Nov 22, 2019, 4:51 PM
> *To:* "user";
> *Subject:* Re: metastore clean OutOfMemoryError
>
> Hi MrWell,
> You should set the KYLIN_EXTRA_START_OPTS instead of KYLIN_JVM_SETTINGS
>
> exec hbase ${KYLIN_EXTRA_START_OPTS}
>> -Dkylin.hive.dependency=${hive_dependency}
>> -Dkylin.hbase.dependency=${hbase_dependency}
>> -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-tools-log4j.properties
>> "$@"
>
>
> It is the *HBase client* that ran out of memory, because of its small
> default Java heap size.
>
> So, you can "export HBASE_OPTS="-Xmx??"" before executing
> "bin/metastore.sh clean --delete true".
> If you are working with CDH HBase, you can increase the "-Xmx268435456" in
> /etc/hbase/conf/hbase-env.sh directly.
>
> On Fri, Nov 22, 2019 at 4:30 PM MrWell  wrote:
>
>> Hi Shaofeng Shi,
>>
>> I have set KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g" in setenv.sh, but it
>> still fails. Can the metadata be larger than 16g?
>>
>> Thanks for reply.
>>
>> -- Original Message --
>> *From:* "ShaoFeng Shi";
>> *Sent:* Friday, Nov 22, 2019, 3:18 PM
>> *To:* "user";
>> *Subject:* Re: metastore clean OutOfMemoryError
>>
>> hi Huangpeng,
>>
>> I guess your JVM heap is small or your Kylin metadata is big. You can try
>> to increase the java heap in "bin/setenv.sh"
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Email: shaofeng...@apache.org
>>
>> MrWell wrote on Fri, Nov 22, 2019 at 3:01 PM:
>>
>>> Hi, Kylin Team.
>>>
>>> When I execute "bin/metastore.sh clean --delete true" , I get a
>>> "OutOfMemoryError" like this
>>>
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>> Dumping heap to java_pid4839.hprof ...
>>> Heap dump file created [317991670 bytes in 2.120 secs]
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> #   Executing /bin/sh -c "kill -9 4839"...
>>> bin/metastore.sh: line 109:  4839 Killed
>>> ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.MetadataCleanupJob "${@:2}"
>>>
>>>
>>> I have set 'setenv.sh' file, like this
>>>
>>> export KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m
>>> -XX:NewSize=3g -XX:MaxNewSize=3g -XX:SurvivorRatio=4
>>> -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled
>>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>>> -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation
>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
>>>
>>> Does it mean the heap memory is still too small?
>>>
>>

Re: metastore clean OutOfMemoryError

2019-11-21 Thread ShaoFeng Shi

Excellent Marc!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Marc Wu -X (mawu2 - Insigma Hengtian at Cisco) wrote on Fri, Nov 22, 2019 at 3:16 PM:

> Hi MrWell,
>
>
>
> You can try to create a file named setenv-tool.sh in the $KYLIN_HOME/conf
> directory, and put the following content in the file, then execute your
> command. It may solve your issues.
>
> #!/bin/bash
> #
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #    http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> export KYLIN_EXTRA_START_OPTS="-Xmx3072M"
>
>
>
>
>
> *From: *MrWell 
> *Reply-To: *"user@kylin.apache.org" 
> *Date: *Friday, November 22, 2019 at 15:01
> *To: *user 
> *Subject: *metastore clean OutOfMemoryError
>
>
>
> Hi, Kylin Team.
>
>
>
> When I execute "bin/metastore.sh clean --delete true" , I get a
> "OutOfMemoryError" like this
>
>
>
>
>
> java.lang.OutOfMemoryError: Java heap space
>
> Dumping heap to java_pid4839.hprof ...
>
> Heap dump file created [317991670 bytes in 2.120 secs]
>
> #
>
> # java.lang.OutOfMemoryError: Java heap space
>
> # -XX:OnOutOfMemoryError="kill -9 %p"
>
> #   Executing /bin/sh -c "kill -9 4839"...
>
> bin/metastore.sh: line 109:  4839 Killed
> ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.MetadataCleanupJob "${@:2}"
>
>
>
>
>
> I have set 'setenv.sh' file, like this
>
>
>
> export KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m
> -XX:NewSize=3g -XX:MaxNewSize=3g -XX:SurvivorRatio=4
> -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
>
>
>
> Does it mean the heap memory is still too small?
>

Re: metastore clean OutOfMemoryError

2019-11-21 Thread ShaoFeng Shi

hi Huangpeng,

I guess your JVM heap is small or your Kylin metadata is big. You can try
to increase the Java heap in "bin/setenv.sh".
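
When checking which heap limit an option string really carries, it can help
to normalize the -Xmx value to bytes, since it may be written with a unit
suffix or as a plain byte count. Below is a minimal Python sketch for that.
It is not part of Kylin; the helper name is hypothetical, and it assumes
that the last -Xmx in the string is the one that applies.

```python
import re

# Multipliers for the JVM size suffixes (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def max_heap_bytes(jvm_opts):
    """Return the last -Xmx value found in jvm_opts, in bytes, or None."""
    matches = re.findall(r"(?:^|\s)-Xmx(\d+)([kKmMgGtT]?)(?=\s|$)", jvm_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


for opts in ("-Xms16g -Xmx16g -XX:MaxPermSize=512m", "-Xmx3072M", "-Xmx268435456"):
    print(opts, "->", max_heap_bytes(opts) // (1024**2), "MiB")
```

Running the check against the options of the process that actually failed
shows whether the larger heap setting reached it.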

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

MrWell wrote on Fri, Nov 22, 2019 at 3:01 PM:

> Hi, Kylin Team.
>
> When I execute "bin/metastore.sh clean --delete true" , I get a
> "OutOfMemoryError" like this
>
>
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid4839.hprof ...
> Heap dump file created [317991670 bytes in 2.120 secs]
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 4839"...
> bin/metastore.sh: line 109:  4839 Killed
> ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.MetadataCleanupJob "${@:2}"
>
>
> I have set 'setenv.sh' file, like this
>
> export KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m
> -XX:NewSize=3g -XX:MaxNewSize=3g -XX:SurvivorRatio=4
> -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
>
> Does it mean the heap memory is still too small?
>

[Announce] Apache Kylin 2.6.4 released

2019-10-15 Thread ShaoFeng Shi

The Apache Kylin team is pleased to announce the immediate availability of
the 2.6.4 release.

This is a bugfix release after 2.6.3, with 27 bug fixes and enhancements;
All of the changes in this release can be found in:
https://kylin.apache.org/docs/release_notes.html

You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/download/

Apache Kylin is an open-source Distributed Analytics Engine designed to
provide SQL interface and multi-dimensional analysis (OLAP) on Apache
Hadoop, supporting extremely large datasets.

Apache Kylin lets you query massive datasets at sub-second latency in 3
steps:
1. Identify a star schema or snowflake schema data set on Hadoop.
2. Build Cube on Hadoop.
3. Query data with ANSI-SQL and get results in sub-second, via ODBC, JDBC
or RESTful API.

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to report
problems, and to get involved, visit the project website at
https://kylin.apache.org/

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

[Announce] Apache Kylin 2.6.4 released

2019-10-12 Thread ShaoFeng Shi

The Apache Kylin team is pleased to announce the immediate availability of
the 2.6.4 release.

This is a bugfix release after 2.6.3, with 27 bug fixes and enhancements;
All of the changes in this release can be found in:
https://kylin.apache.org/docs/release_notes.html

You can download the source release and binary packages from Apache Kylin's
download page: https://kylin.apache.org/download/

Apache Kylin is an open-source Distributed Analytics Engine designed to
provide SQL interface and multi-dimensional analysis (OLAP) on Apache
Hadoop, supporting extremely large datasets.

Apache Kylin lets you query massive datasets at sub-second latency in 3
steps:
1. Identify a star schema or snowflake schema data set on Hadoop.
2. Build Cube on Hadoop.
3. Query data with ANSI-SQL and get results in sub-second, via ODBC, JDBC
or RESTful API.

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to report
problems, and to get involved, visit the project website at
https://kylin.apache.org/

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Re: [!!Mass Mail][Probable spam]Re: sometimes need quite a long time when building cube

2019-10-06 Thread ShaoFeng Shi

Please check the file sizes of the intermediate Hive table first; the files
should be evenly sized after the "Redistribute" step. If not, please check
the columns that it is redistributed by (the first three dimensions by
default).
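
One way to check the evenness is to list the file sizes (for example with
"hdfs dfs -du" on the table's directory) and compare the largest file with
the median. The Python sketch below does that on a pasted listing. The
sample paths and sizes are made up, the helper names are hypothetical, and
it assumes each listing line starts with the size in bytes and ends with the
path.

```python
import statistics


def parse_du_output(text):
    """Parse lines of the form '<size> ... <path>' into (path, size) pairs."""
    files = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip empty or unexpected lines
        files.append((parts[-1], int(parts[0])))
    return files


def skew_ratio(sizes):
    """Largest file size divided by the median file size."""
    median = statistics.median(sizes)
    if median == 0:
        return float("inf") if max(sizes) > 0 else 1.0
    return max(sizes) / median


# Made-up listing for illustration only.
sample = """\
104857600  /tmp/example/part-00000
110100480  /tmp/example/part-00001
1073741824  /tmp/example/part-00002
"""

files = parse_du_output(sample)
ratio = skew_ratio([size for _, size in files])
print(f"{len(files)} files, max/median = {ratio:.2f}")
```

A ratio close to 1 suggests the redistribution worked; a much larger value
points to a skewed column among those used for redistribution.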

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Bryan Liu (CN) wrote on Mon, Oct 7, 2019 at 6:52 AM:

> Hi Shaofeng
>It was in map phase. Thank you
>
> Bryan
>
>
> On Oct 6, 2019, at 22:03, ShaoFeng Shi wrote:
>
> Hi Bryan,
>
> What's the phase of the job in the second screenshot? map phase or reduce
> phase?
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Bryan Liu (CN) wrote on Thu, Sep 26, 2019 at 3:37 PM:
>
>> Dears,
>>
>> I am doing some testing with Kylin now. My cube is based on one source
>> table with about 60~70M rows of data for one month.
>>
>> Normally building the cube takes about 25 minutes.
>>
>> But sometimes it takes more than 3 hours, usually in busy periods.
>> When I checked the MapReduce jobs for cube building step 3 (Extract Fact
>> Table Distinct Columns), I found that some jobs take just several seconds,
>> but some jobs take quite a long time.
>>
>> Please refer to the screenshots below.
>>
>> I think Hadoop not having enough resources is one reason. Meanwhile,
>> there may be some problem with cube building step 2; it seems the data
>> is unevenly distributed.
>>
>> Could you please give me some advice? Thank you so much.
>>
>> 
>>
>> 
>>
>>

Re: Apache Kylin Meetup @Berlin, Oct. 24, 2019

2019-10-06 Thread ShaoFeng Shi

Kylin users in the middle of Europe are all welcome to join this meetup in
Berlin! It is on the same day as ApacheCon Europe 2019, so you can come
after ApacheCon.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

nichunen wrote on Fri, Sep 27, 2019 at 10:46 AM:

> Hello Kylin users & developers,
>
>
>
> There will be a Kylin Meetup next month in Berlin, Germany. OLX group and
> Kyligence will share their use cases and experiences with Kylin.
>
>
>
> Date: Oct. 24, 2019
>
> Time: 7:00 PM - 8:30 PM
>
> Location: OLX Group
>
> Karl-Liebknecht Straße 29, 10178 Berlin
>
> 16th floor
>
> Language: English
>
> Fee: Free!
>
> Link for Registration: https://www.meetup.com/Apache-Kylin-Meetup-Berlin/
>
>
> Best regards,
>
>
>
> Ni Chunen / George
>
>
>

Re: Error when building cube

2019-10-06 Thread ShaoFeng Shi

Wuming, thank you for the information. If anyone has a clue about this, you
are welcome to share it with us; a pull request is also welcome.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

伍明 wrote on Fri, Sep 27, 2019 at 1:43 PM:

> Hello，
>Thank you for your reply.
>When I use EMR 5.7 and apache-kylin-2.5.0，I have not encountered that
> problem.
>
>
> -- Original Message --
> *From:* "ShaoFeng Shi";
> *Sent:* Thursday, Sep 26, 2019, 8:39 AM
> *To:* "user";
> *Subject:* Re: Error when building cube
>
> Both the first and the second steps use "hive -e" to execute the data
> extraction and redistribution. If they were executed on the same node,
> they should behave the same. Please resume the job to retry.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Yaqian Zhang wrote on Wed, Sep 25, 2019 at 4:28 PM:
>
>> Hi:
>> Can you provide the detailed error log information and your hive-site.xml?
>> The reason may be that the hive.metastore.uris property is not set in
>> hive-site.xml.
>>
>> On Sep 25, 2019, at 15:54, 伍明 wrote:
>>
>> Hello all,
>> Env information is as follows:
>>   AWS emr-5.26.0
>>   apache-kylin-2.6.3-bin-hbase1x
>> I created the sample cube with bin/sample.sh. When I built the sample
>> cube, an exception occurred in step 3:
>> NoSuchObjectException(message:default.kylin_intermediate_kylin_sales_cube_0ea737d7_619f_5fea_fed8_2e601e4a39ea
>> table not found)
>> <3201d...@00cc7d68.4c1d8b5d.jpg>
>> The Hive table exists, but Kylin cannot access it successfully.
>> Can anyone give me a pointer on what I should configure to solve the
>> problem?
>>
>> I am looking forward to someone’s reply.
>>
>>
>>

Re: [DISCUSS] Upgrade Hadoop-related dependencies’ to Hadoop3 for master branch

2019-10-06 Thread ShaoFeng Shi

+1

The source code release is the core asset that an Apache project releases,
while binary packages are just for convenience. Providing several binary
versions is difficult for the maintainers (currently we make 4 binary
packages for each release). Kylin 3.0 is a good moment for us to switch the
default Hadoop version to Hadoop 3.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Billy Liu wrote on Wed, Oct 2, 2019 at 9:29 AM:

> +1. Kylin 3 aligns with Hadoop 3
>
> With Warm regards
>
> Billy Liu
>
> Luke Han wrote on Tue, Oct 1, 2019 at 1:24 PM:
> >
> > +1, we should move on to next-g Hadoop
> >
> > Best Regards!
> > -
> >
> > Luke Han
> >
> >
> > On Mon, Sep 30, 2019 at 12:39 AM nichunen  wrote:
> >>
> >> Hi all,
> >>
> >>
> >> As more users upgrade their Hadoop to Hadoop 3, to catch up with this
> >> trend, I suggest Kylin's master branch upgrade the Hadoop-related
> >> dependencies' versions to Hadoop 3.
> >>
> >>
> >> So Kylin 3.0-GA will be based on Hadoop 3, and users may download packages
> >> which can be run on HDP 3.x and CDH 6.x. On the other side, branch 2.6.x
> >> will still be based on Hadoop 2. By the way, we should still maintain a
> >> branch of Kylin 3.x for Hadoop 2, and a branch of Kylin 2.x for Hadoop 3,
> >> so users may build Kylin binary packages by themselves.
> >>
> >>
> >>
> >> Best regards,
> >>
> >>
> >>
> >> Ni Chunen / George
> >>
> >>
>

Re: Error when building cube

2019-09-25 Thread ShaoFeng Shi

Both the first and the second steps use "hive -e" to execute the data
extraction and redistribution. If they were executed on the same node,
they should behave the same. Please resume the job to retry.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Yaqian Zhang wrote on Wed, Sep 25, 2019 at 4:28 PM:

> Hi:
> Can you provide the detailed error log information and your hive-site.xml?
> The reason may be that the hive.metastore.uris property is not set in
> hive-site.xml.
>
> On Sep 25, 2019, at 15:54, 伍明 wrote:
>
> Hello all,
> Env information is as follows:
>   AWS emr-5.26.0
>   apache-kylin-2.6.3-bin-hbase1x
> I created the sample cube with bin/sample.sh. When I built the sample
> cube, an exception occurred in step 3:
> NoSuchObjectException(message:default.kylin_intermediate_kylin_sales_cube_0ea737d7_619f_5fea_fed8_2e601e4a39ea
> table not found)
> <3201d...@00cc7d68.4c1d8b5d.jpg>
> The Hive table exists, but Kylin cannot access it successfully.
> Can anyone give me a pointer on what I should configure to solve the
> problem?
>
> I am looking forward to someone’s reply.
>
>
>
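
The hive.metastore.uris suggestion in the thread above can be checked with a short script. This is only a minimal sketch: it assumes hive-site.xml uses the usual Hadoop configuration layout (a <configuration> root holding <property> elements with <name> and <value> children). The function name and the sample value are hypothetical, not taken from the thread.

```python
import xml.etree.ElementTree as ET


def get_hive_property(xml_text, name):
    """Return the value of a property in a Hadoop-style config, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value", default="").strip()
            return value or None
    return None


# Hypothetical file content; a real check would read the hive-site.xml
# that the Kylin node actually uses.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>"""

uris = get_hive_property(SAMPLE, "hive.metastore.uris")
print(uris if uris else "hive.metastore.uris is not set")
```

A missing or empty property makes the function return None, which is the case the reply above suspects.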

Please use English as the primary language

2019-09-03 Thread ShaoFeng Shi

Hello Kylin users and developers,

As an Apache project, the Kylin community has people all around the world:
Asia, Europe, America, and other regions. English should be the primary and
recommended language for wider discussion and communication.

This is not mandatory; if you find it difficult to write in English, another
language is also okay. Let's follow this as much as possible. Thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

Re: Cube building stuck at "Create Intermediate Flat Hive Table"

2019-09-02 Thread ShaoFeng Shi

Did you check the job in the YARN ResourceManager? Is any job pending
submission? Sometimes, if YARN doesn't have enough resources, the job
stays pending.
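
One way to look for pending jobs without the web UI is the ResourceManager REST API. The sketch below assumes the standard /ws/v1/cluster/apps endpoint with a states=ACCEPTED filter. The sample response and the idea of passing in a base URL such as http://rm-host:8088 are illustrative assumptions, not details from this thread.

```python
import json
import urllib.request


def pending_apps(payload):
    """Extract (id, name, queue) for apps in a YARN 'cluster/apps' response."""
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("queue")) for a in apps]


def fetch_pending(rm_url):
    """Ask the ResourceManager for applications still waiting for resources."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/apps?states=ACCEPTED"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return pending_apps(json.load(resp))


# Hypothetical response body, shaped like the ResourceManager's JSON output.
sample = {"apps": {"app": [
    {"id": "application_1_0001", "name": "kylin-step", "queue": "default",
     "state": "ACCEPTED"},
]}}
print(pending_apps(sample))
```

An application that stays in ACCEPTED for a long time has been queued but not given containers, which usually points to a resource shortage.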

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Mon, Aug 19, 2019 at 11:22 PM, Khalil Mejdi wrote:

> Hello Kylin,
>
> I upgraded my distribution from Ubuntu 18.04 to 19.04 and reinstalled
> Kylin with its components. Everything seems to be working fine, but the
> build is stuck at 0.
>
> [image: image.png]
>
> Am I missing something?
>
>
>
>
> Khalil Mejdi
> Middle Developer
>
> T: +216 27 782 201
> khalilme...@istic.u-carthage.tn - www.smart-etech.tn
> Technopark Borj Cedria
>
>

Re: Re: query concurrency

2019-09-01 Thread ShaoFeng Shi

Yes, it depends on the specific scenario. If you can provide the testing
scenario, SQL queries, and detailed cube design, then we can discuss it.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Mon, Sep 2, 2019 at 11:10 AM, shicheng31...@gmail.com wrote:

> But if every SQL statement is different, what QPS can be reached?
> --
> shicheng31...@gmail.com
>
>
> *From:* ShaoFeng Shi 
> *Date:* 2019-09-02 10:52
> *To:* user 
> *Subject:* Re: query concurrency
> Hello,
>
> Usually, if the cube has been well designed and tuned, one Kylin server
> can support 50 to 150 QPS. From the Cisco team's experience, with some
> advanced features enabled, such as query cache and prepared statements,
> one Kylin server can support up to 500 QPS:
>
> https://kylin.apache.org/blog/2019/01/17/cisco-throughput-5x/
>
> If your QPS is not ideal, please try to tune a single query's performance
> and optimize the cube design.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
>
> On Mon, Aug 26, 2019 at 4:32 PM, shicheng31...@gmail.com wrote:
>
>>
>> Hi:
>> According to Kylin's introduction, an ordinary server can handle
>> tens to hundreds of QPS. However, in actual production I found that the
>> QPS of a query node only reaches single digits. A Spring Boot application
>> should sustain fairly high concurrency, Calcite's parsing speed should not
>> be the limit, and HBase's read speed is not a problem. Yet the actual
>> result is this poor; what could be the problem?
>>
>> --
>> shicheng31...@gmail.com
>>
>>

Re: Details about “Extract Fact Table Distinct Columns and Build Dimension Dictionary”

2019-09-01 Thread ShaoFeng Shi

This article can help, to some extent:

https://kylin.apache.org/docs/howto/howto_optimize_build.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Mon, Sep 2, 2019 at 10:23 AM, ITzhangqiang wrote:

> Hi Yaqian:
>
>    Thanks for your reply!
>
> I understand what you said, but I would like to know more details.
>
>
>
>
>
>
> *From: *Yaqian Zhang
> *Sent: *September 1, 2019 16:03
> *To: *user@kylin.apache.org
> *Subject: *Re: Details about “Extract Fact Table Distinct Columns and Build
> Dimension Dictionary”
>
>
>
> Hi Johnson:
>
>    In this step, Kylin calculates the cardinality of each dimension
> column and builds a dictionary for it.
>
>    To save space and improve efficiency, Kylin encodes and compresses
> dimensions, using dictionary encoding by default. Dictionary encoding
> builds a mapping table from string to int for all the values of a
> dimension and then serializes the dictionary for storage, which greatly
> reduces the storage size. The dictionary is order-preserving: if string A
> is greater than string B, the encoded value of A is greater than the
> encoded value of B. This allows the encoded values to be used in HBase
> queries without decoding.
>
>    However, because dictionary encoding requires maintaining a mapping
> table, the dimension cardinality (the number of distinct values in the
> dimension column) must be considered. If the cardinality is very high, the
> dictionary becomes very large and is not suitable for loading into memory;
> in that case, another encoding method should be chosen. By default, the
> maximum cardinality allowed for Kylin dictionary encoding is 5 million,
> configured by the parameter kylin.dictionary.max.cardinality.
>
>
>
> On Aug 30, 2019, at 8:29 PM, Johnson  wrote:
>
>
>
> Hi all,
>
> · I want to know the details of these two steps: Extract Fact Table
> Distinct Columns and Build Dimension Dictionary. What do these steps do,
> and how do they do it?
>
> · Looking forward to your reply.
>
>
>
> --
>
> Best wishes,
>
> Johnson
>
>
>
>
>
>
>
>
>
>
>
>
>
>
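
The order-preserving property described in the quoted reply above can be shown with a toy dictionary. This is a sketch only: it uses a sorted Python list rather than Kylin's trie-based dictionary, and the class name and sample column are hypothetical.

```python
import bisect


class SortedDictionary:
    """Toy order-preserving dictionary: sorted distinct values -> int IDs."""

    def __init__(self, values):
        # Distinct values in sorted order; the list index is the code.
        self.values = sorted(set(values))

    def encode(self, value):
        i = bisect.bisect_left(self.values, value)
        if i == len(self.values) or self.values[i] != value:
            raise KeyError(value)
        return i

    def decode(self, code):
        return self.values[code]

    @property
    def cardinality(self):
        return len(self.values)


column = ["Shanghai", "Beijing", "Shenzhen", "Beijing", "Hangzhou"]
d = SortedDictionary(column)
codes = [d.encode(v) for v in column]
print(d.cardinality, codes)
```

Because the codes follow the sort order of the strings, a range filter on the original values can be translated into a range filter on the integer codes. The memory held by the value list grows with the cardinality, which is why very high-cardinality columns are a poor fit.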

Re: query concurrency

2019-09-01 Thread ShaoFeng Shi

Hello,

Usually, if the cube has been well designed and tuned, one Kylin server
can support 50 to 150 QPS. From the Cisco team's experience, with some
advanced features enabled, such as query cache and prepared statements,
one Kylin server can support up to 500 QPS:

https://kylin.apache.org/blog/2019/01/17/cisco-throughput-5x/

If your QPS is not ideal, please try to tune a single query's performance
and optimize the cube design.
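
Before tuning, it helps to measure throughput and per-query latency in a repeatable way. The sketch below is a generic harness, not a Kylin tool: run_query is a placeholder for whatever client call is used (JDBC, REST, and so on), and fake_query only sleeps to imitate a 10 ms query.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(run_query, queries, workers=8):
    """Run all queries on a thread pool and return (qps, latencies)."""
    def timed(sql):
        start = time.perf_counter()
        run_query(sql)
        return time.perf_counter() - start

    begin = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - begin
    return len(queries) / elapsed, latencies


def fake_query(sql):
    # Stand-in for a real client call; sleeps to imitate a 10 ms query.
    time.sleep(0.01)


qps, latencies = measure_qps(fake_query, ["select 1"] * 40, workers=8)
print(f"{qps:.0f} QPS, slowest query {max(latencies) * 1000:.1f} ms")
```

If QPS stays low while single-query latency is high, the single query is the thing to tune first, as the reply suggests.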

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Mon, Aug 26, 2019 at 4:32 PM, shicheng31...@gmail.com wrote:

>
> Hi:
> According to Kylin's introduction, an ordinary server can handle tens
> to hundreds of QPS. However, in actual production I found that the QPS of
> a query node only reaches single digits. A Spring Boot application should
> sustain fairly high concurrency, Calcite's parsing speed should not be the
> limit, and HBase's read speed is not a problem. Yet the actual result is
> this poor; what could be the problem?
>
> --
> shicheng31...@gmail.com
>
>

Re: Merge Job Java Heap Error

2019-09-01 Thread ShaoFeng Shi

The dictionary is loaded into Kylin's memory, so depending on your JVM heap
configuration, the dictionary cannot be very big.

Usually, dictionary encoding only supports dimensions whose cardinality is
less than 5 million (smaller is better). For larger cardinality, please use
"fixed_length" encoding, or "integer" encoding if the column is in int
format.
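
The rule of thumb above can be written as a small helper. This is a sketch under assumptions: the 5 million threshold is the figure given in this reply, "dict" is assumed to be the name of the dictionary encoding, and the function itself is hypothetical rather than part of Kylin.

```python
DICT_MAX_CARDINALITY = 5_000_000  # threshold taken from the reply above


def suggest_encoding(cardinality, is_integer_column):
    """Pick a dimension encoding following the rule of thumb above."""
    if cardinality < DICT_MAX_CARDINALITY:
        return "dict"  # assumed name of the dictionary encoding
    # Above the threshold, avoid loading a huge dictionary into the heap.
    return "integer" if is_integer_column else "fixed_length"


for card, is_int in [(1_000, False), (20_000_000, True), (20_000_000, False)]:
    print(card, is_int, suggest_encoding(card, is_int))
```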

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Thu, Aug 29, 2019 at 3:02 PM, maoxiaomao wrote:

> Hi,
>    I am running a MERGE job, and it fails with the following error in
> Step 1, Merge Dictionary. I am merging 3 segments, and my dictionary files
> total 185.7M*3 (557M), which is much larger than normal.
> I wonder:
>    Is there any size limit for the dictionary?
>    How can I make it smaller?
> env:
>    Kylin 2.6.1
>
> Error: java.lang.OutOfMemoryError: Java heap space
> at org.apache.kylin.dict.TrieDictionary.readFields(TrieDictionary.java:341)
> at org.apache.kylin.dict.TrieDictionaryForest.readFields(TrieDictionaryForest.java:234)
> at org.apache.kylin.dict.DictionaryInfoSerializer.deserialize(DictionaryInfoSerializer.java:80)
> at org.apache.kylin.dict.DictionaryInfoSerializer.deserialize(DictionaryInfoSerializer.java:35)
> at org.apache.kylin.common.persistence.ContentReader.readContent(ContentReader.java:40)
> at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:269)
> at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:256)
> at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:397)
> at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:80)
> at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:77)
> at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
> at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:101)
> at org.apache.kylin.engine.mr.steps.MergeDictionaryMapper.doMap(MergeDictionaryMapper.java:103)
> at org.apache.kylin.engine.mr.steps.MergeDictionaryMapper.doMap(MergeDictionaryMapper.java:66)
> at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
>
>
>
>
>
>
>
>
>

Re: Kylin query performance

2019-09-01 Thread ShaoFeng Shi

Yes, this is expected; most aggregation queries return within about one
second.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Fri, Aug 30, 2019 at 2:21 PM, Xiaoxiang Yu wrote:

> Dear Katte
>
> For response time/benchmark of Kylin query, please refer to
> https://github.com/Kyligence/ssb-kylin &
> https://github.com/Kyligence/kylin-tpch .
>
>
>
> 
>
> Best wishes,
>
> Xiaoxiang Yu
>
>
>
>
>
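> *From: *Katte
> *Reply-To: *"user@kylin.apache.org"
> *Date: *Friday, August 30, 2019 13:58
> *To: *"user@kylin.apache.org"
> *Subject: *Kylin query performance
>
>
>
> Hi All,
>
>
>
> My company asked our project team to evaluate Kylin's query performance.
> My results are as follows:
>
>
>
> 1 master and 2 slaves, each with 4 CPUs and 16 GB of memory, running
> 64-bit Ubuntu.
>
> I loaded one table through Hive, with 110 columns and 12,000,000 rows.
>
> Building a cube with 4 dimensions and 10 measures took *15 minutes* in
> total.
>
> After the build, I ran one GROUP BY query, which took *1.06 seconds*.
>
>
>
> Given the above, are this build time and query time normal? Thank you!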
>
>
>
>
>
>
>
>
>
>
>
>

Re: Re:What dose Data Size and Source Table Size mean

2019-09-01 Thread ShaoFeng Shi

Thank you Xiaomao for the answer, it is good!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




On Fri, Aug 30, 2019 at 10:50 AM, lk_hadoop wrote:

> Thank you very much.
>
> 2019-08-30
> --
> lk_hadoop
> --
>
> *From:* maoxiaomao
> *Sent:* 2019-08-29 15:49
> *Subject:* Re:What dose Data Size and Source Table Size mean
> *To:* "user@kylin.apache.org"
> *Cc:*
>
> Hi lk,
>    This is my understanding; I'm not quite sure about it (my Kylin
> version is v2.6.1).
>   1. Data Size: for each segment and each MR step, this is the output
> data size. It is also one of the MapReduce counters and can be seen in the
> log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written"
> (other MR steps).
>
>   2. Source Table Size: the size of the source data read as strings at
> some point; on the web UI it is the sum over all segments. It is the
> counter named "BYTES" of class
> "org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter"
> in Step #2, Extract Fact Table Distinct Columns, and can be seen at the
> bottom of the Step #2 log. It is calculated as follows:
>   In FactDistinctColumnsMapper.doMap:
>   and for Hive, parseMapperInput works as:
>   and countSizeInBytes is calculated as:
>
>  3. Cube Size: the final HFile size, i.e. the "Data Size" of the step
> Convert Cuboid Data to HFile; on the web UI it is the sum over all
> segments.
>
>
>
>
> --
> At 2019-08-28 14:13:00, "lk_hadoop"  wrote:
>
> Hi all,
> I don't quite understand some metrics I saw on Kylin's web UI:
>
> #1 Step Name: Create Intermediate Flat Hive Table
> Data Size: 72.06 GB
> Duration: 6.07 mins Waiting: 0 seconds
>
> What does "Data Size" mean? Is it the size of all the records times 2?
>
>
> What does "Source Table Size" mean?
>
>
> thanks for your attention.
>
> 2019-08-28
> --
> lk_hadoop
>
>
>
>
>
>
