Re: [VOTE] Release apache-kylin-4.0.0 (RC1)

2021-08-25 Thread JiaTao Tao
+1

Regards!

Aron Tao


wangrupeng wrote on Wed, Aug 25, 2021 at 2:05 PM:

> +1, mvn clean install -DskipTests passed
>
> ---
> Best wishes,
> Rupeng Wang
>
>
>
> 
> From: hit_la...@126.com on behalf of Xiaoxiang Yu <
> x...@apache.org>
> Sent: August 24, 2021 11:54
> To: Kylin Dev
> Subject: [VOTE] Release apache-kylin-4.0.0 (RC1)
>
> Hi all,
>
> I have created a build for Apache Kylin 4.0.0, release candidate 1.
>
> Changes highlights:
>
> KYLIN-4903 cache parent datasource to accelerate next layer's cuboid
> building
> KYLIN-4905 Support limit .. offset ... in spark query engine
> KYLIN-4925 Use Spark 3 as build and query engine for Kylin 4
> KYLIN-5011 Detect and scatter skewed data in dict encoding step
> KYLIN-4948 Provide an API to allow users to adjust cuboids manually
> KYLIN-5027 Add the config of whether to build base cuboid in kylin4
> KYLIN-5019 Avoid building global dictionary from all data of fact table
> each time
>
> Thanks to everyone who has contributed to this release.
>
> Here are the release notes:
>
> <https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12349382>
>
> The commit to be voted upon:
>
> <https://github.com/apache/kylin/commit/aa6d582c0a96fdf31e85791f86850cfc00a7644f>
>
> Its hash is aa6d582c0a96fdf31e85791f86850cfc00a7644f.
>
> The artifacts to be voted on, including the source package and two
> pre-compiled binary packages, are located here:
>
> 
>
>
>
>
>
>
>
> The hashes of the artifacts are as follows:
>
> apache-kylin-4.0.0-source-release.zip.sha256
> 788d0402756dad6e735d6b435b94e3c67f2b3c730f38f8c9b4d71bc66dd71703
>
> apache-kylin-4.0.0-bin-spark2.tar.gz.sha256
> 748c4916489e1e014eae81fe7e0cc72ef564573eb5882969b53feb2ac7ab9379
>
> apache-kylin-4.0.0-bin-spark3.tar.gz.sha256
> 2378c066ff990b70f9a5f4ee32f56ed4a4d86afc88f5986dfbb05fd8f9cc5946
>
> A staged Maven repository is available for review at:
>
> 
>
>
>
>
> Release artifacts are signed with the following key:
>
> 
>
>
>
>
>
>
>
> Please vote on releasing this package as Apache Kylin 4.0.0.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 binding votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 4.0.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
> Here is my vote:
>
> +1 (binding)
>
> --
>
> Best wishes to you !
> From :Xiaoxiang Yu
>
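
A reviewer can check a downloaded artifact against the SHA-256 digests listed in the vote mail. The following is a minimal sketch, assuming the artifact has already been downloaded locally; the function names and the local file path are illustrative and are not part of any Kylin tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a local file's digest with the hex string from the vote mail."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Hypothetical usage; the local path is an assumption:
# matches_published_hash(
#     "apache-kylin-4.0.0-source-release.zip",
#     "788d0402756dad6e735d6b435b94e3c67f2b3c730f38f8c9b4d71bc66dd71703",
# )
```

Reading in chunks keeps memory use flat even for large binary packages.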


Re: New PMC: Xiaoxiang Yu

2020-10-09 Thread JiaTao Tao
Congratulations!


Regards!

Aron Tao


ShaoFeng Shi wrote on Fri, Oct 9, 2020 at 3:35 PM:

> The Project Management Committee (PMC) for Apache Kylin
> has invited Xiaoxiang Yu to become a PMC member and we are pleased
> to announce that he has accepted.
>
> Xiaoxiang became a Kylin committer in late 2019. Since then, he has played
> a leadership role in the Kylin team and has actively coached other
> contributors, including answering questions, reviewing PRs, making
> releases, maintaining documents, etc. He has also given many talks in the
> community.
>
> Being a PMC member enables assistance with the management
> and guidance of the direction of the project.
>
> Congratulations to Xiaoxiang!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: [Announce] Apache Kylin 4.0.0-alpha released

2020-09-14 Thread JiaTao Tao
Congratulations!
I've been waiting for this day for a long time. Many thanks to the
community for its hard work.


Regards!

Aron Tao


George Ni wrote on Sun, Sep 13, 2020 at 7:57 PM:

The Apache Kylin team is pleased to announce the immediate
availability of the 4.0.0-alpha release.

This is a major release after 3.1.0, with 35 new features and 22 bug
fixes. All of the changes in this release can be found in:
https://kylin.apache.org/docs/release_notes.html

You can download the source release and binary packages from Apache
Kylin's download page: https://kylin.apache.org/download/

Apache Kylin is an open-source Distributed Analytical Data Warehouse
for Big Data; it was designed to provide OLAP (Online Analytical
Processing) capability in the big data era. By renovating the
multi-dimensional cube and precalculation technology on Hadoop and
Spark, Kylin is able to achieve near-constant query speed regardless
of the ever-growing data volume. Reducing query latency from minutes
to sub-second, Kylin brings online analytics back to big data.

Apache Kylin lets you query billions of rows at sub-second latency in 3
steps:
1. Identify a Star/Snowflake Schema on Hadoop.
2. Build Cube from the identified tables.
3. Query using ANSI-SQL and get results in sub-second, via ODBC, JDBC
or RESTful API.

Thanks to everyone who has contributed to this release.

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kylin.apache.org/


-

Best regards,

Ni Chunen / George


Re: New committer: Xiaoxiang Yu

2019-12-30 Thread JiaTao Tao
Congratulations Xiaoxiang!

-- 


Regards!

Aron Tao

> On Dec 29, 2019, at 17:17, ShaoFeng Shi wrote:
> 
> Hi folks,
> 
> The Project Management Committee (PMC) for Apache Kylin
> has invited Xiaoxiang Yu to become a committer and we are pleased to
> announce that he has accepted.
> 
> Xiaoxiang Yu (俞霄翔, email hit_la...@126.com) is one of the big data
> engineers from Kyligence. He has worked on the Kylin project since the
> middle of 2018. Since then, he has fixed many issues and investigated and
> verified many new features (especially the v3.0 real-time streaming),
> enhancements and bug fixes. Thank you and congratulations, Xiaoxiang!
> 
> Let's warmly welcome Xiaoxiang as a Kylin committer!
> 
> Best regards,
> 
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
> 
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org



Re: [VOTE] Release apache-kylin-3.0.0 (RC1)

2019-12-10 Thread JiaTao Tao
+1

ShaoFeng Shi wrote on Tue, Dec 10, 2019 at 2:07 PM:

> Hi all,
>
> I have created a build for Apache Kylin 3.0.0, release candidate 1.
>
> Changes highlights:
> [KYLIN-4258] - Real-time OLAP may return an incorrect result for some case
> [KYLIN-4167] - Refactor streaming coordinator
> [KYLIN-4273] - Make cube planner works for real-time streaming job
> [KYLIN-4187] - Building dimension dictionary using spark
> [KYLIN-4098] - Add cube auto-merge API
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345005==12316121
>
> The commit to be voted upon:
>
> https://github.com/apache/kylin/commit/c75242a9b55fd57a3a58d92a2dfa9f21cfe4eebc
>
> Its hash is c75242a9b55fd57a3a58d92a2dfa9f21cfe4eebc.
>
> The artifacts to be voted on, including the source package and two
> pre-compiled binary packages, are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-3.0.0-rc1/
>
> The hashes of the artifacts are as follows:
> apache-kylin-3.0.0-source-release.zip.sha256
> 9224742a87750b8d127c5031c03f3716e3af732c9805a6d0c64871605704f6c0
> apache-kylin-3.0.0-bin-hbase1x.tar.gz.sha256
> bdeddee3eb453c139eabaa2ce7ebd5d14f72d5ac48e5a64636aba2ed7357dda9
> apache-kylin-3.0.0-bin-cdh57.tar.gz.sha256
> c2ae9498f61edbacb6dae5fc32e2c4ea14539ef6d906d53194492e042c80185f
> apache-kylin-3.0.0-bin-hadoop3.tar.gz.sha256
> 116ba002d794058bd34bd05989da2c3a7ff87cf67d3647d2f1cc5b5717d445f6
> apache-kylin-3.0.0-bin-cdh60.tar.gz.sha256
> 22a0701b5a03a8d40c8b1be4fe4acb1ff2550a18c52d509b592d59ef5a094f7e
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1070/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 3.0.0.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 3.0.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


-- 

Regards!

Aron Tao
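
The passing rule quoted in the vote mail above (at least three +1 binding votes and a majority) can be sketched as a small tally. This is an illustrative sketch only: reading "majority" as more binding +1 than binding -1 votes is an assumption, and the function name and ballots are hypothetical.

```python
def vote_passes(votes, min_binding_plus_ones=3):
    """votes: iterable of (value, is_binding) with value in {+1, 0, -1}.

    Assumed rule: at least `min_binding_plus_ones` binding +1 votes, and
    more binding +1 votes than binding -1 votes.
    """
    votes = list(votes)  # allow generators, since we iterate twice
    plus = sum(1 for value, binding in votes if binding and value == 1)
    minus = sum(1 for value, binding in votes if binding and value == -1)
    return plus >= min_binding_plus_ones and plus > minus


# Made-up ballots for illustration (not the real vote result):
sample = [(1, True), (1, True), (1, True), (1, False), (0, True)]
print(vote_passes(sample))  # True
```

Non-binding votes are counted by neither side of the comparison, which mirrors the "binding" wording in the mail.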


Re: [ANNOUNCE] Please welcome Chunen Ni to the Apache Kylin PMC

2019-12-01 Thread JiaTao Tao
Congratulations!

-- 

Regards!

Aron Tao


ShaoFeng Shi wrote on Sun, Dec 1, 2019 at 10:47 AM:

> On behalf of the Apache Kylin PMC, I am pleased to announce that Chunen Ni
> has accepted our invitation to become a PMC member on the Kylin project. We
> appreciate Chunen stepping up to take more responsibility in the Kylin
> project.
>
> Please join me in welcoming Chunen to the Kylin PMC!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>


Re: New committer: Temple Zhou

2019-11-19 Thread JiaTao Tao
Congratulations


-- 

Regards!

Aron Tao

ShaoFeng Shi wrote on Sat, Nov 16, 2019 at 1:57 PM:

> Hi folks,
>
> The Project Management Committee (PMC) for Apache Kylin
> has invited Temple Zhou to become a committer and we are pleased to
> announce that he has accepted.
>
> Temple Zhou (周天鹏, email dba...@gmail.com) is one of the big data engineers
> from DXY.cn (丁香园). He has maintained a Kylin cluster with more than 1000
> Tableau reports for DXY since early 2018, and he has contributed much
> valuable feedback and many enhancements to Kylin. Congratulations, Temple!
>
> Let's warmly welcome Temple as a Kylin committer!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: [VOTE] Release apache-kylin-2.6.4 (RC1)

2019-10-08 Thread JiaTao Tao
+1

ShaoFeng Shi wrote on Tue, Oct 8, 2019 at 9:01 AM:

> Hi all,
>
>
>
> I have created a build for Apache Kylin 2.6.4, release candidate 1.
>
>
>
> Changes highlights:
>
> [KYLIN-3628] - Lookup table queries always use the latest snapshot
>
> [KYLIN-3797] - Avoid out-of-memory error in Kylin server when flatting
> query filters with too many OR conditions
>
> [KYLIN-4121] - Cleanup hive view intermediate tables after job finished
>
> [KYLIN-1856] - Fix the issue that Kylin jobs show outdated output
> information when recovering from failures
>
> [KYLIN-4034] - Fix the issue that Insight page shows tables to which users
> have no access
>
> [KYLIN-4066] - Allow users who are not ROLE_ADMIN to access Planner pages
>
> [KYLIN-4131] - Fix memory leak issue within Broadcaster
>
> [KYLIN-4153] - Fix inconsistency within the transaction of metadata
> pushdown
>
> [KYLIN-4157] - Fix InternalErrorException throwing issue if users'
> PrepareStatement queries contain functions within WHERE clause
>
> [KYLIN-4158] - Fix wrong results caused by pushing down LIMIT condition
> when queries have expressions of columns within GROUP BY clause
>
>
>
> Thanks to everyone who has contributed to this release.
>
> Here are the release notes:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12345948
>
>
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/7a8639f92f87b70dc712c6089d17120706fba87a
>
>
>
> its hash is 7a8639f92f87b70dc712c6089d17120706fba87a
>
>
>
> The artifacts to be voted on, including the source package and two
> pre-compiled binary packages, are located here:
>
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.6.4-rc1/
>
>
>
> The hashes of the artifacts are as follows:
>
> apache-kylin-2.6.4-bin-cdh57.tar.gz.sha256
>
> 52c4083bd459e20e5b64672eaa23d00ccaa5a03ca7ee04f0d07d7c672b8e1974
>
> apache-kylin-2.6.4-bin-cdh60.tar.gz.sha256
>
> 4b369073fb6aff2257d6c62abf1415917a46ab6ae6014889d037f4b52603dbf5
>
> apache-kylin-2.6.4-bin-hadoop3.tar.gz.sha256
>
> b3939e78bd11830b792a05b035e3027f8ea13f84178cfd718b9ec83204958f5f
>
> apache-kylin-2.6.4-bin-hbase1x.tar.gz.sha256
>
> adb763495f3ba132045842c1d2f8d99def8f6a8cf3aadff60087e8dc110ed674
>
> apache-kylin-2.6.4-source-release.zip.sha256
>
> 3faad44f24830efff6e1f799caf09db70409e9ed3d3501afcf68c96f237f978c
>
>
>
> A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachekylin-1068/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/shaofengshi.asc
>
>
>
> Please vote on releasing this package as Apache Kylin 2.6.4.
>
>
>
> The vote is open for the next 72 hours and passes if a majority of
>
> at least three +1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Kylin 2.6.4
>
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
>
> Here is my vote:
>
>
>
> +1 (binding)
>
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


-- 

Regards!

Aron Tao


Re: [VOTE] Release apache-kylin-3.0.0-beta (RC1)

2019-09-27 Thread JiaTao Tao
+1

ShaoFeng Shi wrote on Thu, Sep 26, 2019 at 8:42 AM:

> Hi all,
>
> I have created a build for Apache Kylin 3.0.0-beta, release candidate 1.
>
> Changes highlights:
> [KYLIN-4122] - Add Kylin user and group management modules
> [KYLIN-4167] - Refactor streaming coordinator
> [KYLIN-4114] - Provided a self-contained docker image for Kylin
> [KYLIN-4137] - Accelerate metadata reloading
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12345686
>
> The commit to be voted upon:
>
> https://github.com/apache/kylin/commit/721be80866223fecad9a6231fa2427a847bc8f48
>
> Its hash is 721be80866223fecad9a6231fa2427a847bc8f48.
>
> The artifacts to be voted on, including the source package and two
> pre-compiled binary packages, are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-3.0.0-beta-rc1/
>
> The hashes of the artifacts are as follows:
> apache-kylin-3.0.0-beta-source-release.zip.sha256
> 53547e8a94eb74cdcd329777ff03f1c79209020016c2f9a62351e8c73ac8e0bd
> apache-kylin-3.0.0-beta-bin-hbase1x.tar.gz.sha256
> 1d50348660899baa9005b78cf45243e0eb2495fa0403d6250b3439ff50bf1731
> apache-kylin-3.0.0-beta-bin-cdh57.tar.gz.sha256
> bc9e303154901d4061dbac3876157cb4be25f23307f4c709d083da70aa18524b
> apache-kylin-3.0.0-beta-bin-hadoop3.tar.gz.sha256
> 681452450248f56ebe107d278e3ccb1478e42137875a2dded953db8c03488f9a
> apache-kylin-3.0.0-beta-bin-cdh60.tar.gz.sha256
> 2f66497ed39d7d78ea5a634a8796ab408586dce369edc97ed9374ba90a88b03d
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1066/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 3.0.0-beta.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 3.0.0-beta
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


-- 

Regards!

Aron Tao


Re: How to conveniently release a Kylin project to the production environment

2019-08-20 Thread JiaTao Tao
Hi

Hope this may help:
http://kylin.apache.org/docs/howto/howto_backup_metadata.html


-- 

Regards!

Aron Tao

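lwjsd1...@163.com wrote on Wed, Aug 21, 2019 at 2:40 AM:

> Hello,
> I am a Kylin user and a development engineer; this is my first time using
> Kylin in a big data product.
>
> The application that integrates Kylin is now fully developed. The project
> created in the development environment contains multiple models and cubes.
> Does Kylin provide a tool to export the project and import it into the
> production environment in one step?
>
> How should a Kylin project be released to production the first time, and
> how should later changes be rolled out to production? Thanks!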
>
>
>
> lwjsd1...@163.com
>


Re: [VOTE] Release apache-kylin-3.0.0-alpha2 (RC1)

2019-07-25 Thread JiaTao Tao
+1

-- 

Regards!

Aron Tao

ShaoFeng Shi wrote on Thu, Jul 25, 2019 at 2:46 PM:

> Hi all,
>
>
>
> I have created a build for Apache Kylin 3.0.0-alpha2, release candidate 1.
>
>
>
> Changes highlights:
>
> [KYLIN-3942] - Real-time OLAP supports multi-level json event
>
> [KYLIN-4086] - Support connect Kylin with Tableau by JDBC
>
> [KYLIN-3841] - Build Global Dict by MR/Hive
>
> [KYLIN-4017] - Fix building engine failed to get zk lock and leads to the
> building engine doesn't work
>
> [KYLIN-3843] - List kylin instances with their server mode on web
>
> [KYLIN-3997] - Add a health check job of Kylin
>
> [KYLIN-4028] - Speed up startup progress using cached dependency
>
> [KYLIN-4035] - Calculate column cardinality by using spark engine
>
>
>
> Thanks to everyone who has contributed to this release.
>
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12345840
>
>
>
> The commit to be voted upon:
>
>
>
>
> https://github.com/apache/kylin/commit/06f441cd04a98a988197f7b1d750608d6ee51cd8
>
>
>
> Its hash is 06f441cd04a98a988197f7b1d750608d6ee51cd8.
>
>
>
> The artifacts to be voted on, including the source package and two
> pre-compiled binary packages, are located here:
>
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-3.0.0-alpha2-rc1/
>
>
>
> The hashes of the artifacts are as follows:
>
> apache-kylin-3.0.0-alpha2-source-release.zip.sha256
>
> dc206ab0527703271e4cc435368a4f0980247465947a199396567dfc776bf7f8
>
> apache-kylin-3.0.0-alpha2-bin-hbase1x.tar.gz.sha256
>
> 406f2d7ea318df230d1e65600f4ffccb22accfeee47a7e2f85ee4f70bbc6bbdc
>
> apache-kylin-3.0.0-alpha2-bin-cdh57.tar.gz.sha256
>
> 1dadf9910d07ed5af233477a92013a8b6090ae18b3f8d9b48c7ee915acbafe25
>
> apache-kylin-3.0.0-alpha2-bin-hadoop3.tar.gz.sha256
>
> f0d075b0e2bc3c0953bf17c7220ddae03ae611fae03456fad215b36f9393eb28
>
> apache-kylin-3.0.0-alpha2-bin-cdh60.tar.gz.sha256
>
> 47faa1810f21fa8c4c109b82af9eb09d44c2f0d14d245a94655aaf783780a143
>
>
>
>
>
> A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachekylin-1065/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/shaofengshi.asc
>
>
>
> Please vote on releasing this package as Apache Kylin 3.0.0-alpha2.
>
>
>
> The vote is open for the next 72 hours and passes if a majority of
>
> at least three +1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Kylin 3.0.0-alpha2
>
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
>
>
>
> Here is my vote:
>
>
>
> +1 (binding)
>
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: [VOTE] Release apache-kylin-2.6.3 (RC1)

2019-07-01 Thread JiaTao Tao
+1

-- 

Regards!

Aron Tao

ShaoFeng Shi wrote on Mon, Jul 1, 2019 at 1:27 AM:

> Hi all,
>
> I have created a build for Apache Kylin 2.6.3, release candidate 1.
>
> Changes highlights:
> - [KYLIN-4024] - Support pushdown to Presto
> - [KYLIN-3977] - Avoid mistaken deleting dicts by storage cleanup while
> building jobs are running
> - [KYLIN-4015] - Fix build cube error at the "Build UHC Dictionary" step
> - [KYLIN-4022] - Error with message "Unrecognized column type:
> DECIMAL(xx,xx)" happens when doing query pushdown
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12345582
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/0d5f85b0a40c301134122de927204a0d17ad65fa
>
> Its hash is 0d5f85b0a40c301134122de927204a0d17ad65fa.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.6.3-rc1/
>
> The hash of the artifact is as follows:
> apache-kylin-2.6.3-source-release.zip.sha256
> 50d1cad423f1a15a5e25f1c3c68748c7ce10e0116fd67fa9e38c1470a11d389c
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1063/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 2.6.3.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.6.3
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: [ANNOUNCE] New Committer: Jiatao Tao

2019-06-12 Thread JiaTao Tao
Thanks everyone for your help along the way!

It is my great honor to become an Apache Kylin committer. I will work
harder to make contributions to the project.

I wish Apache Kylin an even better and more prosperous future.


-- 


Regards!

Aron Tao



Chao Long wrote on Thu, Jun 13, 2019 at 3:03 AM:

> Congratulations!
>
> On Thu, Jun 13, 2019 at 11:00 AM Billy Liu  wrote:
>
> > Congrats
> >
> > With Warm regards
> >
> > Billy Liu
> >
> > PENG Zhengshuai wrote on Thu, Jun 13, 2019 at 10:48 AM:
> > >
> > > Congrats to Jiatao!
> > >
> > > Best Regards
> > > PENG Zhengshuai
> > >
> > > > On Jun 13, 2019, at 10:46 AM, ShaoFeng Shi 
> > wrote:
> > > >
> > > > The Project Management Committee (PMC) for Apache Kylin
> > > > has invited Jiatao Tao to become a committer and we are pleased
> > > > to announce that he has accepted.
> > > >
> > > > Thanks for all your hard work Jiatao; we look forward to more
> > > > contributions!
> > > >
> > > > Please join me in extending congratulations to Jiatao!
> > > >
> > > > Best regards,
> > > >
> > > > Shaofeng Shi 史少锋
> > > > Apache Kylin PMC
> > > > Email: shaofeng...@apache.org
> > > >
> > > > Apache Kylin FAQ:
> > https://kylin.apache.org/docs/gettingstarted/faq.html
> > > > Join Kylin user mail group: user-subscr...@kylin.apache.org
> > > > Join Kylin dev mail group: dev-subscr...@kylin.apache.org
> > >
> >
>


Re: Question about cube rowkey ordering

2019-06-10 Thread JiaTao Tao
This link (
https://www.slideshare.net/YangLi43/design-cube-in-apache-kylin), which
Shaofeng shared previously, is also very helpful; see the chapter "The Order
of Dimensions".

-- 


Regards!

Aron Tao



Xiaoxiang Yu wrote on Tue, Jun 11, 2019 at 2:45 AM:

> Hi, wangfx
>
> Kylin converts a SQL query into two parameters (start_key and end_key) for
> the range scan operation in HBase.
> A well-designed rowkey completes query filtering and data positioning more
> effectively, reduces the number of I/O operations, and improves query
> speed; the order of the dimensions in the rowkey has a significant impact
> on query performance.
>
> The following two principles should be combined when adjusting the rowkey
> order:
> 1. Dimensions that are used as filter conditions in a query are placed
> before dimensions that are not used for filtering.
> 2. Dimensions with a higher cardinality are placed before dimensions with
> a lower cardinality.
>
> So, in your situation, I suggest the order should be: a, b, c, d (if you
> have only four dimensions).
>
> And this link may help,
> https://kyligence.io/zh/blog/apache-kylin-optimizer-kybot-rowkey/.
>
>
> 
> Best wishes,
> Xiaoxiang Yu
>
>
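> On 2019/6/11 09:56, "wangfx" <945517...@qq.com> wrote:
>
>
> The cube has several dimensions. a and b are mandatory dimensions that
> always appear in the WHERE clause, and b has very low cardinality (only 3
> distinct values); c and d never appear in WHERE, only in SELECT and GROUP
> BY. The cardinalities are c > d > a > b, and the remaining dimensions are
> regular dimensions used in WHERE. How should a, b, c, d and the other
> dimensions be ordered in the rowkey?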
>
>
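
The two principles in the reply above can be combined as a simple sort: filter dimensions first, then higher cardinality first within each group. The sketch below is illustrative only. The cardinality numbers are made up (only b's value of 3 and the relation c > d > a > b come from the question), and `order_rowkey` is a hypothetical helper, not a Kylin API.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    cardinality: int      # number of distinct values (assumed figures)
    used_in_filter: bool  # True if the dimension appears in WHERE conditions


def order_rowkey(dimensions):
    """Filter dimensions first; within each group, higher cardinality first."""
    return sorted(
        dimensions,
        key=lambda d: (not d.used_in_filter, -d.cardinality),
    )


dims = [
    Dimension("c", 100_000, False),
    Dimension("d", 10_000, False),
    Dimension("a", 500, True),
    Dimension("b", 3, True),
]
print([d.name for d in order_rowkey(dims)])  # ['a', 'b', 'c', 'd']
```

With these inputs the sort reproduces the suggested order a, b, c, d: the filter rule takes precedence, and cardinality only breaks ties within each group.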


Re: [ANNOUNCE] Gang Ma joins the Apache Kylin PMC

2019-06-05 Thread JiaTao Tao
Congrats !


-- 

Regards!

Aron Tao

ShaoFeng Shi wrote on Mon, Jun 3, 2019 at 5:32 AM:

> On behalf of the Apache Kylin PMC, I am pleased to announce that Gang Ma
> (马刚) has accepted our invitation to become a PMC member on the Apache Kylin
> project. We appreciate Gang stepping up to take more responsibility in the
> Kylin project.
>
> Please join me in welcoming Gang to the Kylin PMC!
>
> Best Regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: Plan to host the first "Kylin Data Summit" event

2019-05-30 Thread JiaTao Tao
Looking forward to it!


-- 


Regards!

Aron Tao

ShaoFeng Shi wrote on Thu, May 30, 2019 at 6:29 AM:

> Hello Kylin developers and users,
>
>
>
> We (Kyligence Inc) plan to host the first "Kylin Data Summit" event in
> Shanghai, China. This event will provide a place to share and discuss
> technology and trends in the Big Data domain. The presenters and target
> audience are big data engineers, data analysts and others who are
> interested in the data analysis domain. I’m writing to the community for
> approval to use the Apache Kylin trademark for this event. After
> getting Kylin PMC approval, we will submit the request to the ASF VP of
> brand management. The process is described at
> https://www.apache.org/foundation/marks/events.html#approval
>
>
>
> The information about this event is as follows:
>
>- *What is the topic focus of the event*
>
> Big Data, OLAP, Apache Kylin and other big data technologies like Apache
> Hadoop, Apache Spark, etc.
>
>- *Who is organising the event*
>
> Kyligence Inc. and InfoQ China
>
>- *When is the event*
>
> 12th, July.
>
>- *How many attendees are expected*
>
> 500 expected attendees.
>
>- *How much PMC involvement is there already*
>
> Some Apache Kylin PMC members are involved in organizing the Kylin
> Data Summit, such as Billy Liu, Yang Li and Shaofeng Shi. Several PMC
> members will give talks at this event: Luke, Yanghong Zhong.
>
>
>
>- *Which marks are requested*
>
> The name and logo of Apache Kylin, the Apache feather logo.
>
>- *How would you propose that the ASF will be listed as a community
>partner?*
>
> Community Partner
>
>- *How will the event selection work?*
>
> We have a selection group for this event, formed by the PMC members who
> will attend the event. We will invite community users to prepare
> proposals, and the selection group will then vote on them.
>
>- *Is this for profit or non-profit?*
>
> Profit, with some free tickets to community contributors.
>
>- *The event’s related site and marketing materials*
>
> They are still under construction.
>
>
>
> Please share your comment or suggestions, thank you!
>
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: Error when querying with left join

2019-05-26 Thread JiaTao Tao
Hi
What's your join type in your model?

-- 


Regards!

Aron Tao



Gods_Dusk <197795...@qq.com> wrote on Sun, May 26, 2019 at 12:18 PM:

> The following query reports an error:
> select bingrenxingming,
>jiaofeibiaoji,
>yaowumingcheng,
>danjia,
>shuliang,
>jiuzhenriqi,
>kaidankeshi,
>keshimingcheng
>   from TBMENZHENFEIYONG tb
>   left join akeshidict d
> on tb.kaidankeshi = d.keshibianma
>  WHERE tb.kaidankeshi = '0963'
>and tb.kaidanyisheng = '0320'
>and tb.id = '47442018-06-08T08:52:01'
>
>
> No realization found for OLAPContext, MODEL_UNMATCHED_JOIN,
> rel#37347:OLAPTableScan.OLAP.[](table=[COST_SUM,
> TBMENZHENFEIYONG],ctx=,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
> 13,
> 14, 15]), JoinDesc [type=LEFT, primary_key=[KESHIBIANMA],
> foreign_key=[KAIDANKESHI]] while executing SQL: "select bingrenxingming,
> jiaofeibiaoji, yaowumingcheng, danjia, shuliang, jiuzhenriqi, kaidankeshi,
> keshimingcheng from TBMENZHENFEIYONG tb left join akeshidict d on
> tb.kaidankeshi = d.keshibianma WHERE tb.kaidankeshi = '0963' and
> tb.kaidanyisheng = '0320' and tb.id = '47442018-06-08T08:52:01' LIMIT
> 5"
>
> Changing the left join to an inner join makes the error go away.
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>
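
The MODEL_UNMATCHED_JOIN in the error suggests that the join in the query does not match the join defined in the model, which is why the reply asks about the model's join type. Below is a toy sketch of that kind of check. It assumes the model defines an inner join between the two tables; the report that the inner-join query succeeds is consistent with this, but the thread does not confirm it. This is not Kylin's actual matching code, and all names are hypothetical.

```python
def join_matches_model(query_join, model_joins):
    """Return True if the query's join (type, fk, pk) is defined in the model."""
    return query_join in model_joins


# Assumed model definition: an INNER join on KAIDANKESHI = KESHIBIANMA.
model_joins = {("INNER", "KAIDANKESHI", "KESHIBIANMA")}

left_query = ("LEFT", "KAIDANKESHI", "KESHIBIANMA")
inner_query = ("INNER", "KAIDANKESHI", "KESHIBIANMA")

print(join_matches_model(left_query, model_joins))   # False
print(join_matches_model(inner_query, model_joins))  # True
```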


Re: [VOTE] Release apache-kylin-2.6.2 (RC1)

2019-05-13 Thread JiaTao Tao
+1

-- 


Regards!

Aron Tao

ShaoFeng Shi wrote on Tue, May 14, 2019 at 1:10 AM:

> Hi all,
>
> I have created a build for Apache Kylin 2.6.2, release candidate 1.
>
> Changes highlights:
> [KYLIN-3892] - Set cubing job priority
> [KYLIN-3839] - Storage clean up after refreshing or deleting a segment
> [KYLIN-3873] - Fix inappropriate use of memory in SparkFactDistinct.java
> [KYLIN-3905] - Enable shrunken dictionary default
> [KYLIN-3922] - Fail to update coprocessor when run DeployCoprocessorCLI
> [KYLIN-3936] - MR/Spark task will still run after the job is stopped.
>
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12345051
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/c507ae29fa64bc7234efd6a002dcfe990969ad35
>
> Its hash is c507ae29fa64bc7234efd6a002dcfe990969ad35.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.6.2-rc1/
>
> The hash of the artifact is as follows:
> apache-kylin-2.6.2-source-release.zip.sha256
> db2ab59d3e66d635462e9c9ef49fd7ca29342f07ff4eea0730e52777287e2ebf
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1062/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 2.6.2.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.6.2
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: [DISCUSSION] Don't need to purge existing segment of cube to add new measures in Kylin

2019-04-19 Thread JiaTao Tao
Hi

The idea of supporting dynamically added measures in Kylin is impressive.

But in my opinion, once you add a measure, the existing segments should
also calculate the new measure (just add a new measure column). Users can
have many cubes, and a cube can have many segments; if the set of measures
differs from segment to segment, it will increase the burden on the user.

-- 


Regards!

Aron Tao

yuzhang wrote on Sat, Apr 20, 2019 at 1:43 AM:

> Hi dear Kylin users and development team,
> There are a few things I would like to discuss with the community.
> As a representative MOLAP engine, Kylin uses pre-aggregation to provide
> high-concurrency analysis with responses within seconds, but it also
> loses some flexibility.
> The restriction that existing segments must be purged before an additional
> measure can be added causes a lot of repeated computation and unnecessary
> disk I/O. Such waste should be avoided, especially in a MOLAP engine.
> For example, suppose cubeA has one measure, m1, and segments over time
> range 1 (tr1). Now the user adds a measure m2 but doesn't want to clear the
> segments over tr1. Values of m2 will exist in tr2, the segments built
> subsequently. Of course, tr1 doesn't contain values of m2, which a user
> who knows a little about MOLAP will understand. Querying over tr1 and tr2
> is valid for both m1 and m2, but the result of m2 over tr1 will be null.
> It would be better to warn the user that the measure is missing. Moreover,
> refreshing will supply m2 to the segments over tr1.
> Currently, Kylin's storage engine uses HBase. Measures are aggregated
> values based on combinations of dimension members and are stored in a
> column of a column family in HBase. For the same cube, adding a new measure
> will add a column to the HBase table (mapping) and will take effect in the
> next build. For the existing HTables (segments), the new column is allowed
> to be missing. Refreshing old segments will add a new column to
> their HTables to store the new measure. The value of the new measure is
> aggregated according to the combination of dimension members in the rowkey,
> without recalculating the existing measures.
> Now, for additional measures and even additional dimensions, Kylin's
> current solution is Hybrid, but we found the following shortcomings in
> use:
> 1. Management costs: similar cubes, most of which share many dimensions
> and indicators, have to be maintained repeatedly. If you want to
> perform optimizations such as pruning, you need to configure all
> of these cubes.
> 2. A large number of cubes: early business analysis is not
> stable, and analysts often need to add measures. Cubes
> are added continuously to the Hybrid group, which produces a lot of
> cubes.
> 3. Repeated calculation: if you want to drop the old cube in the Hybrid
> group, you need to build the latest cube over the historical data to
> cover the old cube.
> All of this results in a lot of waste.
> In addition, while using Kylin I felt that the metadata about measures
> is not ideal.
> 1. Measures are one of the analysts' most important concerns; if the
> measures of the analysis system could be decoupled from the materialized
> view (cube) and have their own management system, it might be more flexible.
> 2. Once the dimensions have been chosen in cube design, the cube's cuboids
> are determined regardless of the number of measures. It can be confusing to
> maintain cubes with different measures but the same cuboids. Only cubes with
> different cuboids should be considered different cubes, which is the
> definition of a cube, isn't it?
> These are just some thoughts about MOLAP from my use of Kylin. What do you
> think about this? I sincerely look forward to your reply.
> There may be some mistakes or misunderstandings here; please feel free to
> correct me or discuss further if you find any.
> Best regards
> yuzhang
>
>
> yuzhang
> shifengdefan...@163.com
>
>
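The query behaviour described in the message above (m2 is null over tr1 until those segments are refreshed) can be sketched as follows. This is an illustration only, not Kylin code: the `Segment` class, the `query_measure` and `missing_segments` helpers, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a cube segment: a time range plus the
    pre-aggregated measures that were built into it."""
    time_range: str
    measures: Dict[str, float] = field(default_factory=dict)


def query_measure(segments: List[Segment], measure: str) -> Dict[str, Optional[float]]:
    """Return the measure value per segment, or None where the segment
    was built before the measure existed."""
    return {seg.time_range: seg.measures.get(measure) for seg in segments}


def missing_segments(segments: List[Segment], measure: str) -> List[str]:
    """List the time ranges that would need a refresh to supply the measure."""
    return [seg.time_range for seg in segments if measure not in seg.measures]


# tr1 was built when only m1 existed; tr2 was built after m2 was added.
cube_a = [
    Segment("tr1", {"m1": 10.0}),
    Segment("tr2", {"m1": 7.0, "m2": 3.0}),
]

print(query_measure(cube_a, "m2"))     # m2 is None over tr1
print(missing_segments(cube_a, "m2"))  # segments a refresh would have to cover
```

The second helper is the "remind the user" part of the proposal: the caller can see which time ranges lack the measure instead of silently receiving nulls.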


Re: Reply: Installation problem

2019-04-19 Thread JiaTao Tao
Hi
Maybe you can change the execution engine from Tez to MR and give it a try.
Add this

<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>

to kylin_hive_conf.xml, or change it directly in hive-site.xml.
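
The override suggested above can also be applied with a small script. This is a minimal sketch, assuming the file follows the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children under a `<configuration>` root; the helper name `set_property` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Set (or add) a <property> in a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property absent: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


original = (
    "<configuration><property><name>hive.execution.engine</name>"
    "<value>tez</value></property></configuration>"
)
updated = set_property(original, "hive.execution.engine", "mr")
print(updated)
```

Reading the real file, backing it up, and writing the result back are left out on purpose; the sketch only shows the edit itself.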


-- 


Regards!

Aron Tao

mingwen@analyticservice.net wrote on Fri, Apr 19, 2019 at 12:43 PM:

> Hi Kylin,
> On the HDP platform, an error occurs when Kylin builds cubes:
> INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task
> [Stage-1:MAPRED] in serial mode INFO : Subscribed to counters: [] for
> queryId: hive_20190419142859_2332b1f4-f1d8-4c3f-87ab-79800abd3ab4 INFO :
> Tez session hasn't been created yet. Opening session ERROR : Failed to
> execute tez graph. org.apache.tez.dag.api.SessionNotRunning: TezSession has
> already shutdown. Application application_166483117_0009 failed 2 times
> due to AM Container for appattempt_166483117_0009_02 exited with
> exitCode: 1 Failing this attempt.Diagnostics: [2019-04-19
> 02:29:06.129]Exception from container-launch. Container id:
> container_e51_166483117_0009_02_01 Exit code: 1 [2019-04-19
> 02:29:06.133]Container exited with a non-zero exit code 1. Error file:
> prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr
> : [2019-04-19 02:29:06.133]Container exited with a non-zero exit code 1.
> Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096
> bytes of stderr : For more detailed output, check the application tracking
> page: http://masternode1:8088/cluster/app/application_166483117_0009
> Then click on links to logs of each attempt. . Failing the application. at
> org.apache.tez.client.TezClient.waitTillReady(TezClient.java:1013)
> ~[tez-api-0.9.1.3.1.0.0-78.jar:0.9.1.3.1.0.0-78] at
> org.apache.tez.client.TezClient.waitTillReady(TezClient.java:982)
> ~[tez-api-0.9.1.3.1.0.0-78.jar:0.9.1.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:536)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:451)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(TezSessionPoolSession.java:124)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:373)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:372)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:199)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:210)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2711)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2382)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2054)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1752)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1746)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
> ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226)
> ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
> ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324)
> ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112] at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> ~[hadoop-common-3.1.1.3.1.0.0-78.jar:?] at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342)
> ~[hive-service-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78] at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[?:1.8.0_112] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_112] at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[?:1.8.0_112] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_112] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 

[jira] [Created] (KYLIN-3960) Only update user when login in LDAP environment

2019-04-17 Thread Jiatao Tao (JIRA)
Jiatao Tao created KYLIN-3960:
-

 Summary: Only update user when login in LDAP environment
 Key: KYLIN-3960
 URL: https://issues.apache.org/jira/browse/KYLIN-3960
 Project: Kylin
  Issue Type: Improvement
  Components: Security
Reporter: Jiatao Tao
Assignee: Jiatao Tao






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Kaisen Kang joins the Apache Kylin PMC

2019-04-16 Thread JiaTao Tao
Congratulations!

-- 


Regards!

Aron Tao

Luke Han wrote on Tue, Apr 16, 2019 at 5:09 AM:

> On behalf of the Apache Kylin PMC I am pleased to announce that Kaisen Kang
> has accepted our invitation to become a PMC member on the Apache Kylin
> project. We appreciate Kaisen stepping up to take more responsibility in
> the Kylin project.
>
> Please join me in welcoming Kaisen to the Kylin PMC!
>
> Best Regards,
>
> Luke
>


Re: Debug kylin2.6X with CDH5.15.It didn't build cube.

2019-03-28 Thread JiaTao Tao
Hi
You can check the Maven profiles and you will find that it uses the "hdp"
profile. A "cdh" profile also exists, but when I tried it, it still needed
some work. As Chao Long said, we recommend using the HDP sandbox to debug.


-- 


Regards!

Aron Tao

Lio_Messi wrote on Thu, Mar 28, 2019 at 12:11 PM:

> I want to debug Kylin by running DebugTomcat with CDH 5.15. When I built a
> cube, the following error appeared:
>
> 2019-03-28 19:07:25,957 INFO  [BadQueryDetector]
> service.BadQueryDetector:147 : Detect bad query.
> 2019-03-28 19:07:28,935 INFO  [FetcherRunner 1204410668-166]
> threadpool.DefaultFetcherRunner:94 : Job Fetcher: 1 should running, 1
> actual
> running, 0 stopped, 0 ready, 3826 already succeed, 6 error, 4 discarded, 0
> others
> 2019-03-28 19:07:34,140 INFO  [Scheduler 154250424 Job
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-202] mapred.ClientServiceDelegate:276
> :
> Application state is completed. FinalApplicationStatus=FAILED. Redirecting
> to job history server
> 2019-03-28 19:07:34,163 ERROR [Scheduler 154250424 Job
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-202] common.HadoopJobStatusChecker:58
> :
> error check status
> java.io.IOException: Job status not available
> at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
> at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:341)
> at
> org.apache.kylin.engine.mr
> .common.HadoopJobStatusChecker.checkStatus(HadoopJobStatusChecker.java:38)
> at
> org.apache.kylin.engine.mr
> .common.MapReduceExecutable.doWork(MapReduceExecutable.java:153)
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:166)
> at
>
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:166)
> at
>
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-03-28 19:07:34,166 ERROR [Scheduler 154250424 Job
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-202] common.MapReduceExecutable:198 :
> error execute
> MapReduceExecutable{id=e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-01,
> name=Extract
> Fact Table Distinct Columns, state=RUNNING}
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:380)
> at
> org.apache.kylin.engine.mr
> .common.HadoopCmdOutput.getInfo(HadoopCmdOutput.java:66)
> at
> org.apache.kylin.engine.mr
> .common.MapReduceExecutable.doWork(MapReduceExecutable.java:163)
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:166)
> at
>
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:166)
> at
>
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-03-28 19:07:34,176 INFO  [Scheduler 154250424 Job
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-202] execution.ExecutableManager:453 :
> job id:e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-01 from RUNNING to ERROR
> 2019-03-28 19:07:34,178 ERROR [Scheduler 154250424 Job
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-202] execution.AbstractExecutable:168
> :
> error running Executable:
> CubingJob{id=e3e62e0e-daf8-e0bb-762d-81bcbddc55ae,
> name=BUILD CUBE - first_cube - 2015010100_2015060100 - GMT+08:00
> 2019-03-27 19:35:43, state=RUNNING}
> 2019-03-28 19:07:34,181 DEBUG [pool-6-thread-1] cachesync.Broadcaster:116 :
> Servers in the cluster: [localhost:7070]
> 2019-03-28 19:07:34,181 DEBUG [pool-6-thread-1] cachesync.Broadcaster:126 :
> Announcing new broadcast to all: BroadcastEvent{entity=execute_output,
> event=update, cacheKey=e3e62e0e-daf8-e0bb-762d-81bcbddc55ae}
> 2019-03-28 19:07:34,185 DEBUG [http-bio-7070-exec-8]
> cachesync.Broadcaster:246 : Broadcasting UPDATE, execute_output,
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae
> 2019-03-28 19:07:34,185 DEBUG [pool-6-thread-1] cachesync.Broadcaster:116 :
> Servers in the cluster: [localhost:7070]
> 2019-03-28 19:07:34,185 INFO  [Scheduler 154250424 Job
> e3e62e0e-daf8-e0bb-762d-81bcbddc55ae-202] execution.ExecutableManager:453 :
> job id:e3e62e0e-daf8-e0bb-762d-81bcbddc55ae from RUNNING to ERROR
> 2019-03-28 19:07:34,185 DEBUG [pool-6-thread-1] cachesync.Broadcaster:126 :
> Announcing new broadcast to all: BroadcastEvent{entity=execute_output,

Re: [DISCUSS] Kylin 3.0 alpha and beta release before GA

2019-03-26 Thread JiaTao Tao
Seems awesome, looking forward to Kylin 3.0.


-- 


Regards!

Aron Tao

ShaoFeng Shi wrote on Mon, Mar 25, 2019 at 1:24 AM:

> Hello,
>
> About two months ago, we raised the "[Discuss] Moving toward Apache Kylin
> 3.0" in the developer group, all agree to use 3.0 as the next major release
> version when the Real-Time feature released. Now we're merging the code
> from the RT feature branch into the master branch.
>
> Although this feature has been in production in certain early users, it has
> not been widely evaluated by the community. I would like to propose
> releasing the alpha and beta before the GA release, just like what we did
> in Kylin v2.0. This is to give our users enough time to evaluate; On the
> other side, it gives the developers the time to hear feedback, to improve
> the stability/performance, catch up the documentation and others.
>
> A rough plan is:
> - April, 3.0 alpha release
> - June, 3.0 beta release
> - July to Aug, 3.0 GA release
>
> Before 3.0 GA, the v2.6 branch will roll out bug fix releases at a steady
> pace; Usually, 1 version every 1-2 months, depends on the severity of the
> reported issues.
>
> We warmly welcome the community users to join the 3.0 alpha and beta.
> Please share your comments here. Thank you for the support to Apache Kylin!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: kylin top-n query

2019-03-18 Thread JiaTao Tao
And this may also help:
http://kylin.apache.org/docs/tutorial/create_cube.html (go to the "TOP_N"
Section)


-- 


Regards!

Aron Tao

黄云尧 wrote on Mon, Mar 18, 2019 at 12:06 PM:

> Does anyone have documentation for top-N queries in Kylin?
>
>
>
>


Re: [Discussion] Enable shrunken dictionary by default

2019-03-17 Thread JiaTao Tao
+1, this seems to improve things a lot.


-- 


Regards!

Aron Tao

Xiaoxiang Yu wrote on Mon, Mar 18, 2019 at 2:27 AM:

> Dear all,
> I suggest enabling "kylin.dictionary.shrunken-from-global-enabled" by
> default (it is currently disabled by default), because I found that
> enabling it speeds up the cube build process when a cube has a count
> distinct (bitmap) measure on a high-cardinality column. This feature was
> contributed in KYLIN-3491.
>
> When a count distinct (bitmap) measure is used on a high-cardinality
> column (which requires a global dictionary), the build base cuboid step
> needs frequent cache swaps, so it cannot finish within a reasonable period.
> KYLIN-3491 adds a new step that builds a separate dictionary for each
> InputSplit before the BuildBaseCuboid step, so each mapper of the
> BuildBaseCuboid step only has to fetch a smaller dictionary of its own
> (without unused values) instead of the larger global dictionary. This
> reduces cache swapping and lets the BuildBaseCuboid step run as quickly as
> possible.
>
> In my test environment, the Hadoop cluster is a CDH cluster with 56 vcores
> and 110 GB of memory. I created a model with a fact table (153326740 rows)
> and three dimension tables; there are three count distinct (bitmap)
> measures, and the largest cardinality of a single column is 55200325. With
> ShrunkenDict disabled, BuildBaseCuboid could not complete in 22 hours. By
> comparison, with ShrunkenDict enabled, the build completed in a reasonable
> time (the extra dictionary step took 5 minutes, Build Base Cuboid took 5
> minutes).
>
>
> https://user-images.githubusercontent.com/14030549/54363305-ad25e200-46a5-11e9-8bc7-fe2c385c0278.png
>
> If you want to know more, please check
> https://issues.apache.org/jira/browse/KYLIN-3491. If you have any
> suggestions, please let me know.
>
> 
> Best wishes,
> Xiaoxiang Yu
>
>
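The idea behind the shrunken dictionary, as described in the message above, can be illustrated with a toy example: each input split keeps only the global-dictionary entries for the values it actually contains. This is a minimal sketch, not the KYLIN-3491 implementation; the function names and sample data are hypothetical, and plain Python dicts stand in for Kylin's dictionary structures.

```python
from typing import Dict, Iterable, List


def build_global_dict(values: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct value a stable integer id (sorted for determinism)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def shrink_for_split(global_dict: Dict[str, int], split_values: Iterable[str]) -> Dict[str, int]:
    """Keep only the entries a single input split needs.
    Ids are unchanged, so encoded values stay consistent across splits."""
    return {v: global_dict[v] for v in set(split_values)}


splits: List[List[str]] = [
    ["u1", "u2", "u1"],
    ["u3", "u4"],
    ["u2", "u5", "u5"],
]
global_dict = build_global_dict(v for split in splits for v in split)
shrunken = [shrink_for_split(global_dict, split) for split in splits]

for i, d in enumerate(shrunken):
    print(f"split {i}: {len(d)} of {len(global_dict)} entries -> {sorted(d.items())}")
```

Because every mapper loads only its own subset, the working set per mapper is bounded by the distinct values in its split rather than by the cardinality of the whole column.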


Re: Cube build failure at Step 2

2019-03-13 Thread JiaTao Tao
You are welcome.

nithya.mb4...@gmail.com wrote on Wed, Mar 13, 2019 at 9:19 AM:

> Thank you.
> For now I have changed tez-site.xml in tez-client config and it is working
> fine. I will consider this option if it fails again.
>
> Thanks,
> Nithya
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


-- 


Regards!

Aron Tao


Re: Cube build failure at Step 2

2019-03-13 Thread JiaTao Tao
Hi
I recommend using MR instead of Tez. You can add this to your
kylin_hive_conf.xml in KYLIN_HOME/conf, or modify your hive-site.xml directly:

> <property>
>   <name>hive.execution.engine</name>
>   <value>mr</value>
> </property>
>

-- 


Regards!

Aron Tao

nithya.mb4...@gmail.com wrote on Tue, Mar 12, 2019 at 1:02 PM:

> Hello,
> My cube build is failing at Step 2 with below error:
>
> TaskAttempt 1 failed, info=[Error: Failure while running
> task:java.lang.IllegalArgumentException: tez.runtime.io.sort.mb 1892 should
> be larger than 0 and should be less than the available task memory
> (MB):1577
>
> I have attached full log:  Step_2_Error_Log.txt
> 
>
> Please help.
>
> Kylin Version: 2.6.0
>
> Regards,
> Nithya
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>
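The error quoted above states a simple constraint: the configured sort buffer (1892 MB) must be positive and smaller than the memory available to the task (1577 MB). A minimal sketch of that check, written only from the wording of the error message and not taken from the Tez source; the function name `check_sort_mb` and the value 1024 are hypothetical.

```python
def check_sort_mb(sort_mb: int, available_task_memory_mb: int) -> None:
    """Mirror the constraint stated in the error message: the sort buffer
    must be positive and smaller than the memory available to the task."""
    if not (0 < sort_mb < available_task_memory_mb):
        raise ValueError(
            f"tez.runtime.io.sort.mb {sort_mb} should be larger than 0 and "
            f"should be less than the available task memory (MB):{available_task_memory_mb}"
        )


# 1892 and 1577 come from the reported error; 1024 is an arbitrary smaller value.
for sort_mb in (1892, 1024):
    try:
        check_sort_mb(sort_mb, 1577)
        print(f"{sort_mb} MB: ok")
    except ValueError as err:
        print(f"{sort_mb} MB: rejected ({err})")
```

Either lowering the sort buffer or raising the task memory satisfies the inequality, and switching the Hive execution engine to MR avoids the Tez check altogether.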


Re: [Discuss] Won't ship Spark binary in Kylin binary anymore

2019-03-09 Thread JiaTao Tao
+1


-- 


Regards!

Aron Tao

ShaoFeng Shi wrote on Fri, Mar 8, 2019 at 2:43 AM:

> Hello,
>
> As we know, Kylin ships Spark in its binary package. The total package
> grows bigger with each version; the latest version (v2.6.1)
> is bigger than 350 MB and was rejected by the Apache SVN server when we
> tried to upload it. Of that 350 MB, more than 200 MB is Spark, yet
> Spark is not mandatory for Kylin.
>
> So I propose excluding Spark from Kylin's binary package, starting from the
> current v2.6.1; the user just needs to point SPARK_HOME to a folder
> containing the expected Spark version, or manually download Spark and put
> it in KYLIN_HOME/spark. No other behavior is affected.
>
> Just share your comments if any.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
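The two options mentioned in the proposal above (an explicit SPARK_HOME, or a copy of Spark placed under KYLIN_HOME/spark) can be sketched as a lookup. This is not Kylin's startup script; the function name `resolve_spark_home`, the sample paths, and the order in which the two locations are tried are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_spark_home(env: Mapping[str, str]) -> Optional[str]:
    """Prefer an explicit SPARK_HOME; otherwise fall back to KYLIN_HOME/spark.
    The precedence is an assumption, not something the proposal specifies."""
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        return spark_home
    kylin_home = env.get("KYLIN_HOME")
    if kylin_home:
        return os.path.join(kylin_home, "spark")
    return None


print(resolve_spark_home({"SPARK_HOME": "/opt/spark"}))
print(resolve_spark_home({"KYLIN_HOME": "/opt/kylin"}))
print(resolve_spark_home({}))
```

Passing the environment in as a mapping keeps the function easy to test; a real caller would pass `os.environ`.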


Re: [VOTE] Release apache-kylin-2.6.1 (RC1)

2019-03-04 Thread JiaTao Tao
+1

-- 


Regards!

Aron Tao



ShaoFeng Shi wrote on Mon, Mar 4, 2019 at 10:35 AM:

> Hi all,
>
> I have created a build for Apache Kylin 2.6.1, release candidate 1.
>
> Changes highlights:
> [KYLIN-3494] - Build cube with spark reports ArrayIndexOutOfBoundsException
> [KYLIN-3537] - Use Spark to build Cube on Yarn failed at Setp8 on HDP3.
> [KYLIN-3815] - Unexpected behavior when joining the streaming table and
> hive table
> [KYLIN-3828] - ArrayIndexOutOfBoundsException thrown when building a
> streaming cube with empty data in its first dimension
> [KYLIN-3833] - Potential OOM in Spark Extract Fact Table Distinct Columns
> step
> [KYLIN-3826] - MergeCuboidJob only uploads necessary segment's dictionary
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121=12344845
>
> The commit being voted upon:
>
>
> https://github.com/apache/kylin/commit/270cfe68ecc94c66141b29e2ccf20b9ec25e23dd
>
> Its hash is 270cfe68ecc94c66141b29e2ccf20b9ec25e23dd.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.6.1-rc1/
>
> The hash of the artifact is as follows:
> apache-kylin-2.6.1-source-release.zip.sha256
> 961b8c8d0e781fe7936efb7f33cebb9661b4fbf83082669769a41b47cea19001
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1060/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 2.6.1.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.6.1
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
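A reviewer checking a release candidate would typically compare the published SHA-256 value from the vote mail with a hash computed over the downloaded artifact. A minimal sketch using only the standard library; the helper names are hypothetical, and the demo hashes a small temporary stand-in file rather than a real Kylin package.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published: str) -> bool:
    """Compare against a published value; tolerate the common
    '<hex>  <filename>' layout by using only the first token."""
    tokens = published.split()
    return bool(tokens) and sha256_of_file(path) == tokens[0].lower()


# Demo on a stand-in file; with a real download, pass the artifact path and
# the value from the corresponding .sha256 file instead.
payload = b"stand-in for a release artifact"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

published_line = hashlib.sha256(payload).hexdigest() + "  demo-artifact.zip"
print(matches_published_hash(demo_path, published_line))
os.remove(demo_path)
```

The hash only shows that the download matches what was staged; the signature made with the release manager's key still has to be verified separately.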


[jira] [Created] (KYLIN-3834) Add monitor for curator-based scheduler

2019-02-26 Thread Jiatao Tao (JIRA)
Jiatao Tao created KYLIN-3834:
-

 Summary: Add monitor for curator-based scheduler
 Key: KYLIN-3834
 URL: https://issues.apache.org/jira/browse/KYLIN-3834
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: Jiatao Tao
Assignee: Jiatao Tao
 Fix For: v3.0.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: can not open kylin web ui

2019-02-22 Thread JiaTao Tao
Hi

You can check the files in "${KYLIN_HOME}/logs" and see whether anything
unexpected occurred.

-- 


Regards!

Aron Tao

hetadesai56 wrote on Fri, Feb 22, 2019 at 2:34 PM:

> Hi,
>
> I am working on HDP 2.6.5 on virtual box. I have installed
> apache-kylin-2.6.0-bin. kylin started successfully. But Web UI is not
> working. I did port forwarding in virtual box for kylin default port 7070.
>
> How can i resolve this ?
>
> Thank you,
> Heta
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


Re: development environment

2019-02-21 Thread JiaTao Tao
Hi
Please check Ambari and see whether Hive is healthy. If not, you
can check Hive's logs (/var/log/hive).



-- 


Regards!

Aron Tao



XiaoHui Zhang <18125...@bjtu.edu.cn> wrote on Fri, Feb 22, 2019 at 2:56 AM:

>   Hi, dear team, I am a beginner with Kylin and I am building a Kylin
> development environment with the HDP Sandbox. But when I run
> $KYLIN_HOME/bin/kylin.sh start, the following errors occur:
>
> [root@sandbox-hdp apache-ktlin-2.6.0-bin-hadoop3]#./bin/kylin.sh start
> Retrieving hadoop conf dir...
> KYLIN_HOME is set to /usr/local/apache-kylin-2.6.0-bin-hadoop3
> Retrieving hive dependency...
> Something wrong with Hive CLI or Beline,please execute Hive CLI or Beeline
> CLI in termina
>
>
>
>
>  Do I need to make any changes to Kylin's configuration file? If not, why
> did this happen?
>
>
>  I look forward to any reply.
>
>
>
>


Re: Build cube exception: java.io.FileNotFoundException

2019-02-20 Thread JiaTao Tao
Since you are a beginner, I recommend using an integrated sandbox to deploy
Kylin, such as the HDP sandbox (
http://hortonworks.com/products/hortonworks-sandbox) or CDH.


-- 


Regards!

Aron Tao


jiangxiaoma111 <369806...@qq.com> wrote on Tue, Feb 19, 2019 at 10:02 AM:

> Hi, all.
> I am a beginner with Kylin. My deployment environment:
> os: mac os 10.12.6
> hadoop: 3.0.0
> hive: 2.3.1
> hbase: 1.2.9
> kylin: apache-kylin-2.6.0-bin-hbase1x
>
> All components are deployed standalone.
>
> When I build a cube, an error occurs in Step 3:
> 2019-02-19 17:22:33,757 ERROR [pool-10-thread-3]
> threadpool.DefaultScheduler:116 : ExecuteException
> job:f2fdcdd5-7c2b-df7c-89ea-8a61a85f8975
> org.apache.kylin.job.exception.ExecuteException:
> org.apache.kylin.job.exception.ExecuteException:
> java.io.FileNotFoundException: File does not exist:
>
> hdfs://localhost:8020/usr/local/var/log/hadoop/hadoop-didi/mapred/staging/didi2113494170/.staging/job_local2113494170_0001/libjars/hive-hcatalog-core-2.3.1.jar
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:180)
> at
>
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kylin.job.exception.ExecuteException:
> java.io.FileNotFoundException: File does not exist:
>
> hdfs://localhost:8020/usr/local/var/log/hadoop/hadoop-didi/mapred/staging/didi2113494170/.staging/job_local2113494170_0001/libjars/hive-hcatalog-core-2.3.1.jar
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:180)
> at
>
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:70)
> at
>
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:165)
> ... 4 more
> Caused by: java.io.FileNotFoundException: File does not exist:
>
> hdfs://localhost:8020/usr/local/var/log/hadoop/hadoop-didi/mapred/staging/didi2113494170/.staging/job_local2113494170_0001/libjars/hive-hcatalog-core-2.3.1.jar
> at
>
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
> at
>
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
> at
>
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>
> I searched a lot of posts but could not resolve the exception.
> Please help.
>
>
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


Re: [New Document] Kylin SQL reference

2019-01-30 Thread JiaTao Tao
Very helpful! Thanks to Na Zhai.


ShaoFeng Shi wrote on Wed, Jan 30, 2019 at 3:13 PM:

> Hello Kylin users,
>
> A new document is added to Apache Kylin website for introducing the SQL
> grammar, functions and data types that Kylin supports; We believe it will
> help new users. Many thanks to Na Zhai, who drafted this doc and verified
> the sample queries.
>
> Here is the link:
>
> English:
> https://kylin.apache.org/docs/tutorial/sql_reference.html
>
> Chinese:
> https://kylin.apache.org/cn/docs/tutorial/sql_reference.html
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>

-- 


Regards!

Aron Tao


Re: Kylin+Spark = NoClassDefFoundError

2019-01-30 Thread JiaTao Tao
If you want to use your own Spark, try a binary package with Hadoop.



Kamil wrote on Wed, Jan 30, 2019 at 11:40 PM:

> Hello All,
>
> I'm a new Kylin user. I successfully got everything working with the
> "Sample Cube" (http://kylin.apache.org/docs/tutorial/kylin_sample.html)
>
> Now I want to make it work with Spark
> (http://kylin.apache.org/docs/tutorial/cube_spark.html) but I'm
> struggling with one problem:
>
> When I run "build", I get this exception:
> kylin | Exception in thread "main"
> java.lang.NoClassDefFoundError: org/slf4j/Logger
> kylin | at
> java.lang.Class.getDeclaredMethods0(Native Method)
> kylin | at
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> kylin | at
> java.lang.Class.privateGetMethodRecursive(Class.java:3048)
> kylin | at
> java.lang.Class.getMethod0(Class.java:3018)
> kylin | at
> java.lang.Class.getMethod(Class.java:1784)
> kylin | at
> sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
> kylin | at
> sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> kylin | Caused by: java.lang.ClassNotFoundException:
> org.slf4j.Logger
> kylin | at
> java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> kylin | at
> java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> kylin | at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
> kylin | at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> kylin | ... 7 more
> kylin | The command is:
> kylin | export HADOOP_CONF_DIR=/etc/hadoop &&
> /opt/spark-2.3.2-bin-without-hadoop/bin/spark-submit
> --class org.apache.kylin.common.util.SparkEntry
> --conf spark.executor.instances=40
> --conf spark.network.timeout=600
> --conf spark.yarn.queue=default
> --conf spark.history.fs.logDirectory=hdfs://namenode:8020/kylin/spark-history
> --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
> --conf spark.dynamicAllocation.enabled=true
> --conf spark.master=yarn
> --conf spark.dynamicAllocation.executorIdleTimeout=300
> --conf spark.hadoop.yarn.timeline-service.enabled=false
> --conf spark.executor.memory=4G
> --conf spark.eventLog.enabled=true
> --conf spark.eventLog.dir=hdfs://namenode:8020/kylin/spark-history
> --conf spark.dynamicAllocation.minExecutors=1
> --conf spark.executor.cores=1
> --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true
> --conf spark.yarn.executor.memoryOverhead=1024
> --conf spark.hadoop.dfs.replication=2
> --conf spark.dynamicAllocation.maxExecutors=1000
> --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
> --conf spark.driver.memory=2G
> --conf spark.submit.deployMode=cluster
> --conf spark.shuffle.service.enabled=true
> --jars /opt/apache-kylin-2.6.0-bin/lib/kylin-job-2.6.0.jar
> /opt/apache-kylin-2.6.0-bin/lib/kylin-job-2.6.0.jar
> -className org.apache.kylin.engine.spark.SparkFactDistinct
> -counterOutput hdfs://namenode:8020/kylin/kylin_metadata/kylin-50b3e245-c00f-0136-1ec8-d5c5c472a311/kylin_sales_cube/counter
> -statisticssamplingpercent 100
> -cubename kylin_sales_cube
> -hiveTable default.kylin_intermediate_kylin_sales_cube_a2c3dfb4_900c_f8eb_5086_8bbee7e5c60a
> -output hdfs://namenode:8020/kylin/kylin_metadata/kylin-50b3e245-c00f-0136-1ec8-d5c5c472a311/kylin_sales_cube/fact_distinct_columns
> -input hdfs://namenode:8020/kylin/kylin_metadata/kylin-50b3e245-c00f-0136-1ec8-d5c5c472a311/kylin_intermediate_kylin_sales_cube_a2c3dfb4_900c_f8eb_5086_8bbee7e5c60a
> -segmentId a2c3dfb4-900c-f8eb-5086-8bbee7e5c60a
> -metaUrl kylin_metadata@hdfs,path=hdfs://namenode:8020/kylin/kylin_metadata/kylin-50b3e245-c00f-0136-1ec8-d5c5c472a311/kylin_sales_cube/metadata
> kylin | at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:96)
> kylin | at org.apache.kylin.engine.spark.SparkExecutable$2.call(SparkExecutable.java:281)
> kylin | at org.apache.kylin.engine.spark.SparkExecutable$2.call(SparkExecutable.java:276)
> kylin | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> kylin | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> kylin | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> kylin | at java.lang.Thread.run(Thread.java:748)
> kylin | 2019-01-29 11:44:18,469 INFO  [Scheduler 1207320921 Job 50b3e245-c00f-0136-1ec8-d5c5c472a311-118]
> 

Re: Query returns no results

2019-01-29 Thread JiaTao Tao
Hi,
Kylin is not designed for "select *"-style queries. See
http://kylin.apache.org/docs/gettingstarted/faq.html (Why I got an error
when running a "select *" query?)

Try
"
select
count(*)
from HUOBI_GLOBAL.HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY
where HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY.EXCHANGE_NAME = 'b11'
"
and see whether the result is unexpected. An aggregate query like this is a
better fit for Kylin.


廉立伟 wrote on Wed, Jan 30, 2019 at 1:49 AM:

>
> Hello,
> This query returns a result, with EXCHANGE_NAME equal to b11:
> select
> EXCHANGE_NAME
> from HUOBI_GLOBAL.HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY
> group by  EXCHANGE_NAME
> But this query returns 0 rows. Why is that?
> select
> EXCHANGE_NAME
> from HUOBI_GLOBAL.HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY
> where HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY.EXCHANGE_NAME = 'b11'
>
> On 2019/1/30 at 9:45 AM, "JiaTao Tao" wrote:
>
> Hi,
> I cannot see your picture. Can you describe your problem?
>
>
> 廉立伟 wrote on Tue, Jan 29, 2019 at 2:26 PM:
>
> >
> >
> > Hello, when I add a where clause the query returns no results. Why?
> >
> > M *lianli...@huobi.com *
> >
> >
> >
>
>
> --
>
>
> Regards!
>
> Aron Tao
>
>
>

-- 


Regards!

Aron Tao


Re: Query returns no results

2019-01-29 Thread JiaTao Tao
Hi,
I cannot see your picture. Can you describe your problem?

廉立伟 wrote on Tue, Jan 29, 2019 at 2:26 PM:

>
>
> Hello, when I add a where clause the query returns no results. Why?
>
> M *lianli...@huobi.com *
>
>
>


-- 


Regards!

Aron Tao


Re: Cleaning up hdfs working directory

2019-01-24 Thread JiaTao Tao
Hi
Take a look at this:
http://kylin.apache.org/docs/howto/howto_cleanup_storage.html

kdcool6932 wrote on Thu, Jan 24, 2019 at 8:04 AM:

> Hi Kylin,
> We have a lot of data in our HDFS working directory, around 10 TB,
> accumulated over the last year or so. This is actually more than Kylin's
> HBase usage (around 9 TB) on one of our Kylin clusters. We are using Kylin
> 2.3.1 on this cluster.
> 1. Are all these files required for Kylin functionality?
> 2. Is there a way to clean them up (the Kylin cleanup job is not helping
> here) and keep only the required data on HDFS?
> 3. Does this storage need to be on HDFS only, or can we point it to some
> non-DFS storage, like a local FS or an S3 bucket?
> This might help us reduce our HDFS storage and use it more judiciously.
> Thanks, ke...@exponential.com
>
>
> Sent from my Samsung Galaxy smartphone.



-- 


Regards!

Aron Tao


Re: [Discuss] Moving toward Apache Kylin 3.0

2019-01-23 Thread JiaTao Tao
+1


-- 


Regards!

Aron Tao



ShaoFeng Shi wrote on Wed, Jan 23, 2019 at 7:57 AM:

> Hi Kylin developers,
>
> In last week, Kylin released v2.6.0, with the enhanced & distributed query
> cache and JDBC data source SDK. After this release, the next batch
> candidate features include real-time streaming, parquet storage, and druid
> storage. These features were developed in the past 1-2 years by different
> Kylin players and were open sourced in the past 6 months. They have already
> been staged in separate branches and are under evaluation by the community.
> We have received much feedback from the community.
>
> These candidate features are big supplements to as-is Kylin functions; For
> example, the real-time streaming feature will bring Kylin from batch &
> historical analytics into real-time analytics. The parquet storage will
> make the deployment more flexible and more cloud-friendly. Of course,
> stabilizing and improving these features need additional time and effort.
>
> So, when we merge and release them, we had better give the result a new
> version number so that users can clearly tell it apart from the current 2.x
> versions. I discussed this with several developers offline, and we think it
> is time to move toward Kylin 3.0. So, if one of the above features is merged,
> the version will be 3.0. The current 2.6 will be maintained until 3.x is
> ready for production use.
>
> Your comments, ideas, and suggestions are welcomed!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


Re: Some data is missing from the built cube

2019-01-22 Thread JiaTao Tao
Hi
Did you set a "filter" when modeling? (
http://kylin.apache.org/docs/tutorial/create_cube.html)
Also check the time range you used when you built the cube, and make sure it
was a full build.

[image: image.png]

奥威软件 <3513797...@qq.com> wrote on Tue, Jan 22, 2019 at 7:25 AM:

> Hi
> here are results:
>
>
> kylin:
> select count(*) from ICSTOCKBILL_1W
> result:10366
>
>
> hive:
> select count(*) from ICSTOCKBILL_1W
> result:10411
>
>
>
>
> -- Original Message --
> From: "JiaTao Tao";
> Sent: Tuesday, January 22, 2019, 3:04 PM
> To: "dev";
>
> Subject: Re: Some data is missing from the built cube
>
>
>
> Hi
> Can you try "select count(*)" and compare the result with hive?
>
> FYI: http://kylin.apache.org/docs/gettingstarted/faq.html (Why I got an
> error when running a “select * “ query?)
>
>
> 奥威软件 <3513797...@qq.com> wrote on Tue, Jan 22, 2019 at 5:21 AM:
>
> > Data can also be queried without group by.
> > For example, changing goodsid to 1137:
> > select * from ICSTOCKBILL_1W where goodsid = '1137'
> > Result:
> >
> >
> > The problem now is that part of the cube's data is missing and cannot be
> > queried, which makes the aggregated figures wrong.
> >
> >
> > The data in the Hive table is complete, but the built cube has lost part
> > of it.
> >
> >
> > -- Original Message --
> > *From:* "Chao Long";
> > *Sent:* Tuesday, January 22, 2019, 12:11 PM
> > *To:* "dev";
> > *Subject:* Re: Some data is missing from the built cube
> >
> > Hi,
> >   The cube only has aggregated data, so your queries should include a
> > "group by" clause.
> >   You can check the FAQ:
> > http://kylin.apache.org/docs/gettingstarted/faq.html (Why I got an error
> > when running a "select *" query)
> >
> >
> > --
> > Best Regards,
> > Chao Long
> >
> >
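> > -- Original Message --
> > From: "奥威软件"<3513797...@qq.com>;
> > Sent: Tuesday, January 22, 2019, 11:06 AM
> > To: "dev";
> >
> > Subject: Some data is missing from the built cube
> >
> >
> >
> > Hi,
> >
> >
> > I have confirmed that the data in the Hive table is complete. Running the
> > same query against the fact table, Hive returns complete data, but the
> > cube returns nothing.
> > The query is, for example, select * from ICSTOCKBILL_1W where goodsid = '643',
> > where ICSTOCKBILL_1W is the fact table.
> > With this query:
> > the Hive table has data
> > the cube has no data
> >
> >
> > After checking, a single-digit number of goodsid values are missing, and I
> > cannot find any pattern or anything suspicious.
> >
> >
> > env:
> > Ubuntu 16.04, Hadoop cluster of 3 nodes
> > apache-kylin-2.5.2-bin-hadoop3  or apache-kylin-2.6.0-bin-hadoop3 (Kylin
> > on a single node)
> > hive:2.3.4
> > hbase:1.3.3
> > zookeeper:3.4.13
> >
> >
> >
> > Please help me work out how to solve this. Thanks!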
> >
> >
> > Best regards
> >
>
>
> --
>
>
> Regards!
>
> Aron Tao



-- 


Regards!

Aron Tao


Re: Some data is missing from the built cube

2019-01-21 Thread JiaTao Tao
Hi
Can you try "select count(*)" and compare the result with hive?

FYI: http://kylin.apache.org/docs/gettingstarted/faq.html (Why I got an
error when running a “select * “ query?)


奥威软件 <3513797...@qq.com> wrote on Tue, Jan 22, 2019 at 5:21 AM:

> Data can also be queried without group by.
> For example, changing goodsid to 1137:
> select * from ICSTOCKBILL_1W where goodsid = '1137'
> Result:
>
>
> The problem now is that part of the cube's data is missing and cannot be
> queried, which makes the aggregated figures wrong.
>
>
> The data in the Hive table is complete, but the built cube has lost part
> of it.
>
>
> -- Original Message --
> *From:* "Chao Long";
> *Sent:* Tuesday, January 22, 2019, 12:11 PM
> *To:* "dev";
> *Subject:* Re: Some data is missing from the built cube
>
> Hi,
>   The cube only has aggregated data, so your queries should include a
> "group by" clause.
>   You can check the FAQ:
> http://kylin.apache.org/docs/gettingstarted/faq.html (Why I got an error
> when running a "select *" query)
>
>
> --
> Best Regards,
> Chao Long
>
>
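> -- Original Message --
> From: "奥威软件"<3513797...@qq.com>;
> Sent: Tuesday, January 22, 2019, 11:06 AM
> To: "dev";
>
> Subject: Some data is missing from the built cube
>
>
>
> Hi,
>
>
> I have confirmed that the data in the Hive table is complete. Running the
> same query against the fact table, Hive returns complete data, but the
> cube returns nothing.
> The query is, for example, select * from ICSTOCKBILL_1W where goodsid = '643',
> where ICSTOCKBILL_1W is the fact table.
> With this query:
> the Hive table has data
> the cube has no data
>
>
> After checking, a single-digit number of goodsid values are missing, and I
> cannot find any pattern or anything suspicious.
>
>
> env:
> Ubuntu 16.04, Hadoop cluster of 3 nodes
> apache-kylin-2.5.2-bin-hadoop3  or apache-kylin-2.6.0-bin-hadoop3 (Kylin
> on a single node)
> hive:2.3.4
> hbase:1.3.3
> zookeeper:3.4.13
>
>
>
> Please help me work out how to solve this. Thanks!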
>
>
> Best regards
>


-- 


Regards!

Aron Tao


Re: An incorrect result when I used kylin join

2019-01-18 Thread JiaTao Tao
Hi
Kylin's cube only has aggregated data, so try some aggregations in your SQL,
such as min/max.

FYI: http://kylin.apache.org/docs/gettingstarted/faq.html (Why I got an
error when running a “select * “ query?)

雒智 wrote on Fri, Jan 18, 2019 at 1:43 AM:

>
> Hello,
> My dimension table has a column named pub_time whose type is timestamp.
> Executing the SQL below returns an incorrect result.
>
>
> SELECT
> PUB_TIME
> FROM MANKE_DW.BIBI_SEASON_WATCH_FACT as BIBI_SEASON_WATCH_FACT
> INNER JOIN   BIBI_SEASON_DIM
> ON BIBI_SEASON_WATCH_FACT.SEASON_SK = BIBI_SEASON_DIM.SEASON_SK
> where date_sk=304  order  by  pub_time desc;
>
>
> The desc result is:
>
>
>
> The asc result is:
>
>
> I must change the SQL to get the correct result:
>
> SELECT
> PUB_TIME
> FROM MANKE_DW.BIBI_SEASON_WATCH_FACT as BIBI_SEASON_WATCH_FACT
> INNER JOIN (select season_sk,badge,copyright,isfinish,title,cast(pub_time
> as  date) pub_time from  MANKE_DW.BIBI_SEASON_DIM) as BIBI_SEASON_DIM
> ON BIBI_SEASON_WATCH_FACT.SEASON_SK = BIBI_SEASON_DIM.SEASON_SK
> where date_sk=304  order  by  pub_time desc;
>
> The desc result is:
>
>
> The asc result is:
>
>
> Note: the incorrect result only appears in join queries. Is it a bug?
>

-- 


Regards!

Aron Tao


Re: [VOTE] Release apache-kylin-2.6.0 (RC1)

2019-01-09 Thread JiaTao Tao
+1
mvn test passed

Yanghong Zhong wrote on Wed, Jan 9, 2019 at 2:46 AM:

> Hi all,
>
> I have created a build for Apache Kylin 2.6.0, release candidate 1.
>
> Changes highlights:
> [KYLIN-2895] - Refine query cache by changing the query cache expiration
> strategy by signature checking and introducing memcached as distributed
> cache
> [KYLIN-2932] - Simplify the thread model for in-memory cubing
> [KYLIN-3021] - Check MapReduce job failed reason and include the
> diagnostics into email notification
> [KYLIN-3272] - Upgrade Spark dependency to 2.3.2
> [KYLIN-3540] - Improve Mandatory Cuboid Recommendation Algorithm
> [KYLIN-3552] - Data Source SDK to ingest data from different JDBC sources
> [KYLIN-3611] - Upgrade Tomcat to 7.0.91, 8.5.34 or later
> [KYLIN-3656] - Improve HLLCounter performance
> [KYLIN-3700] - Quote sql identities when creating flat table
> [KYLIN-3729] - CLUSTER BY CAST(field AS STRING) will accelerate base cuboid
> build with UHC global dict
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121=12344003
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/8737bc1f555a2789a67462c8f8420b6ab3be97ce
>
> Its hash is 8737bc1f555a2789a67462c8f8420b6ab3be97ce.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.6.0-rc1/
>
> The hash of the artifact is as follows:
> apache-kylin-2.6.0-source-release.zip.sha256
>
> 3621750945823ff4f0c4124b6d5b5c7164d9b08686729352ea22b2f486958d2a
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1059/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/nju_yaho.asc
>
> Please vote on releasing this package as Apache Kylin 2.6.0.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.6.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Best regards,
> Yanghong Zhong
> eBay Inc.
>


-- 


Regards!

Aron Tao


Re: Increment Upload in Kylin

2019-01-08 Thread JiaTao Tao
This sounds like an "incremental build". Cube data consists of segments;
each build produces a new segment and does not refresh the old segments.
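
As a rough illustration of building only the new day's segment, here is a minimal Python sketch. It only prepares the request body; the field names (startTime, endTime, buildType) are what I recall from Kylin's REST API for building a cube, so please verify them against the docs for your version. The helper name daily_build_request and the UTC day boundaries are assumptions made for this example.

```python
import json
from datetime import datetime, timedelta, timezone


def daily_build_request(day: str) -> dict:
    """Return a build payload covering [day 00:00 UTC, next day 00:00 UTC).

    Hypothetical helper: the field names are assumed from Kylin's REST API
    docs and should be checked against the version you run.
    """
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "startTime": int(start.timestamp() * 1000),  # epoch milliseconds
        "endTime": int(end.timestamp() * 1000),      # exclusive upper bound
        "buildType": "BUILD",                        # new segment, no refresh
    }


payload = daily_build_request("2019-01-07")
print(json.dumps(payload))
```

Each call covers one day, so submitting one such request per day adds one segment per day and leaves the older segments untouched.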

somu0...@gmail.com wrote on Mon, Jan 7, 2019 at 2:16 AM:

> Is there any feature in kylin which will do increment update without
> refreshing the complete cube.  for example if one dimension get new data
> every day it should calculate the new one without refreshing the complete
> cube which will save time for building the cube. Could you please help me
> if
> such feature available in kylin?
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


-- 


Regards!

Aron Tao


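Re: Reply: Can multiple machines be set up to build a cube at the same time?

2019-01-08 Thread JiaTao Tao
Kylin submits cubing tasks to YARN; if your Hadoop cluster has multiple
nodes, the build can use their combined capacity.

NoOne <3513797...@qq.com> wrote on Tue, Jan 8, 2019 at 8:04 AM:

> Sorry, what I am asking is whether multiple machines can be set up to build the same cube at the same time.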
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


-- 


Regards!

Aron Tao


Re: help kylin

2019-01-04 Thread JiaTao Tao
Hi

I cannot see your pic, can you post the pic again, or describe the problem?

王建圆 wrote on Fri, Jan 4, 2019 at 10:04 AM:

> Hello, I imported the Kylin project into IDEA according to
> http://kylin.apache.org/cn/development/dev_env.html, and I encountered some
> errors when I started it.
> [image: image.png]
> Can you give me some advice? Thanks
>


-- 


Regards!

Aron Tao


Re: Reply: show the kylin sql for timeout

2018-12-26 Thread JiaTao Tao
At present, the minimum value of "kylin.query.timeout-seconds" is 60s, and
you cannot set it any lower.

If you want to simulate a timeout, take a look at
"ITKylinQueryTest#testTimeoutQuery". It relies on a hack, see:
"org.apache.kylin.gridtable.StorageSideBehavior#SCAN_FILTER_AGGR_CHECKMEM_WITHDELAY".
I do not recommend this, though, because it seems intended only for
development and is too hacky.

By the way, Kylin's timeout exception is KylinTimeoutException, and there is
a test that mocks this scenario:
"org.apache.kylin.rest.service.KylinQueryTimeoutTest". Maybe this can help
you.

-- 


Regards!

Aron Tao

黄云尧 wrote on Wed, Dec 26, 2018 at 11:54 AM:

> I want to catch the exception and do something.
> From: Na Zhai
> Sent: 2018-12-26 16:40:32
> To: "dev@kylin.apache.org"
> Subject: Reply: show the kylin sql for timeout
> >Hi, yunyao.
> >
> >   Do you mean that you want to see what happens when a SQL query times
> > out? Why do you want to do that? Or do you just want to know the timeout
> > value?
> >
> >
> >
> >Sent from Mail for Windows 10
> >
> >
> >
> >
> >From: 黄云尧
> >Sent: Monday, December 24, 2018 4:52:05 PM
> >To: dev@kylin.apache.org
> >Subject: show the kylin sql for timeout
> >
> >I want to demo what happens when a SQL query times out in Kylin. Does
> > anyone have a good idea?
> >
> >
> >
> >
> >
> >
>
>
>


Re: kylin sql query timeout

2018-12-20 Thread JiaTao Tao
Hi, you can take a look
at org.apache.kylin.common.exceptions.KylinTimeoutException.

黄云尧 wrote on Thu, Dec 20, 2018 at 6:39 AM:

> I want to know which exception class is thrown when a SQL query times out.
> Does anyone know?
>
>
>
>
>
>

-- 


Regards!

Aron Tao


Re: Re: use single quote in sql ,how to escape

2018-12-19 Thread JiaTao Tao
You are welcome!

黄云尧 wrote on Wed, Dec 19, 2018 at 8:33 AM:

> Thanks, you are right. Single quotes are escaped by doubling them up.
> From: JiaTao Tao
> Sent: 2018-12-19 16:28:48
> To: dev@kylin.apache.org
> Subject: Re: use single quote in sql ,how to escape
> >Hi
> >In SQL, single quotes are escaped by doubling them up. Try this: select *
> >from buzz_info where title like '%hello i'' am kangkan%'
> >
> >By the way, Kylin is not well suited to answering "select *" queries; see:
> >Why I got an error when running a "select *" query? (
> >http://kylin.apache.org/docs/gettingstarted/faq.html)
> >
> >黄云尧 wrote on Wed, Dec 19, 2018 at 8:15 AM:
> >
> >> the sql : select * from buzz_info where title like '%hello i' am kangkan%'
> >>
> >>
> >> but this is wrong grammar. How do I escape the single quote?
> >>
> >>
> >> I am looking forward to your reply
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >--
> >
> >
> >Regards!
> >
> >Aron Tao
>
>
>

-- 


Regards!

Aron Tao


Re: Re: Evaluate Kylin on Parquet

2018-12-19 Thread JiaTao Tao
Hi Gang

In my opinion, segment/partition pruning actually falls within the scope of
an "index system". We can have an "index system" at the storage level that
includes a file index (for segment/partition pruning), a page index (for page
pruning), and so on. Putting all of this in one such system makes the
separation of duties cleaner.
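
To make the file-index idea concrete, here is a minimal Python sketch of min/max pruning. It is only an illustration under my own assumptions, not Kylin's or Parquet's implementation: FileIndexEntry, prune_files, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileIndexEntry:
    """Hypothetical file-index record: value range of one column in one file."""
    path: str
    min_value: int
    max_value: int


def prune_files(index: List[FileIndexEntry], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps the filter range [lo, hi]."""
    return [e.path for e in index if e.max_value >= lo and e.min_value <= hi]


# One entry per segment file, keyed on a yyyymmdd date column (made-up data).
index = [
    FileIndexEntry("segment_2018_10.parquet", 20181001, 20181031),
    FileIndexEntry("segment_2018_11.parquet", 20181101, 20181130),
    FileIndexEntry("segment_2018_12.parquet", 20181201, 20181231),
]

# A filter on 2018-11-15 .. 2018-12-05 skips the October file entirely.
print(prune_files(index, 20181115, 20181205))
```

A page index would apply the same overlap test one level down, to pages inside a file rather than to whole files.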


Ma Gang wrote on Wed, Dec 19, 2018 at 6:31 AM:

> Awesome! Looking forward to the improvement. As for the dictionary, keeping
> it in the query engine is usually not good, since it puts a lot of pressure
> on the Kylin server, but sometimes it has benefits: for example, some
> segments can be pruned very early when the filter value is not in the
> dictionary, and some queries can be answered directly from the dictionary,
> as described in: https://issues.apache.org/jira/browse/KYLIN-3490
>
> At 2018-12-17 15:36:01, "ShaoFeng Shi"  wrote:
>
> The dimension dictionary is a legacy design for HBase storage I think;
> because HBase has no data type, everything is a byte array, this makes
> Kylin has to encode STRING and other types with some encoding method like
> the dictionary.
>
> Now with the storage like Parquet, it would decide how to encode the data
> at the page or block level. Then we can drop the dictionary after the cube
> is built. This will release the memory pressure of Kylin query nodes and
> also benefit the UHC case.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
>
> Chao Long wrote on Mon, Dec 17, 2018 at 1:23 PM:
>
>>  In this PoC, we verified that Kylin on Parquet is viable, but the query
>> performance still has room to improve. We can improve it in the
>> following respects:
>>
>>  1, Minimize result set serialization time
>>  Since Kylin needs Object[] data to process, we convert the Dataset to an
>> RDD and then convert the "Row" type to Object[], so Spark needs to serialize
>> the Object[] before returning it to the driver. That time should be avoided.
>>
>>  2, Query without dictionary
>>  In this PoC, to use less storage, we keep the dict-encoded value in the
>> Parquet file for dict-encoded dimensions, so Kylin must load the dictionary
>> to convert dict values at query time. If we keep the original value for
>> dict-encoded dimensions, the dictionary is unnecessary. And we don't have to
>> worry about storage use, because Parquet will encode it. We should remove
>> the dictionary from the query path.
>>
>>  3, Remove query single-point issue
>>  In this PoC, we use Spark to read and process cube data, which is
>> distributed, but Kylin also needs to process the result data that Spark
>> returns in a single JVM. We can try to make that distributed too.
>>
>>  4, Upgrade Parquet to 1.11 for page index
>>  In this PoC, Parquet doesn't have a page index, so we get poor filter
>> performance. We need to upgrade Parquet to version 1.11, which has a page
>> index, to improve filter performance.
>>
>> --
>> Best Regards,
>> Chao Long
>>
>> -- Original Message --
>> *From:* "ShaoFeng Shi";
>> *Sent:* Friday, December 14, 2018, 4:39 PM
>> *To:* "dev";"user";
>> *Subject:* Evaluate Kylin on Parquet
>>
>> Hello Kylin users,
>>
>> The first version of Kylin on Parquet [1] feature has been staged in
>> Kylin code repository for public review and evaluation. You can check out
>> the "kylin-on-parquet" branch [2] to read the code, and also can make a
>> binary build to run an example. When creating a cube, you can select
>> "Parquet" as the storage in the "Advanced setting" page. Both MapReduce and
>> Spark engines support this new storage. A tech blog is under drafting for
>> the design and implementation.
>>
>> Thanks so much to the engineers' hard work: Chao Long and Yichen Zhou!
>>
>> This is not the final version; there is room to improve in many aspects,
>> parquet, spark, and Kylin. It can be used for PoC at this moment. Your
>> comments are welcomed. Let's improve it together.
>>
>> [1] https://issues.apache.org/jira/browse/KYLIN-3621
>> [2] https://github.com/apache/kylin/tree/kylin-on-parquet
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC
>> Work email: shaofeng@kyligence.io
>> Kyligence Inc: https://kyligence.io/
>>
>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>
>>
>>
>
>
>


-- 


Regards!

Aron Tao


Re: use single quote in sql ,how to escape

2018-12-19 Thread JiaTao Tao
Hi
In SQL, single quotes are escaped by doubling them up. Try this: select *
from buzz_info where title like '%hello i'' am kangkan%'

By the way, Kylin is not well suited to answering "select *" queries; see:
Why I got an error when running a "select *" query? (
http://kylin.apache.org/docs/gettingstarted/faq.html)
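
If the statement is assembled in application code, the doubling rule can be applied with a small helper. This is a minimal Python sketch; quote_sql_literal is a hypothetical name, and I use count(*) instead of "select *" purely as an illustrative choice. Where your client or driver supports bound parameters, that is usually the safer route.

```python
def quote_sql_literal(value: str) -> str:
    """Wrap value in single quotes, doubling every embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


pattern = "%hello i' am kangkan%"
sql = "select count(*) from buzz_info where title like " + quote_sql_literal(pattern)
print(sql)
# select count(*) from buzz_info where title like '%hello i'' am kangkan%'
```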

黄云尧 wrote on Wed, Dec 19, 2018 at 8:15 AM:

> the sql : select * from buzz_info where title like '%hello i'
> am kangkan%'
>
>
> but this is wrong grammar. How do I escape the single quote?
>
>
> I am looking forward to your reply
>
>
>
>
>
>
>

-- 


Regards!

Aron Tao


Re: [DISCUSS] Stop inserting git diffs to JIRA ticket

2018-12-02 Thread JiaTao Tao
+1

ShaoFeng Shi wrote on Mon, Dec 3, 2018 at 1:46 AM:

> Hello Kylin developers,
>
> After we enabled GitBox for the Kylin code repository, whenever a PR is
> merged, the "ASF GitHub Bot" inserts the git diff into the associated
> JIRA. We noticed that this makes the JIRA very long when the code
> change is big. Besides, when the change is cherry-picked to another branch,
> the diff is appended again. This makes the JIRA too hard for a human to
> read, and important messages may be overlooked.
>
> A typical sample is this:
> https://issues.apache.org/jira/browse/KYLIN-3187
>
> My proposal is to stop syncing code changes from GitHub to JIRA and only
> keep necessary notifications like "A PR is created/closed". For the
> code change, people should go to the GitHub code history, not JIRA.
>
> Please express your ideas; If no objection in the next couple of days, we
> will raise a change request to the infrastructure team.
>
> Thanks for your input!
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


-- 


Regards!

Aron Tao


Re: [VOTE] Release apache-kylin-2.5.2 (RC2)

2018-11-30 Thread JiaTao Tao
+1

mvn test passed


ShaoFeng Shi wrote on Fri, Nov 30, 2018 at 1:57 PM:

> Hi all,
>
> I have created a build for Apache Kylin 2.5.2, release candidate 2.
>
> Changes:
> [KYLIN-3187] - JDK APIs using the default locale, time zone or character
> set should be avoided
> [KYLIN-3636] - Wrong "storage_type" in CubeDesc causing cube building error
> [KYLIN-3666] - Mege cube step 2: Update dictionary throws
> IllegalStateException
> [KYLIN-3672] - Performance is poor when multiple queries occur in a short
> period
> [KYLIN-3676] - Update to custom calcite and remove the "atopcalcite"
> [KYLIN-3678] - CacheStateChecker may remove a cache file that under a
> building
> [KYLIN-3683] - Package org.apache.commons.lang3 not exists
> [KYLIN-3689] - When the startTime is equal to the endTime in build request,
> the segment will build all data.
> [KYLIN-3693] - TopN, Count distinct incorrect in Spark engine
> [KYLIN-3705] - Segment Pruner mis-functions when the source data has
> Chinese characters
> Thanks to everyone who has contributed to this release.
>
> Here are release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121=12344466
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/0e519d859e217fbfadd534313376e532d2c647fa
>
> Its hash is 0e519d859e217fbfadd534313376e532d2c647fa.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.2-rc2/
>
> The hash of the artifact is as follows:
> apache-kylin-2.5.2-source-release.zip.sha256
> fca5688cf64442ea595e07c2a4a4b2b549836d268ce8f10f3d559f05c22b61d0
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1058/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 2.5.2.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PPMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.5.2
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Work email: shaofeng@kyligence.io
> Kyligence Inc: https://kyligence.io/
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


-- 


Regards!

Aron Tao


Re: [VOTE] Release apache-kylin-2.5.2 (RC1)

2018-11-27 Thread JiaTao Tao
+1

Yichen Zhou wrote on Tue, Nov 27, 2018 at 11:48 AM:

> mvn  test passed
> +1
>
> -Yichen
>
> Chao Long wrote on Tue, Nov 27, 2018 at 6:37 PM:
>
> > +1
> > mvn test pass
> >
> >
> > --
> > -
> > Chao Long
> >
> >
> >
> >
> >
> >
> >
> > -- Original Message --
> > From: "ShaoFeng Shi";
> > Sent: Tuesday, November 27, 2018, 6:32 PM
> > To: "dev";
> >
> > Subject: [VOTE] Release apache-kylin-2.5.2 (RC1)
> >
> >
> >
> > Hi all,
> >
> > I have created a build for Apache Kylin 2.5.2, release candidate 1.
> >
> > Changes:
> > [KYLIN-3636] - Wrong "storage_type" in CubeDesc causing cube building
> > error
> > [KYLIN-3666] - Mege cube step 2: Update dictionary throws
> > IllegalStateException
> > [KYLIN-3672] - Performance is poor when multiple queries occur in a short
> > period
> > [KYLIN-3676] - Update to custom calcite and remove the "atopcalcite"
> > [KYLIN-3678] - CacheStateChecker may remove a cache file that under a
> > building
> > [KYLIN-3683] - Package org.apache.commons.lang3 not exists
> > [KYLIN-3689] - When the startTime is equal to the endTime in build
> > request, the segment will build all data.
> > [KYLIN-3693] - TopN, Count distinct incorrect in Spark engine
> >
> > Thanks to everyone who has contributed to this release.
> >
> > Here are the release notes:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121=12344466
> >
> > The commit to be voted upon:
> >
> > https://github.com/apache/kylin/commit/481933a35fffb44f3e7c529ad24754afadae3f47
> >
> > Its hash is 481933a35fffb44f3e7c529ad24754afadae3f47.
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.2-rc1/
> >
> > The hash of the artifact is as follows:
> > apache-kylin-2.5.2-source-release.zip.sha256
> > 7577b3353a1663b51ba3d927e3fe6762a8752825e675e7aba1a28ac861e90007
> >
> > A staged Maven repository is available for review at:
> > https://repository.apache.org/content/repositories/orgapachekylin-1057/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/shaofengshi.asc
> >
> > Please vote on releasing this package as Apache Kylin 2.5.2.
> >
> > The vote is open for the next 72 hours and passes if a majority of
> > at least three +1 PPMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Kylin 2.5.2
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Here is my vote:
> >
> > +1 (binding)
> >
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> > Apache Kylin PMC
> > Work email: shaofeng@kyligence.io
> > Kyligence Inc: https://kyligence.io/
> >
> > Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> > Join Kylin user mail group: user-subscr...@kylin.apache.org
> > Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>


-- 


Regards!

Aron Tao


Re: [Announce] Welcome new Apache Kylin committer: ChunEn Ni (倪春恩)

2018-11-27 Thread JiaTao Tao
Congratulations!

ShaoFeng Shi wrote on Tue, Nov 27, 2018 at 7:59 AM:

> The Project Management Committee (PMC) for Apache Kylin
> has invited ChunEn Ni(倪春恩) to become a committer and we are pleased
> to announce that he has accepted.
>
> Congratulations and welcome, ChunEn!
>
> Shaofeng Shi
>
> On behalf of the Apache Kylin PMC
>


-- 


Regards!

Aron Tao


Re: dont complete a job in apache kylin.

2018-11-19 Thread JiaTao Tao
It seems that you are doing a PoC on your own PC. Here is a link for you:
http://kylin.apache.org/docs/install/index.html



> We recommend you to try out Kylin or develop it using the integrated
> sandbox, such as HDP sandbox, and make sure it has at least 10 GB of
> memory. When configuring a sandbox, we recommend that you use the Bridged
> Adapter model instead of the NAT model.


The HDP sandbox is highly recommended for your scenario.


ebrahim zare wrote on Sun, Nov 18, 2018 at 3:55 PM:

> Hi. I built a job and can see it in the Kylin monitor.
> The status is running.
> I see a new table in Hive and all tables in
> yarn(hadoop:http://localhost:8088/cluster/apps/ACCEPTED), but the progress
> never completes (final status=UNDEFINED).
> I checked kylin.log but did not find any error.
>
> ---
> <
> http://apache-kylin.74782.x6.nabble.com/file/t799/Screenshot_from_2018-11-18_07-48-55.png>
>
> <
> http://apache-kylin.74782.x6.nabble.com/file/t799/Screenshot_from_2018-11-18_07-49-20.png>
>
> <
> http://apache-kylin.74782.x6.nabble.com/file/t799/Screenshot_from_2018-11-18_07-51-32.png>
>
> ..
>  thank you.
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


-- 


Regards!

Aron Tao


Re: why does not the job complete?

2018-11-16 Thread JiaTao Tao
Hi
"Create Intermediate Flat Hive Table" submits a job to YARN, and you
can check that job to see where it is stuck.

ebrahim zare wrote on Fri, Nov 16, 2018 at 6:42 PM:

> Hi.
> I installed Apache Kylin and built a job, but it has not completed after
> 100 minutes and is waiting at the first step (Create Intermediate Flat Hive
> Table) (status= running).
> Why does the job not complete?
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
>


-- 


Regards!

Aron Tao


Re: Kylin Cluster Mode Issue Overwriting conflict /user/ADMIN

2018-11-10 Thread JiaTao Tao
You are welcome, enjoy Kylin :).

Shrikant Bang wrote on Sun, Nov 11, 2018 at 2:35 AM:

> Thanks JiaTao for response. I have upgraded Kylin to v2.5.1-hbase1.x and
> issue resolved.
>
> Regards,
> Shrikant Bang.
>
> On Sat, Nov 10, 2018 at 7:22 AM JiaTao Tao  wrote:
>
>> Seems the same with this JIRA:
>> https://issues.apache.org/jira/browse/KYLIN-3562.
>>
>> Shrikant Bang wrote on Sat, Nov 10, 2018 at 2:28 AM:
>>
>> > Hi Team,
>> >
>> > I have a 3-node Kylin (v2.5.0-hbase1.x) cluster (1 all + 2 query). I am
>> > seeing the QUERY REST APIs return HTTP error code 500. I see these
>> > exceptions when I try concurrent requests.
>> >
>> > Has anyone faced this issue? Am I missing any configuration in cluster
>> > mode?
>> >
>> > Here is the exception trace from the Tomcat logs.
>> >
>> > Nov 10, 2018 2:20:08 AM org.apache.catalina.core.StandardWrapperValve invoke
>> > SEVERE: Servlet.service() for servlet [kylin] in context with path [/kylin] threw exception
>> > *org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /user/ADMIN, expect old TS 1541787608396, but it is 1541787608402*
>> > at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:325)
>> > at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:323)
>> > at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:308)
>> > at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:287)
>> > at org.apache.kylin.metadata.cachesync.CachedCrudAssist.save(CachedCrudAssist.java:192)
>> > at org.apache.kylin.rest.security.KylinUserManager.update(KylinUserManager.java:122)
>> > at org.apache.kylin.rest.service.KylinUserService.updateUser(KylinUserService.java:85)
>> > at org.apache.kylin.rest.security.KylinAuthenticationProvider.authenticate(KylinAuthenticationProvider.java:117)
>> > at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
>> > at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:199)
>> > at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:180)
>> > at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
>> > at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
>> > at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:200)
>> > at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
>> > at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116)
>> > at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
>> > at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64)
>> > at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
>> > at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
>> > at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
>> > at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
>> > at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
>> > at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
>> > at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:33

Re: Kylin Cluster Mode Issue Overwriting conflict /user/ADMIN

2018-11-09 Thread JiaTao Tao
This seems to be the same issue as this JIRA:
https://issues.apache.org/jira/browse/KYLIN-3562.
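
For context, the message "expect old TS ..., but it is ..." is what a timestamp-based check-and-put reports when two writers update the same resource at nearly the same time. The quoted trace shows the write coming from the authentication path (KylinAuthenticationProvider.authenticate calling KylinUserService.updateUser), so concurrent requests for the same user can race on /user/ADMIN. Below is a minimal Python sketch of that pattern. The names ResourceStore and check_and_put are hypothetical, and the counter stands in for real timestamps; this is an illustration of the idea, not Kylin's implementation.

```python
import itertools


class WriteConflictError(Exception):
    """Raised when the stored timestamp differs from the one the writer read."""


class ResourceStore:
    """Hypothetical in-memory store; each path maps to (timestamp, value)."""

    def __init__(self):
        self._data = {}
        self._clock = itertools.count(1)  # stand-in for wall-clock timestamps

    def read(self, path):
        # Unknown paths report timestamp 0 and no value.
        return self._data.get(path, (0, None))

    def check_and_put(self, path, value, expected_ts):
        current_ts, _ = self.read(path)
        if current_ts != expected_ts:
            raise WriteConflictError(
                f"Overwriting conflict {path}, expect old TS {expected_ts}, "
                f"but it is {current_ts}"
            )
        new_ts = next(self._clock)
        self._data[path] = (new_ts, value)
        return new_ts


store = ResourceStore()
store.check_and_put("/user/ADMIN", {"logins": 0}, expected_ts=0)

# Two concurrent requests read the same version of the resource.
ts_a, _ = store.read("/user/ADMIN")
ts_b, _ = store.read("/user/ADMIN")

# The first write succeeds and bumps the timestamp.
store.check_and_put("/user/ADMIN", {"logins": 1}, expected_ts=ts_a)

# The second write fails because its expected timestamp is now stale.
try:
    store.check_and_put("/user/ADMIN", {"logins": 1}, expected_ts=ts_b)
    conflict = None
except WriteConflictError as err:
    conflict = str(err)

print(conflict)
```

A caller that hits this kind of conflict would normally re-read the resource and retry, or skip the write if it is not essential.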

Shrikant Bang wrote on Sat, Nov 10, 2018 at 2:28 AM:

> Hi Team,
>
> I have a 3-node Kylin (v2.5.0-hbase1.x) cluster (1 all + 2 query). The
> QUERY REST APIs are returning HTTP error code 500. I see
> this exception when I send concurrent requests.
>
> Has anyone faced this issue? Am I missing any configuration in cluster
> mode?
>
> Here is the exception trace from the Tomcat logs.
>
> Nov 10, 2018 2:20:08 AM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet [kylin] in context with path [/kylin] threw exception
> *org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /user/ADMIN, expect old TS 1541787608396, but it is 1541787608402*
> at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:325)
> at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:323)
> at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:308)
> at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:287)
> at org.apache.kylin.metadata.cachesync.CachedCrudAssist.save(CachedCrudAssist.java:192)
> at org.apache.kylin.rest.security.KylinUserManager.update(KylinUserManager.java:122)
> at org.apache.kylin.rest.service.KylinUserService.updateUser(KylinUserService.java:85)
> at org.apache.kylin.rest.security.KylinAuthenticationProvider.authenticate(KylinAuthenticationProvider.java:117)
> at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
> at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:199)
> at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:180)
> at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:200)
> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116)
> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64)
> at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
> at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
> at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:214)
> at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177)
> at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
> at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:262)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> at com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:209)
> at com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:244)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
> at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:494)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
> at
> 

Re: There was no measure column in the fact table after build cube

2018-11-08 Thread JiaTao Tao
Hi Scott Fan,

1. Kylin only stores aggregated values in cubes; try querying
sum(PRICE) and check the result.
2. This is expected: a "COUNT aggregation" means count(*), which does not
need a column.

This link may be helpful:
http://kylin.apache.org/docs/tutorial/create_cube.html
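
To make the two points concrete, here is a small Python sketch of what "only aggregated values are stored" means. The fact rows and the SELLER dimension are made up for illustration, and the dictionary is only a stand-in for a cuboid, not Kylin's storage layout.

```python
from collections import defaultdict

# Hypothetical fact rows: (SELLER, PRICE).
fact_rows = [("a", 10.0), ("a", 5.0), ("b", 7.5)]

# One cell per dimension value, holding only the aggregated measures.
cuboid = defaultdict(lambda: {"sum_price": 0.0, "count": 0})
for seller, price in fact_rows:
    cell = cuboid[seller]
    cell["sum_price"] += price  # SUM(PRICE) is kept; raw PRICE values are not
    cell["count"] += 1          # COUNT(*) counts rows and reads no column

for seller in sorted(cuboid):
    print(seller, cuboid[seller])
```

The raw PRICE of each row cannot be recovered from the cells, which is why an aggregate query such as sum(PRICE) returns values while a raw select of PRICE does not.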

Scott Fan wrote on Fri, Nov 9, 2018 at 5:48 AM:

> Hello team,
>
> Two issues are troubling me.
>
> 1. When I build a cube with a measure column using SUM aggregation, e.g.
> PRICE,
> the PRICE column is null in the fact table after the build is done.
>
> 2. When I build a cube with a COUNT aggregation, there is no related
> column in the fact table after the build is done.
>
>
> Is there a problem with how I built the cube, or is there a key point of
> Kylin's concepts that I missed?
>
> Thank you very much.
>
>

-- 


Regards!

Aron Tao


Re: doubt about measure of processedRowCount

2018-11-06 Thread JiaTao Tao
Very glad my reply was helpful. I have opened a JIRA to add logs to
"*GTStreamAggregateScanner*", so next time this should be much easier to
track down :).
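
For readers following the thread, here is a rough Python sketch of why the processed row count can be smaller than the number of rows the storage returns once stream aggregation is enabled (kylin.query.stream-aggregate-enabled). It merges already-sorted partition results and combines rows that share a key. The partition data is invented, and this only illustrates the idea of a sort-merge aggregate; it is not the GTStreamAggregateScanner code.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical partition results, each already sorted by key: (key, measure).
partitions = [
    [("2018-11-01", 3), ("2018-11-02", 4)],
    [("2018-11-01", 5), ("2018-11-03", 1)],
    [("2018-11-02", 2), ("2018-11-03", 6)],
]

# Rows returned by storage (valueB in the thread).
returned_rows = sum(len(partition) for partition in partitions)

# Merge the sorted partitions, then aggregate consecutive rows with equal keys.
merged = heapq.merge(*partitions, key=itemgetter(0))
aggregated = [
    (key, sum(value for _, value in group))
    for key, group in groupby(merged, key=itemgetter(0))
]

# Rows seen after the merge step (valueA in the thread).
processed_rows = len(aggregated)

print(returned_rows, processed_rows, aggregated)
```

With the aggregation step disabled, every returned row would be counted individually, which matches the observation that the two numbers agree when the setting is false.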

cheney <531014...@qq.com> wrote on Tue, Nov 6, 2018 at 11:57 PM:

> Hi, JiaTao, thank you very much! The statistics are right when I configure
> "kylin.query.stream-aggregate-enabled=false".
> You are right. Records are pre-aggregated by GTStreamAggregateScanner.
>
>
> ------ Original Message ------
> *From:* "JiaTao Tao";
> *Sent:* Tue, Nov 6, 2018, 10:50 PM
> *To:* "user";
> *Subject:* Re: doubt about measure of processedRowCount
>
> One possible place I can find in the code is the use of
> *GTStreamAggregateScanner* (in "*SegmentCubeTupleIterator.java#111*").
> You can see that it does aggregate in
> "*GTStreamAggregateScanner.AbstractStreamMergeIterator#next*", so it will
> reduce the inputs. But there's no log printing in this class, as you can
> see, so it's pretty hard to confirm. Try
> "kylin.query.stream-aggregate-enabled=false" and run the scenario again to
> see any differences.
>
> cheney <531014...@qq.com> wrote on Mon, Nov 5, 2018 at 6:55 PM:
>
>> Yes, the log is as follows.
>>
>> 2018-11-02 22:25:34,980 DEBUG [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914]
>> gtrecord.StorageResponseGTScatter:88 : Using
>> SortMergedPartitionResultIterator to merge 103 partition results
>> 2018-11-02 22:25:34,982 INFO  [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914]
>> gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to
>> merge segment results*
>> 2018-11-02 22:25:34,982 DEBUG [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122
>> : return TupleIterator...
>> 2018-11-02 22:25:34,991 INFO  [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : 
>> *Processed
>> rows for each storageContext*: 366
>> 2018-11-02 22:25:34,991 INFO  [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 :
>> Stats of SQL response: isException: false, duration: 20, *total scan
>> count 1552*
>>
>> According to the log, *valueA* = 366. *valueB* = (total scan count) 1552 -
>> (total aggregated/filtered in HBase) 270 = 1282.
>> *valueB* is much larger than *valueA*.
>>
>>
>>
>> ------ Original Message ------
>> *From:* "JiaTao Tao";
>> *Sent:* Mon, Nov 5, 2018, 2:41 PM
>> *To:* "user";
>> *Subject:* Re: doubt about measure of processedRowCount
>>
>> Can you grep logs like "to merge segment results" in that scenario?
>>
>> cheney <531014...@qq.com> wrote on Sat, Nov 3, 2018 at 4:15 PM:
>>
>>> Thank you for replying, but I am sure there's only one OlapContext in the
>>> query in my scenario.
>>> ---Original---
>>> *From:* "JiaTao Tao"
>>> *Date:* Sat, Nov 3, 2018 10:42 AM
>>> *To:* "user";
>>> *Subject:* Re: doubt about measure of processedRowCount
>>>
>>> Maybe summing all the *valueA* values would be more appropriate, because
>>> there may be more than one OlapContext in the query (one OlapContext
>>> corresponds to one storageContext).
>>>
>>> There are two good blogs about Kylin's query engine that you may want to
>>> look at :).
>>>
>>> https://blog.csdn.net/yu616568/article/details/50838504
>>>
>>> https://zhuanlan.zhihu.com/p/30613434
>>>
>>> cheney <531014...@qq.com> wrote on Fri, Nov 2, 2018 at 11:10 PM:
>>>
>>>> Hi, guys
>>>>
>>>> When I execute a SQL query in Kylin, the Kylin server logs some
>>>> query statistics. For example:
>>>>
>>>> "Processed rows for each storageContext: *valueA*". *valueA* is
>>>> processedRowCount.
>>>>
>>>> My understanding is that processedRowCount is the number of rows
>>>> returned by HBase.
>>>>
>>>> The HBase coprocessor logs region stats, including "*Total
>>>> scanned row*" and "Total filtered/aggred row".
>>>>
>>>> For one region, final records returned by HBase = *Total scanned
>>>> row* - Total filtered/aggred row.
>>>> Suppose this query needs to scan 10 regions in HBase. We can get
>>>> the stats of every region, so we can get the total number of records
>>>> *valueB* returned by HBase by summing the final records of the 10 regions.
>>>>
>>>> In general, *valueA* is equal to *valueB*, but sometimes *valueB* is much
>>>> larger than *valueA*. Why?
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> Regards!
>>>
>>> Aron Tao
>>>
>>
>>
>> --
>>
>>
>> Regards!
>>
>> Aron Tao
>>
>
>
> --
>
>
> Regards!
>
> Aron Tao
>


-- 


Regards!

Aron Tao


Re: WELCOME to dev@kylin.apache.org

2018-11-06 Thread JiaTao Tao
My impression is that Kylin may return a wrong answer for a raw
query (if you do not have a raw measure) rather than throw an exception.

From your picture, this looks like a POC. Have you tried Kylin's tutorial
on the sample cube (http://kylin.apache.org/docs/tutorial/kylin_sample.html)?
It may be a good starting point.

Support DrakosData wrote on Tue, Nov 6, 2018 at 4:22 PM:

> The answer is in the Kylin FAQ: "Why I got an error when running a “select *
> “ query?"
>
> [http://kylin.apache.org/docs/gettingstarted/faq.html ]
>
>
> On 6/11/18 2:30, 1311217...@qq.com wrote:
>
> My Kylin instance in the dev environment throws a null exception when running
> a select. The development tool is IDEA and the system is Windows 7.
>
>
> --
> 1311217...@qq.com
>
>
> *From:* dev-help 
> *Date:* 2018-11-06 09:15
> *To:* 1311217283 <1311217...@qq.com>
> *Subject:* WELCOME to dev@kylin.apache.org
> Hi! This is the ezmlm program. I'm managing the
> dev@kylin.apache.org mailing list.
>
> I'm working for my owner, who can be reached
> at dev-ow...@kylin.apache.org.
>
> Acknowledgment: I have added the address
>
>1311217...@qq.com
>
> to the dev mailing list.
>
> Welcome to dev@kylin.apache.org!
>
> Please save this message so that you know the address you are
> subscribed under, in case you later want to unsubscribe or change your
> subscription address.
>
>

[jira] [Created] (KYLIN-3669) Add logs to GTStreamAggregateScanner

2018-11-06 Thread Jiatao Tao (JIRA)
Jiatao Tao created KYLIN-3669:
-

 Summary: Add logs to GTStreamAggregateScanner
 Key: KYLIN-3669
 URL: https://issues.apache.org/jira/browse/KYLIN-3669
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Reporter: Jiatao Tao
Assignee: Jiatao Tao
 Attachments: image-2018-11-06-23-00-57-775.png

GTStreamAggregateScanner prints no logs, so you cannot tell that a query
took that code path, which is pretty confusing.
 !image-2018-11-06-23-00-57-775.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Release apache-kylin-2.5.1 (RC1)

2018-11-02 Thread JiaTao Tao

Here is my vote:

+1 (binding)

ShaoFeng Shi wrote on Fri, Nov 2, 2018 at 2:10 PM:

> Hi all,
>
> I have created a build for Apache Kylin 2.5.1, release candidate 1.
>
> Changes highlights:
>
> [KYLIN-3531] - Login failed with case-insensitive username
> [KYLIN-3604] - Can't build cube with spark in HBase standalone mode
> [KYLIN-3613] - Kylin with Standalone HBase Cluster could not find the main
> cluster namespace at "Create HTable" step
> [KYLIN-3634] - When the filter column has null value may cause incorrect
> query result
> [KYLIN-3635] - Percentile calculation on Spark engine is wrong
> [KYLIN-3644] - NumberFormatExcetion on null values when building cube with
> Spark
> [KYLIN-3599] - Bulk Add Measures
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121=12344108
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/24e2452309a450ec4ef62339b003343eabe23016
>
> Its hash is 24e2452309a450ec4ef62339b003343eabe23016.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.1-rc1/
>
> The hash of the artifact is as follows:
> apache-kylin-2.5.1-source-release.zip.sha256
> 21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1056/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 2.5.1.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PPMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.5.1
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>


-- 


Regards!

Aron Tao


Re: Question about handling the tables Kylin generates in HBase

2018-10-31 Thread JiaTao Tao
This FAQ entry may also help:
http://kylin.apache.org/docs/gettingstarted/faq.html

> What kind of data is left in ‘kylin.env.hdfs-working-dir’? We often
> execute the Kylin cleanup storage command, but now our working dir folder is
> about 300 GB in size; can we delete old data manually?
>
> - The data in ‘hdfs-working-dir’ (‘hdfs:///kylin/kylin_metadata/’
>   by default) includes intermediate files (will be GC) and Cuboid data (won’t
>   be GC). The Cuboid data is kept for the further segments’ merge, as Kylin
>   couldn’t merge from HBase. If you’re sure those segments won’t be merged,
>   you can move them to other paths or even delete.
> - Please pay attention to the “resources” sub-folder under
>   ‘hdfs-working-dir’, which persists some big metadata files like
>   dictionaries and lookup tables’ snapshots. They shouldn’t be moved.
>
>
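杨冠军 wrote on Thu, Nov 1, 2018 at 9:10 AM:

>
> Hello:
> After a Kylin build completes, many files are generated in HBase. The
> attachment is a screenshot of the generated files. Which of these files can
> be deleted?
> Thank you!
>
> *杨冠军* Development Engineer
> * champion.y...@kutesmart.com  *
> *Company name: 青岛酷特智能股份有限公司*
> Address: 即墨市红领大街17号
> Phone:
> Mobile: 15563907071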
>


-- 


Regards!

Aron Tao


Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-26 Thread JiaTao Tao
You are welcome, ShaoFeng! The storage and the query engine are inseparable and
should be designed together so that each can make full use of the other. I'm
very excited about the upcoming columnar storage and query engine!


-- 


Regards!

Aron Tao


ShaoFeng Shi wrote on Fri, Oct 26, 2018 at 10:28 PM:

> Exactly; thank you, JiaTao, for the comments!
>
> JiaTao Tao wrote on Thu, Oct 25, 2018 at 6:12 PM:
>
> > As far as I'm concerned, using Parquet as Kylin's storage format is
> pretty
> > appropriate. From the aspect of integrating Spark, Spark made a lot of
> > optimizations for Parquet, e.g. We can enjoy Spark's vectorized reading
> and
> > lazy dict decoding, etc.
> >
> >
> > And here are my thoughts about integrating Spark and our query engine. As
> > Shaofeng mentioned, a cuboid is a Parquet file, and you can think of this
> > as a small table and we can read this cuboid as a DataFrame directly,
> which
> > can be directly queried by Spark, a bit like this:
> >
> >
> ss.read.parquet("path/to/CuboidFile").filter("xxx").agg("xxx").select("xxx").
> > (We need to implement some Kylin's advanced aggregations, as for some
> > Kylin's basic aggregations like sum/min/max, we can use Spark's directly)
> >
> >
> >
> > *Compare to our old query engine, the advantages are as follows:*
> >
> >
> >
> > 1. It is distributed! Our old query engine will get all data into a query
> > node and then calculate, it's a single point of failure and often leads
> OOM
> > when in a huge amount of data.
> >
> >
> >
> > 2. It is simple and easy to debug(every step is very clear and
> > transparent), you can collect data after every single phase,
> > e.g.(filter/aggregation/projection, etc.), so you can easily check out
> > which operation/phase went wrong. Our old query engine uses Calcite for
> > post-calculation, it's difficult when pinpointing problems, especially
> when
> > relating to code generation, and you cannot insert your own logic during
> > computation.
> >
> >
> >
> > 3. We can fully enjoy all efforts that Spark made for optimizing
> > performance, e.g. Catalyst/Tungsten, etc.
> >
> >
> >
> > 4. It is easy for unit tests, you can test every step separately, which
> > could reduce the testing granularity of Kylin's query engine.
> >
> >
> >
> > 5. Thanks to Spark's DataSource API, we can change Parquet to other data
> > formats easily.
> >
> >
> >
> > 6. A lot of upstream tools for Spark like many machine learning tools can
> > directly be integrated with us.
> >
> >
> >
> > ==
> >
> >
> ==
> >
> >  Hi Kylin developers.
> >
> >
> >
> > HBase has been Kylin’s storage engine since the first day; Kylin on
> > HBase
> >
> > has been verified as a success which can support low latency & high
> >
> > concurrency queries on a very large data scale. Thanks to HBase, most
> > Kylin
> >
> > users can get on average less than 1-second query response.
> >
> >
> >
> > But we also see some limitations when putting Cubes into HBase; I
> > shared
> >
> > some of them in the HBaseConf Asia 2018[1] this August. The typical
> >
> > limitations include:
> >
> >
> >
> >- Rowkey is the primary index, no secondary index so far;
> >
> >
> >
> > Filtering by row key’s prefix and suffix can get very different
> > performance
> >
> > result. So the user needs to do a good design about the row key;
> > otherwise,
> >
> > the query would be slow. This is difficult sometimes because the user
> > might
> >
> > not predict the filtering patterns ahead of cube design.
> >
> >
> >
> >- HBase is a key-value instead of a columnar storage
> >
> >
> >
> > Kylin combines multiple measures (columns) into fewer column families
> > for
> >
> > smaller data size (row key size is remarkable). This causes HBase
> often
> >
> > needing to read more data than requested.
> >
> >
> >
> >- HBase couldn't run on YARN
> >
> >
> >
> > This makes the deployment and auto-scaling a little complicated,
> > especially
> >
> > in the cloud.
> >
> >
> >
> >   

Re: Unable to connect to Kylin Web UI

2018-10-25 Thread JiaTao Tao
If you suspect this problem is related to the port, you could first change Kylin's
default port (7070) to any other available port temporarily, just
to confirm your suspicion.


The way to modify this is changing "[...]

[...] wrote on Fri, Oct 26, 2018 at 9:41 AM:

> Here is log.
>
> kylin.start 
>
> kylin.log 
> kylin.out 
>
> Kylin seems to start and there is no error in the log.
> A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
> Check the log at /root/apache-kylin-2.5.0-bin-hbase1x/logs/kylin.log
> Web UI is at http://:7070/kylin
> [root@sandbox-hdp bin]# netstat -tunlp |grep 7070
> tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      5160/java
>
> The cause may be port forwarding in the HDP sandbox (2.6.5). Do you know how to
> enable port 7070 in the HDP sandbox (2.6.5)?
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/


Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-25 Thread JiaTao Tao
As far as I'm concerned, using Parquet as Kylin's storage format is pretty
appropriate. From the perspective of integrating Spark, Spark has made a lot of
optimizations for Parquet, e.g. we can enjoy Spark's vectorized reading and
lazy dict decoding.


And here are my thoughts about integrating Spark and our query engine. As
Shaofeng mentioned, a cuboid is a Parquet file; you can think of it
as a small table that we can read as a DataFrame directly and
query with Spark, a bit like this:
ss.read.parquet("path/to/CuboidFile").filter("xxx").agg("xxx").select("xxx").
(We need to implement some of Kylin's advanced aggregations; for
Kylin's basic aggregations like sum/min/max, we can use Spark's directly.)
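
To illustrate the shape of that filter/agg/select chain without requiring a Spark installation, here is a plain-Python sketch in which each phase is a separate function whose output can be inspected on its own. The rows, column names and phase functions are all hypothetical; plain lists of dicts stand in for a DataFrame, so this shows the structure of the pipeline rather than Spark's API.

```python
from collections import defaultdict

# Hypothetical cuboid rows: two dimensions plus one pre-aggregated measure.
cuboid_rows = [
    {"country": "CN", "year": 2017, "sum_price": 40.0},
    {"country": "CN", "year": 2018, "sum_price": 120.0},
    {"country": "US", "year": 2018, "sum_price": 80.0},
    {"country": "US", "year": 2018, "sum_price": 20.0},
]


def filter_phase(rows, year):
    """Keep only the rows for the requested year."""
    return [row for row in rows if row["year"] == year]


def aggregate_phase(rows):
    """Re-aggregate the stored sums per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["sum_price"]
    return [{"country": c, "sum_price": s} for c, s in sorted(totals.items())]


def project_phase(rows):
    """Select the output columns as tuples."""
    return [(row["country"], row["sum_price"]) for row in rows]


# Each intermediate result can be collected and checked separately.
filtered = filter_phase(cuboid_rows, 2018)
aggregated = aggregate_phase(filtered)
result = project_phase(aggregated)

print(len(filtered), result)
```

Because every phase returns an ordinary value, each one can be unit-tested or inspected in isolation when a query returns an unexpected result.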



*Compared to our old query engine, the advantages are as follows:*

1. It is distributed! Our old query engine pulls all data into a single query
node and then calculates; it's a single point of failure and often leads to OOM
on a huge amount of data.

2. It is simple and easy to debug (every step is clear and
transparent). You can collect data after every single phase
(filter, aggregation, projection, etc.), so you can easily check
which operation/phase went wrong. Our old query engine uses Calcite for
post-calculation; pinpointing problems is difficult, especially when
they relate to code generation, and you cannot insert your own logic during
computation.

3. We can fully enjoy all the efforts Spark has made to optimize
performance, e.g. Catalyst and Tungsten.

4. It is easy to unit test: you can test every step separately, which
makes the testing of Kylin's query engine more fine-grained.

5. Thanks to Spark's DataSource API, we can change Parquet to other data
formats easily.

6. Many upstream tools for Spark, such as machine learning tools, can
be integrated with us directly.



==
==

 Hi Kylin developers.

HBase has been Kylin’s storage engine since the first day; Kylin on HBase
has been verified as a success which can support low latency & high
concurrency queries on a very large data scale. Thanks to HBase, most Kylin
users can get on average less than 1-second query response.

But we also see some limitations when putting Cubes into HBase; I shared
some of them in the HBaseConf Asia 2018[1] this August. The typical
limitations include:

   - Rowkey is the primary index, no secondary index so far;

Filtering by row key’s prefix and suffix can get very different performance
result. So the user needs to do a good design about the row key; otherwise,
the query would be slow. This is difficult sometimes because the user might
not predict the filtering patterns ahead of cube design.

   - HBase is a key-value instead of a columnar storage

Kylin combines multiple measures (columns) into fewer column families for
smaller data size (row key size is remarkable). This causes HBase often
needing to read more data than requested.

   - HBase couldn't run on YARN

This makes the deployment and auto-scaling a little complicated, especially
in the cloud.

In one word, HBase is complicated to be Kylin’s storage. The maintenance,
debugging is also hard for normal developers. Now we’re planning to seek a
simple, light-weighted, read-only storage engine for Kylin. The new
solution should have the following characteristics:

   - Columnar layout with compression for efficient I/O;
   - Index by each column for quick filtering and seeking;
   - MapReduce / Spark API for parallel processing;
   - HDFS compliant for scalability and availability;
   - Mature, stable and extensible;

With the plugin architecture[2] introduced in Kylin 1.5, adding multiple
storages to Kylin is possible. Some companies like Kyligence Inc and
Meituan.com, have developed their customized storage engine for Kylin in
their product or platform. In their experience, columnar storage is a good
supplement for the HBase engine. Kaisen Kang from Meituan.com has shared
their KOD (Kylin on Druid) solution[3] in this August’s Kylin meetup in
Beijing.

We plan to do a PoC with Apache Parquet + Apache Spark in the next phase.
Parquet is a standard columnar file format and has been widely supported by
many projects like Hive, Impala, Drill, etc. Parquet is adding the page
level column index to support fine-grained filtering.  Apache Spark can
provide the parallel computing over Parquet and can be deployed on
YARN/Mesos and Kubernetes. With this combination, the data persistence and
computation are separated, which makes the scaling in/out much easier than
before. Benefiting from Spark's flexibility, we can not only push down more

Re: LDAP Sync issue - Empty filter; nested exception is javax.naming.directory.InvalidSearchFilterException

2018-10-13 Thread Jiatao Tao
Hi, 
Can you first try the "ldapsearch" command to get the users/groups you want?
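
A minimal sketch of such a check, assuming the OpenLDAP "ldapsearch"
client is installed. The helper name build_ldapsearch_command and all
host, DN and user values are placeholders, not taken from the reporter's
setup. It substitutes a username into a Kylin-style search pattern (with
{0} as the placeholder) and assembles the matching command line, so an
empty filter is caught before it reaches the server.

```python
import shlex


def build_ldapsearch_command(server_url, bind_dn, search_base,
                             search_pattern, username):
    """Return an ldapsearch argument list for checking one user lookup.

    `search_pattern` uses `{0}` as the username placeholder, in the style
    of kylin.security.ldap.user-search-pattern. The username is NOT
    escaped for LDAP filter syntax here; this is only a sketch.
    """
    ldap_filter = search_pattern.replace("{0}", username)
    if not ldap_filter.strip():
        raise ValueError("LDAP search filter is empty")
    return [
        "ldapsearch",
        "-x",               # simple authentication
        "-H", server_url,   # LDAP URI
        "-D", bind_dn,      # bind DN
        "-W",               # prompt for the bind password
        "-b", search_base,  # search base
        ldap_filter,
    ]


if __name__ == "__main__":
    cmd = build_ldapsearch_command(
        server_url="ldap://ldap.example.com:389",   # placeholder host
        bind_dn="cn=kylin-svc,dc=example,dc=com",   # placeholder bind DN
        search_base="ou=People,dc=example,dc=com",  # placeholder base
        search_pattern="(&(cn={0}))",
        username="jdoe",                            # placeholder user
    )
    print(shlex.join(cmd))
```

If the printed command returns the expected entry when run by hand, the
search base and pattern are probably fine and the problem is more likely
in another filter setting.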

---
Regards!
Aron Tao
 
 

On [DATE], "[NAME]" <[ADDRESS]> wrote:

Hi - Please help us solve the issue below when syncing with LDAP.

Below is the Kylin LDAP configuration.

kylin.security.profile=ldap
kylin.security.acl.admin-role=ROLE_KYLIN-ADMIN-GROUP

kylin.security.ldap.connection-server=ldap://:389
kylin.security.ldap.connection-username=*
kylin.security.ldap.connection-password=*

kylin.security.ldap.user-search-base=OU=,OU=Applications,DC=,DC=com

kylin.security.ldap.user-search-pattern=(&(cn={0}))
-- tried many other options, but still hit the same issue as below:
-- (CN=*, OU=Applications,OU=Groups,DC=bcbsfl,DC=com) (uid=*)

#kylin.security.ldap.user-search-filter=CN=*,OU=Hadoop,DC=,DC=com


kylin.security.ldap.user-group-search-base=OU=Requested,OU=Groups,DC=*,DC=com



org.springframework.security.authentication.InternalAuthenticationServiceException: Empty filter; nested exception is javax.naming.directory.InvalidSearchFilterException: Empty filter; remaining name '/'
    at org.springframework.security.ldap.authentication.LdapAuthenticationProvider.doAuthentication(LdapAuthenticationProvider.java:206)
    at org.springframework.security.ldap.authentication.AbstractLdapAuthenticationProvider.authenticate(AbstractLdapAuthenticationProvider.java:85)
    at org.apache.kylin.rest.security.KylinAuthenticationProvider.authenticate(KylinAuthenticationProvider.java:94)
    at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
    at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:199)
    at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:180)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
    at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:200)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
    at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
    at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
    at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)


--
Sent from: http://apache-kylin.74782.x6.nabble.com/






Re: Apache Kylin Chinese documents updated / Kylin 中文文档已更新

2018-07-08 Thread Jiatao Tao




---

Regards!

Aron Tao







On 2018/7/2 23:20, "ShaoFeng Shi" wrote:



Hi Kylin users,



The Chinese version of the documentation has been updated for Kylin v2.4
and v2.3. More will be translated in the future.



Latest version:

https://kylin.apache.org/cn/docs/



v2.3:

https://kylin.apache.org/cn/docs23/



Chrome is the recommended browser; you may need to clear the browser
cache to get the new content.



We also welcome public contributions to the documentation. For how to
write and contribute, please check:



https://kylin.apache.org/development/howto_docs.html



--

Best regards,



Shaofeng Shi 史少锋