Re:[VOTE] Release apache-kylin-4.0.4 (RC1)
+1 (binding)

At 2024-01-21 00:12:42, "Li Yang" wrote:
>Hi all,
>
>I have created a build for Apache Kylin 4.0.4, release candidate 1. This is
>a very small release aiming mainly to upgrade the versions of dependent
>components.
>
>Change highlights:
>
> - Bump commons-fileupload from 1.3.3 to 1.5
> - Bump tomcat-catalina from 8.5.78 to 8.5.86
> - Bump spring-core from 5.2.22.RELEASE to 5.2.23.RELEASE
> - Bump scala minor version from 2.12.10 to 2.12.13
> - And a few other bug fixes
>
>Thanks to everyone who has contributed to this release.
>
>Apart from the above changes, there are no new features or improvements in
>this proposed release.
>
>The commit being voted upon:
>https://github.com/apache/kylin/commit/37f63b8c22a557bb7f17df370aae9cf2ae640a18
>
>The artifacts to be voted on, including the source package and the binary
>packages, are located here:
>https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-4.0.4-rc1/
>
>The hashes of the artifacts are as follows:
>- apache-kylin-4.0.4-source-release.zip.sha256:
>21b338aae14a71357650b35f473381cf325a9781adf1f6c9954cae9b4027cfe2
>- apache-kylin-4.0.4-bin-spark3.tar.gz.sha256:
>77abb6a1174dd7dd63c747c95cbb1da9838c48c0c9dc4c7c35e36933ebb2636e
>
>A staged Maven repository is available for review at:
>https://repository.apache.org/content/repositories/orgapachekylin-1113
>
>Release artifacts are signed with my key:
>- Fingerprint: CF48 8F24 2BBC 3A88 5DB7 C6DF 685F 5B5D D254 DE89
>- Public source 1: https://people.apache.org/keys/committer/liyang.asc
>- Public source 2:
>https://keys.openpgp.org/vks/v1/by-fingerprint/CF488F242BBC3A885DB7C6DF685F5B5DD254DE89
>
>Please vote on releasing this package as Apache Kylin 4.0.4.
>
>The vote is open for the next 72 hours and passes if a majority of at least
>three +1 PMC votes are cast.
>
>[ ] +1 Release this package as Apache Kylin 4.0.4
>[ ] 0 I don't feel strongly about it, but I'm okay with the release
>[ ] -1 Do not release this package because...
>
>Here is my vote:
>+1 (binding)
>
>Best regards,
>Li Yang
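For anyone checking a release candidate before voting, the published .sha256 values can be compared against a locally computed digest. The following is a minimal sketch, not part of the release process described in the thread; the function names sha256_of and matches_published_hash are hypothetical, and it assumes the artifact has already been downloaded.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published):
    """Compare the local digest with a published one.

    Assumption: `published` is either a bare hex digest or a line whose first
    whitespace-separated token is the digest (as in many .sha256 files).
    """
    tokens = published.split()
    if not tokens:
        return False
    return sha256_of(path) == tokens[0].lower()
```

For example, matches_published_hash("apache-kylin-4.0.4-source-release.zip", "21b338aae14a71357650b35f473381cf325a9781adf1f6c9954cae9b4027cfe2") would return True only if the downloaded file matches the hash quoted in the vote mail. This checks integrity only; verifying the GPG signature against the listed key is a separate step.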
Re:[VOTE] Release apache-kylin-4.0.0-beta (RC2)
+1 from my side.

At 2021-02-03 21:31:34, "Xiaoxiang Yu" wrote:
>Hi all,
>
>I have created a build for Apache Kylin 4.0.0-beta, release candidate 2.
>Please note that this release is built on kylin-on-parquet-v2 branch.
>
>Change highlights:
>
>[KYLIN-4857] - Refactor system cube for kylin4
>[KYLIN-4842] - Supports grouping sets function for Kylin 4
>[KYLIN-4829] - Support to use thread-level SparkSession to execute query
>[KYLIN-4813] - Refine spark logger for Kylin 4 build engine
>[KYLIN-4858] - Support Kylin4 deployment on CDH 6.X
>[KYLIN-4818] - Calculate cuboid statistics in Kylin 4
>[KYLIN-4817] - Refine Cube Migration Tool for Kylin4
>
>Thanks to everyone who has contributed to this release.
>Here are the release notes:
>https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12348723
>
>The commit being voted upon:
>https://github.com/apache/kylin/commit/546381392c55136310f591969e5a3f3db6074988
>Its hash is 546381392c55136310f591969e5a3f3db6074988.
>
>The artifacts to be voted on, including the source package and one
>pre-compiled binary package, are located here:
>https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-4.0.0-beta-rc2/
>
>The hashes of the artifacts are as follows:
>apache-kylin-4.0.0-beta-source-release.zip.sha256
>d5837ed106c4e10f71a6b7a6324520afdd536c10f5afd86bf6c98d734a3e64a8
>
>apache-kylin-4.0.0-beta-bin.tar.gz.sha256
>e86ab855533104c2eac8fae53dc067d425ab263cebf558c666260a3ceb336b16
>
>A staged Maven repository is available for review at:
>https://repository.apache.org/content/repositories/orgapachekylin-1087/
>
>Release artifacts are signed with the following key:
>https://people.apache.org/keys/committer/xxyu.asc
>
>Please vote on releasing this package as Apache Kylin 4.0.0-beta.
>
>The vote is open for the next 72 hours and passes if a majority of
>at least three +1 binding votes are cast.
>
>[ ] +1 Release this package as Apache Kylin 4.0.0-beta
>[ ] 0 I don't feel strongly about it, but I'm okay with the release
>[ ] -1 Do not release this package because...
>
>Here is my vote:
>+1 (binding)
>--
>Best wishes to you !
>From :Xiaoxiang Yu
[jira] [Created] (KYLIN-4838) fix KYLIN-4679 bug
chuxiao created KYLIN-4838: -- Summary: fix KYLIN-4679 bug Key: KYLIN-4838 URL: https://issues.apache.org/jira/browse/KYLIN-4838 Project: Kylin Issue Type: Improvement Reporter: chuxiao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4837) optimize CubeMigrationCLI
chuxiao created KYLIN-4837: -- Summary: optimize CubeMigrationCLI Key: KYLIN-4837 URL: https://issues.apache.org/jira/browse/KYLIN-4837 Project: Kylin Issue Type: Improvement Reporter: chuxiao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4836) fix CubeMigrationCLI bug
chuxiao created KYLIN-4836: -- Summary: fix CubeMigrationCLI bug Key: KYLIN-4836 URL: https://issues.apache.org/jira/browse/KYLIN-4836 Project: Kylin Issue Type: Improvement Reporter: chuxiao -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re:new committer: Rupeng Wang
Congratulations!

At 2020-10-14 22:25:10, "ShaoFeng Shi" wrote:
>The Project Management Committee (PMC) for Apache Kylin has invited Rupeng
>Wang (王汝鹏, wangrup...@apache.org) to become a committer and we are pleased
>to announce that he has accepted.
>
>Being a committer enables easier contribution to the project since there is
>no need to go via the patch submission process. This should enable better
>productivity.
>
>Congratulations, Rupeng!
>
>Best regards,
>
>Shaofeng Shi 史少锋
>Apache Kylin PMC
>Email: shaofeng...@apache.org
>
>Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>Join Kylin user mail group: user-subscr...@kylin.apache.org
>Join Kylin dev mail group: dev-subscr...@kylin.apache.org
Re:[VOTE] Release apache-kylin-3.1.1 (RC1)
+1. Good job!

At 2020-10-14 15:12:15, "Xiaoxiang Yu" wrote:
>Hi all,
>
>I have created a build for Apache Kylin 3.1.1, release candidate 1.
>
>Change highlights:
>
>[KYLIN-4612] - Support job status write to kafka
>[KYLIN-4712] - Optimize CubeMetaIngester.java CLI
>[KYLIN-4657] - dead-loop in org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork
>[KYLIN-4688] - Too many tmp files in HDFS tmp directory
>[KYLIN-4619] - Make shrunken dict able to coexist with mr-hive global dict
>
>Thanks to everyone who has contributed to this release.
>
>Here are the release notes:
>https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12348354
>
>The commit being voted upon:
>https://github.com/apache/kylin/commit/d8f5b1b40da42401df90f6205e5f650be05c81c4
>Its hash is d8f5b1b40da42401df90f6205e5f650be05c81c4.
>
>The artifacts to be voted on, including the source package and four
>pre-compiled binary packages, are located here:
>https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-3.1.1-rc1/
>
>The hashes of the artifacts are as follows:
>
>apache-kylin-3.1.1-source-release.zip.sha256
>1f4e28dd53e2ef72faf40c3313f6a53d61205000250a57658d45800ad243594a
>
>apache-kylin-3.1.1-bin-hbase1x.tar.gz.sha256
>23dcc21c3aa3d496afe39749a2e6832e3aeb4cabc83819a283a1468d70248302
>
>apache-kylin-3.1.1-bin-cdh57.tar.gz.sha256
>a0d50fb19f11918a9849ab93bd7a6033ae0e8a7fa5ffcfd7c4e8b5889e4b4829
>
>apache-kylin-3.1.1-bin-cdh60.tar.gz.sha256
>856cb8e3fbb1a3593121e3ba9c9f5b528ff96d156fd0648fa3ee71804d946283
>
>apache-kylin-3.1.1-bin-hadoop3.tar.gz.sha256
>4a0090acaa627e3c2611a1827ab49b822c33a43fc316b26e9efb0a0117031ddf
>
>A staged Maven repository is available for review at:
>https://repository.apache.org/content/repositories/orgapachekylin-1083/
>
>Release artifacts are signed with the following key:
>https://people.apache.org/keys/committer/xxyu.asc
>
>Please vote on releasing this package as Apache Kylin 3.1.1.
>
>The vote is open for the next 72 hours and passes if a majority of
>at least three +1 binding votes are cast.
>
>[ ] +1 Release this package as Apache Kylin 3.1.1
>[ ] 0 I don't feel strongly about it, but I'm okay with the release
>[ ] -1 Do not release this package because...
>
>Here is my vote:
>+1 (binding)
>
>--
>Best wishes to you !
>From :Xiaoxiang Yu
[jira] [Created] (KYLIN-4728) hive global dict optimize
chuxiao created KYLIN-4728: -- Summary: hive global dict optimize Key: KYLIN-4728 URL: https://issues.apache.org/jira/browse/KYLIN-4728 Project: Kylin Issue Type: Improvement Reporter: chuxiao Attachments: 111.png, .png -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4693) avoid NPE when HDFSPathGarbageCollectionStep run
chuxiao created KYLIN-4693:
--
Summary: avoid NPE when HDFSPathGarbageCollectionStep run
Key: KYLIN-4693
URL: https://issues.apache.org/jira/browse/KYLIN-4693
Project: Kylin
Issue Type: Bug
Reporter: chuxiao

{code:java}
java.lang.NullPointerException
    at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.doWork(HDFSPathGarbageCollectionStep.java:97)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4687) add unify clean sh to excute some clean shells
chuxiao created KYLIN-4687: -- Summary: add unify clean sh to excute some clean shells Key: KYLIN-4687 URL: https://issues.apache.org/jira/browse/KYLIN-4687 Project: Kylin Issue Type: Improvement Reporter: chuxiao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4686) clean metadata support to delete all jobs
chuxiao created KYLIN-4686: -- Summary: clean metadata support to delete all jobs Key: KYLIN-4686 URL: https://issues.apache.org/jira/browse/KYLIN-4686 Project: Kylin Issue Type: Improvement Reporter: chuxiao Sometimes test jobs end in error and nobody cares about them. We need a way to delete them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4685) return user friendly msg when stackoverflowerror
chuxiao created KYLIN-4685: -- Summary: return user friendly msg when stackoverflowerror Key: KYLIN-4685 URL: https://issues.apache.org/jira/browse/KYLIN-4685 Project: Kylin Issue Type: Improvement Reporter: chuxiao When a SQL "where in(..)" clause has too many elements, a StackOverflowError is thrown and the server returns a 500 internal error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re:Too many tmp files in HDFS tmp dictionary
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true

Thanks.

At 2020-08-04 11:03:02, "shylinzhang" wrote:

Dear all,

I found many tmp files in the HDFS tmp directory. I checked the Kylin conf files and didn't find any tmp path in them. I found some information at http://kylin.apache.org/cn/development/about_temp_files.html and https://issues.apache.org/jira/browse/KYLIN-926.

Does this mean they need to be deleted by hand? I want to know if there is a way to delete them automatically.

I am looking forward to your reply, thank you.

Kylin version: 2.6.1
Hadoop version: 3.0.0+cdh6.0.0

Best Regards,
Shylin Zhang
[jira] [Created] (KYLIN-4679) clean hive table support hive table prefix
chuxiao created KYLIN-4679: -- Summary: clean hive table support hive table prefix Key: KYLIN-4679 URL: https://issues.apache.org/jira/browse/KYLIN-4679 Project: Kylin Issue Type: Improvement Reporter: chuxiao My database has 10,000+ tables, so I need to set a Hive table prefix instead of going through all tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4678) continue execute job when StorageCleanupJob sub step has error
chuxiao created KYLIN-4678: -- Summary: continue execute job when StorageCleanupJob sub step has error Key: KYLIN-4678 URL: https://issues.apache.org/jira/browse/KYLIN-4678 Project: Kylin Issue Type: Improvement Reporter: chuxiao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4677) StorageCleanupJob throw NPE
chuxiao created KYLIN-4677: -- Summary: StorageCleanupJob throw NPE Key: KYLIN-4677 URL: https://issues.apache.org/jira/browse/KYLIN-4677 Project: Kylin Issue Type: Bug Reporter: chuxiao Attachments: D-Chat_20200803140910.png See the attached picture. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4671) modify FetchRunner catch getOutput NPE log level to debug for ignore too many logs
chuxiao created KYLIN-4671: -- Summary: modify FetchRunner catch getOutput NPE log level to debug for ignore too many logs Key: KYLIN-4671 URL: https://issues.apache.org/jira/browse/KYLIN-4671 Project: Kylin Issue Type: Improvement Reporter: chuxiao Attachments: D-Chat_20200729164446.png There are too many log entries. Ignore them until job info is removed when metadata is cleaned. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re:[DISCUSS] Kylin Parquet storage and 4.0 plan
Will 3.x continue to be released? For example, to support HBase rsgroup.

At 2020-07-24 19:23:11, "ShaoFeng Shi" wrote:

Hello, Kylin users,

Regarding the Kylin Parquet storage, we hope to update the progress here. At present, we have completed the main development work[1], the design document[2], and the benchmark. With the new architecture, Kylin is going to be more efficient and more cloud-friendly: fully on Spark, with less dependency on the Hadoop stack, which makes DevOps easier.

Here we discuss the future plan, which includes two aspects.

1. The plan for Kylin 4.0

In Kylin 3.x, we have released some important functions/features, such as real-time analysis, the Flink building engine, the global dictionary with Hive, etc. In the next phase, we hope to concentrate on the Parquet storage engine and to release it in Kylin v4.0 within this year. In this period, 3.x will continue to be maintained for bug fixes and security vulnerabilities, but won't introduce big changes or major features.

2. Backward compatibility for HBase storage

When we developed the Parquet storage engine, we found it very difficult to make the Parquet and HBase engines co-exist. The codebase becomes very complicated and ugly, inevitably bringing big challenges to maintenance and release. Besides, HBase has different APIs (v1.1 and v2.0; in addition, the CDH versions differ from the community's), which has doubled or tripled the testing and release effort in the past years. So, we plan to remove the HBase storage engine in Kylin 4.0; the Kylin metadata will also migrate to MySQL.

For existing users: if you want to use the HBase engine, stay on Kylin 3.x; if you want to upgrade to the Parquet storage, a migration tool can be provided later (another discussion thread).

You are welcome to tell us your concerns and suggestions! Thank you for your participation.

## Reference
[1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
[2] https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org
Re:Re: Design and implement new metadata cache and boardcast mechanism
But it will introduce new dependencies. Our MySQL already logs a lot of warnings every day because of Hive, so I don't want to use MySQL.

At 2020-07-22 14:26:36, "Zhou Kang" wrote:
>I think it's a better way to redesign a new metadata store based on a relational
>DB.
>
>Other data warehouses use an RDBMS as metadata storage, such as Hive using MySQL.
>
>It is easy to organize data by using an RDBMS.
>
>Maybe it's a big challenge to switch the metadata storage, but I think the benefit
>is great too.
>
>> On 2020-07-22 at 11:08 AM, Rupeng Wang wrote:
>>
>> Agreed with Xiaoxiang. I think it's a good proposal. Maintaining system
>> availability is important. But we need to verify whether the solution you
>> provided is feasible and do more tests.
>>
>> On 2020/7/22 10:43, "Xiaoxiang Yu" wrote:
>>
>>    Hi,
>>    It is a fancy idea from my side. Since it is a major behavior change
>> for the Kylin core system and may have an impact on all components, please provide
>> us with your detailed design documentation (better in English ^_^).
>>    I think you may implement your idea and verify that it works as you
>> expect; if your idea is proved to work in your production env, maintainers will
>> be happy to learn about and review your code change.
>>    Thank you very much and good luck to you.
>>
>>    --
>>    Best wishes to you !
>>    From :Xiaoxiang Yu
>>
>>    At 2020-07-22 05:39:49, "chuxiao" wrote:
>>> 1. Change the read-write lock to bucketed locks and spin locks, allowing momentary dirty reads.
>>> 2. When updating metadata, no longer broadcast to oneself, i.e. to the process that modified the metadata. Caches that would otherwise be refreshed by broadcast are refreshed synchronously during the update operation.
>>> 3. Make cache updates fine-grained: update by the smallest atomic unit, and consider recording a version number or timestamp for each change.
>>> Apply these changes one after another until the design goal is met.
>>>
>>> At 2020-07-22 05:27:24, "chuxiao" wrote:
>>>> Kylin's current cache mechanism is better suited to scenarios with tens to hundreds of cubes that are not updated frequently.
>>>> With more than 2,000 cubes and several hundred updated every day, the read-write lock on the metadata and the broadcast mechanism that triggers a full refresh on any modification mean that, once metadata is updated continuously, response time across the whole cluster increases sharply, and modeling and queries frequently time out.
>>>> KYLIN-4169 alleviated this problem, but not enough.
>>>> I want to redesign the cache. The design goal is that, with 10,000 cubes in a single project and 1,000 created or deleted every day, modeling and queries do not time out frequently and the system stays available.
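Re:Re: Redesign the read-write lock and broadcast reload mechanism of the metadata cache
Created Jira issue KYLIN-4654.

At 2020-07-22 05:39:49, "chuxiao" wrote:
>1. Change the read-write lock to bucketed locks and spin locks, allowing momentary dirty reads.
>2. When updating metadata, no longer broadcast to oneself, i.e. to the process that modified the metadata. Caches that would otherwise be refreshed by broadcast are refreshed synchronously during the update operation.
>3. Make cache updates fine-grained: update by the smallest atomic unit, and consider recording a version number or timestamp for each change.
>Apply these changes one after another until the design goal is met.
>
>At 2020-07-22 05:27:24, "chuxiao" wrote:
>>Kylin's current cache mechanism is better suited to scenarios with tens to hundreds of cubes that are not updated frequently.
>>With more than 2,000 cubes and several hundred updated every day, the read-write lock on the metadata and the broadcast mechanism that triggers a full refresh on any modification mean that, once metadata is updated continuously, response time across the whole cluster increases sharply, and modeling and queries frequently time out.
>>KYLIN-4169 alleviated this problem, but not enough.
>>I want to redesign the cache. The design goal is that, with 10,000 cubes in a single project and 1,000 created or deleted every day, modeling and queries do not time out frequently and the system stays available.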
[jira] [Created] (KYLIN-4654) new metadata read/write and reload mechanism
chuxiao created KYLIN-4654: -- Summary: new metadata read/write and reload mechanism Key: KYLIN-4654 URL: https://issues.apache.org/jira/browse/KYLIN-4654 Project: Kylin Issue Type: Improvement Reporter: chuxiao Support the case where one project in one cluster has 10,000 cubes and 1,000 of them are updated continuously: creating cubes and querying should still work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
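Re: Redesign the read-write lock and broadcast reload mechanism of the metadata cache
1. Change the read-write lock to bucketed locks and spin locks, allowing momentary dirty reads.
2. When updating metadata, no longer broadcast to oneself, i.e. to the process that modified the metadata. Caches that would otherwise be refreshed by broadcast are refreshed synchronously during the update operation.
3. Make cache updates fine-grained: update by the smallest atomic unit, and consider recording a version number or timestamp for each change.
Apply these changes one after another until the design goal is met.

At 2020-07-22 05:27:24, "chuxiao" wrote:
>Kylin's current cache mechanism is better suited to scenarios with tens to hundreds of cubes that are not updated frequently.
>With more than 2,000 cubes and several hundred updated every day, the read-write lock on the metadata and the broadcast mechanism that triggers a full refresh on any modification mean that, once metadata is updated continuously, response time across the whole cluster increases sharply, and modeling and queries frequently time out.
>KYLIN-4169 alleviated this problem, but not enough.
>I want to redesign the cache. The design goal is that, with 10,000 cubes in a single project and 1,000 created or deleted every day, modeling and queries do not time out frequently and the system stays available.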
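To make the first and third points of the proposal concrete, here is a minimal sketch of a cache with one lock per bucket, lock-free reads, and version-checked writes. It is only an illustration under stated assumptions, not Kylin's implementation: the class StripedVersionedCache and its methods are hypothetical, an ordinary mutex (threading.Lock) stands in for the spin lock mentioned in the proposal, and the lock-free read relies on dictionary reads being safe alongside writes in CPython.

```python
import threading


class StripedVersionedCache:
    """Hypothetical metadata cache: per-bucket locks, lock-free reads, versioned writes."""

    def __init__(self, buckets=16):
        # One lock per bucket, so writers on different keys rarely contend.
        # Assumption: a plain mutex replaces the spin lock named in the proposal.
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._entries = {}  # key -> (version, value)

    def _lock_for(self, key):
        return self._locks[hash(key) % len(self._locks)]

    def get(self, key):
        # No lock is taken: a reader may briefly see the previous value,
        # which is the tolerated momentary dirty read.
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

    def put_if_newer(self, key, version, value):
        """Store `value` only if `version` is newer than the cached one."""
        with self._lock_for(key):
            current = self._entries.get(key)
            if current is not None and current[0] >= version:
                return False  # stale or duplicate update, e.g. a delayed broadcast
            self._entries[key] = (version, value)
            return True
```

With this shape, the process that modifies an entry can call put_if_newer directly during the update (point 2), and a broadcast that arrives later with the same or an older version is simply ignored rather than triggering a full reload.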
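Redesign the read-write lock and broadcast reload mechanism of the metadata cache
Kylin's current cache mechanism is better suited to scenarios with tens to hundreds of cubes that are not updated frequently.
With more than 2,000 cubes and several hundred updated every day, the read-write lock on the metadata and the broadcast mechanism that triggers a full refresh on any modification mean that, once metadata is updated continuously, response time across the whole cluster increases sharply, and modeling and queries frequently time out.
KYLIN-4169 alleviated this problem, but not enough.
I want to redesign the cache. The design goal is that, with 10,000 cubes in a single project and 1,000 created or deleted every day, modeling and queries do not time out frequently and the system stays available.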
[jira] [Created] (KYLIN-4642) load hive env classpath when only kylin first start
chuxiao created KYLIN-4642: -- Summary: load hive env classpath when only kylin first start Key: KYLIN-4642 URL: https://issues.apache.org/jira/browse/KYLIN-4642 Project: Kylin Issue Type: Improvement Reporter: chuxiao see pr -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4635) set org.apache.kylin default log level info
chuxiao created KYLIN-4635: -- Summary: set org.apache.kylin default log level info Key: KYLIN-4635 URL: https://issues.apache.org/jira/browse/KYLIN-4635 Project: Kylin Issue Type: Improvement Reporter: chuxiao In kylin-server-log4j.properties and kylin-tools-log4j.properties, the setting is log4j.logger.org.apache.kylin=DEBUG,file. Normally only versions below 1.0, or packages for new features, use DEBUG as the default level. Could we change the default log level to INFO? -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re:jdbc connection not working.
Make sure you can connect to "EMR cluster hostname":7070; the network may not be reachable.

At 2020-07-13 11:58:34, "xatax" wrote:
>I have attempted to connect to Kylin using the JDBC driver for the following
>Kylin versions: 2.6, 3.0.2 and 3.1, and have been unable to make a connection.
>
>The driver files I have used are from the '$KYLIN_HOME/lib' directory:
>for version 3.1: kylin-jdbc-3.1.0.jar, jcl-over-slf4j-1.7.21.jar,
>slf4j-api-1.7.21.jar
>for version 2.6.6: kylin-jdbc-2.6.6.jar
>for version 3.0.2: kylin-jdbc-3.0.2.jar
>
>Connection URL I am using:
>jdbc:kylin://"EMR cluster hostname":7070/learn_kylin
>JDBC Driver class: org.apache.kylin.jdbc.Driver
>
>Will appreciate any insight into what might be going wrong.
>
>Error logs:
>
>38f9d37464b4% javac ky.java
>38f9d37464b4% java ky
>Connecting to database...
>SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>SLF4J: Defaulting to no-operation (NOP) logger implementation
>SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>java.sql.SQLException: org.apache.kylin.jdbc.shaded.org.apache.http.conn.HttpHostConnectException: Connection to http://ec2-3-236-65-136.compute-1.amazonaws.com:7070 refused
>    at org.apache.kylin.jdbc.KylinConnection.<init>(KylinConnection.java:72)
>    at org.apache.kylin.jdbc.KylinJdbcFactory.newConnection(KylinJdbcFactory.java:77)
>    at org.apache.kylin.jdbc.shaded.org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
>    at java.sql.DriverManager.getConnection(DriverManager.java:664)
>    at java.sql.DriverManager.getConnection(DriverManager.java:247)
>    at ky.main(ky.java:22)
>Caused by: org.apache.kylin.jdbc.shaded.org.apache.http.conn.HttpHostConnectException: Connection to http://ec2-3-236-65-136.compute-1.amazonaws.com:7070 refused
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>    at org.apache.kylin.jdbc.KylinClient.connect(KylinClient.java:285)
>    at org.apache.kylin.jdbc.KylinConnection.<init>(KylinConnection.java:70)
>    ... 5 more
>Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
>    at java.net.PlainSocketImpl.socketConnect(Native Method)
>    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>    at java.net.Socket.connect(Socket.java:606)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
>    at org.apache.kylin.jdbc.shaded.org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
>    ... 13 more
>Goodbye!
>38f9d37464b4% javac ky.java
>38f9d37464b4% java ky
>Connecting to database...
>SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>SLF4J: Defaulting to no-operation (NOP) logger implementation
>SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>java.sql.SQLException: java.net.ConnectException: Operation timed out (Connection timed out)
>    at org.apache.kylin.jdbc.KylinConnection.<init>(KylinConnection.java:72)
>    at org.apache.kylin.jdbc.KylinJdbcFactory.newConnection(KylinJdbcFactory.java:77)
>    at org.apache.kylin.jdbc.shaded.org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
>    at java.sql.DriverManager.getConnection(DriverManager.java:664)
>    at java.sql.DriverManager.getConnection(DriverManager.java:247)
>    at ky.main(ky.java:22)
>Caused by: java.net.ConnectException: Operation timed out (Connection timed out)
>    at java.net.PlainSocketImpl.socketConnect(Native Method)
>    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>    at
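The reply's advice (check that the host and port are reachable at all) can be tested without the JDBC driver. The sketch below is one way to do that from the client machine; the function name can_connect is hypothetical, and the host shown in the usage note is a placeholder for the EMR hostname in the thread.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a plain TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For instance, can_connect("your-emr-hostname", 7070) returning False from the same machine that runs ky.java would suggest the problem lies in the network path (for example a firewall or security group rule, which is an assumption here and not confirmed in the thread) rather than in the JDBC driver or the connection URL.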
why org.apache.kylin default log level is DEBUG?
In kylin-server-log4j.properties and kylin-tools-log4j.properties, the setting is log4j.logger.org.apache.kylin=DEBUG,file. Normally only versions below 1.0, or packages for new features, use DEBUG as the default level. Could we change the default log level to INFO?
[jira] [Created] (KYLIN-4626) add set kylin home sh
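chuxiao created KYLIN-4626: -- Summary: add set kylin home sh Key: KYLIN-4626 URL: https://issues.apache.org/jira/browse/KYLIN-4626 Project: Kylin Issue Type: Improvement Reporter: chuxiao KYLIN_HOME is important; almost every script depends on it. But setting the environment variable globally is not the best practice, for example when multiple instances are installed. Add set-kylin-home.sh so that each Kylin instance can set its own environment variable. -- This message was sent by Atlassian Jira (v8.3.4#803005)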
[jira] [Created] (KYLIN-4614) Broadcaster node exclude itself
chuxiao created KYLIN-4614: -- Summary: Broadcaster node exclude itself Key: KYLIN-4614 URL: https://issues.apache.org/jira/browse/KYLIN-4614 Project: Kylin Issue Type: Improvement Components: Tools, Build and Test Reporter: chuxiao Assignee: Yaqian Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4613) add buildCubeCLi as hadoop main class and jobRestClient
chuxiao created KYLIN-4613: -- Summary: add buildCubeCLi as hadoop main class and jobRestClient Key: KYLIN-4613 URL: https://issues.apache.org/jira/browse/KYLIN-4613 Project: Kylin Issue Type: Improvement Components: Client - CLI Reporter: chuxiao Support submitting a job and waiting for it to finish. Retry 3 times on error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
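The "submit, wait, retry on error" behaviour requested above can be sketched independently of any Kylin client. This is a hedged illustration only: run_with_retries and the submit_and_wait callable are hypothetical names, and the sketch assumes "retry 3 times" means three attempts in total, which the issue does not specify.

```python
import time


def run_with_retries(submit_and_wait, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call `submit_and_wait()` until it succeeds or `max_attempts` is used up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # The callable is expected to submit the job and block until it finishes.
            return submit_and_wait()
        except Exception as error:  # a real client would catch narrower error types
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds)
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

Passing the callable in keeps the retry policy separate from how the job is actually submitted, so the same wrapper could sit around a REST call or a CLI invocation.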
[jira] [Created] (KYLIN-4612) job status write to kafka
chuxiao created KYLIN-4612: -- Summary: job status write to kafka Key: KYLIN-4612 URL: https://issues.apache.org/jira/browse/KYLIN-4612 Project: Kylin Issue Type: Improvement Components: Job Engine Reporter: chuxiao Because more than a hundred jobs are running, write job status changes to Kafka instead of querying the job list. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4611) modify PATTERN_SPARK_APP_URL to Tracking URL
chuxiao created KYLIN-4611: -- Summary: modify PATTERN_SPARK_APP_URL to Tracking URL Key: KYLIN-4611 URL: https://issues.apache.org/jira/browse/KYLIN-4611 Project: Kylin Issue Type: Improvement Components: Job Engine Reporter: chuxiao stdout: 20/06/29 21:00:05 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead. Application Id: application_1582793079899_88034478, Tracking URL: http://x:8088/proxy/application_1582793079899_88034478/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
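As an illustration of matching on the "Tracking URL" text in the stdout quoted above, the following sketch extracts the URL with a regular expression. It does not reproduce the actual value of PATTERN_SPARK_APP_URL, which is not shown in the issue; the pattern and the function name extract_tracking_url are assumptions.

```python
import re

# Assumed pattern: the literal label followed by the first run of non-space characters.
TRACKING_URL_PATTERN = re.compile(r"Tracking URL:\s*(\S+)")


def extract_tracking_url(stdout_text):
    """Return the first tracking URL found in `stdout_text`, or None if absent."""
    match = TRACKING_URL_PATTERN.search(stdout_text)
    return match.group(1) if match else None
```

Applied to the line from the issue, "Application Id: application_1582793079899_88034478, Tracking URL: http://x:8088/proxy/application_1582793079899_88034478/", this returns the URL after the label; lines such as the SparkConf deprecation warning yield None.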
[jira] [Created] (KYLIN-4610) updatet kylin.engine.livy.backtick.quote default
chuxiao created KYLIN-4610: -- Summary: updatet kylin.engine.livy.backtick.quote default Key: KYLIN-4610 URL: https://issues.apache.org/jira/browse/KYLIN-4610 Project: Kylin Issue Type: Bug Components: Job Engine Reporter: chuxiao KYLIN-3905 set kylin.engine.livy.backtick.quote to "", but it needs to be "`". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4609) setenv.sh add zgc config for big memory
chuxiao created KYLIN-4609: -- Summary: setenv.sh add zgc config for big memory Key: KYLIN-4609 URL: https://issues.apache.org/jira/browse/KYLIN-4609 Project: Kylin Issue Type: Improvement Reporter: chuxiao If JDK >= 13 and memory >= 50 GB, use ZGC. -- This message was sent by Atlassian Jira (v8.3.4#803005)
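The selection rule in this issue is small enough to write down directly. The sketch below expresses it as a function returning JVM flags; gc_options is a hypothetical helper, not the content of setenv.sh. One detail it adds from JDK history rather than from the issue: ZGC was an experimental feature before JDK 15, so on JDK 13 and 14 it has to be unlocked explicitly.

```python
def gc_options(jdk_major, heap_gb):
    """Return JVM GC flags for the rule 'JDK >= 13 and memory >= 50 GB -> ZGC'."""
    if jdk_major >= 13 and heap_gb >= 50:
        flags = ["-XX:+UseZGC"]
        if jdk_major < 15:
            # ZGC was still experimental before JDK 15 and must be unlocked there.
            flags.insert(0, "-XX:+UnlockExperimentalVMOptions")
        return flags
    # Otherwise leave the collector choice to the existing configuration.
    return []
```

For example, gc_options(13, 64) yields both flags, gc_options(17, 64) yields only -XX:+UseZGC, and a 32 GB heap yields no ZGC flags on any JDK version.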
[jira] [Created] (KYLIN-4608) add deletecubefast api for delete 300 cubes fast
chuxiao created KYLIN-4608: -- Summary: add deletecubefast api for delete 300 cubes fast Key: KYLIN-4608 URL: https://issues.apache.org/jira/browse/KYLIN-4608 Project: Kylin Issue Type: Improvement Components: REST Service Affects Versions: v3.0.0-alpha Reporter: chuxiao To delete 300 cubes quickly, do not clean segment storage at delete time. Clean it up when the cleanup tool runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4607) destributed scheduler avoid change job status when fetcherRunner error
chuxiao created KYLIN-4607: -- Summary: destributed scheduler avoid change job status when fetcherRunner error Key: KYLIN-4607 URL: https://issues.apache.org/jira/browse/KYLIN-4607 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v3.0.0-alpha Reporter: chuxiao see KYLIN-4250 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4606) add logs when query olap and pushdown botherror
chuxiao created KYLIN-4606: -- Summary: add logs when query olap and pushdown botherror Key: KYLIN-4606 URL: https://issues.apache.org/jira/browse/KYLIN-4606 Project: Kylin Issue Type: Improvement Reporter: chuxiao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4605) HiveProducer write metrics less when one host running some kylin servers
chuxiao created KYLIN-4605:
------------------------------

             Summary: HiveProducer write metrics less when one host running some kylin servers
                 Key: KYLIN-4605
                 URL: https://issues.apache.org/jira/browse/KYLIN-4605
             Project: Kylin
          Issue Type: Improvement
          Components: Metrics
    Affects Versions: v3.0.0-beta, v3.0.0-alpha
            Reporter: chuxiao


Writing metrics to the file fails because of a file lease conflict. Both DFSClient instances in the message below are on the same address (100.69.76.32), so two Kylin servers on one host are appending to the same file:

hdfs:///kday_date=2020-01-14/bigdata-kylin-shuyi3-00.gz01.diditaxi.com-part- due to org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /user/prod_kylin/bigdata_kylin/hive/bigdata_kylin/hive_metrics_query_prod3/kday_date=2020-01-14/bigdata-kylin-shuyi3-00.gz01.diditaxi.com-part- for DFSClient_NONMAPREDUCE_-1428991477_29 on 100.69.76.32 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-312505276_29 on 100.69.76.32
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2700)
    at org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:118)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2735)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:842)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:493)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2717)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
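The issue does not say how the collision should be resolved. As one possible mitigation (an assumption, not the fix adopted in Kylin), the part-file name could include something unique per server process in addition to the host name. The class and method names below are hypothetical.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Kylin: derives a part-file name that
// differs between JVMs running on the same host.
final class MetricsPartFileName {

    // Combines the host name with a per-process id so that two servers on
    // one host never append to the same file.
    static String build(String hostName, String processId) {
        return hostName + "-" + processId + "-part-";
    }

    // RuntimeMXBean.getName() commonly looks like "pid@host", but the format
    // is JVM-dependent, so the whole name is used when no '@' is present.
    static String currentProcessId() {
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        int at = jvmName.indexOf('@');
        return at > 0 ? jvmName.substring(0, at) : jvmName;
    }

    public static void main(String[] args) {
        System.out.println(build("example-host", currentProcessId()));
    }
}
```

A server port or instance name from configuration would work equally well as the distinguishing part; the process id is used here only because it needs no extra configuration.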
[jira] [Created] (KYLIN-4603) listjob return NPE
chuxiao created KYLIN-4603:
------------------------------

             Summary: listjob return NPE
                 Key: KYLIN-4603
                 URL: https://issues.apache.org/jira/browse/KYLIN-4603
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v3.0.0-alpha
            Reporter: chuxiao
         Attachments: D-Chat_20200610181627.png


As the title says. This is caused by a brief inconsistency while the cache is being refreshed; the affected entries should be skipped.

Caused by: java.lang.NullPointerException
    at org.apache.kylin.rest.service.JobService$22.apply(JobService.java:1162)
    at org.apache.kylin.rest.service.JobService$22.apply(JobService.java:1156)
    at com.google.common.base.Predicates$AndPredicate.apply(Predicates.java:343)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:702)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.google.common.collect.Lists.newArrayList(Lists.java:144)
    at com.google.common.collect.Lists.newArrayList(Lists.java:125)
    at org.apache.kylin.rest.service.JobService.innerSearchCubingJobsV2(JobService.java:1120)
    at org.apache.kylin.rest.service.JobService.innerSearchCubingJobsV2(JobService.java:1064)
    at org.apache.kylin.rest.service.JobService.searchJobsByCubeNameV2(JobService.java:1034)
    at org.apache.kylin.rest.service.JobService.searchJobsV2(JobService.java:996)
    at org.apache.kylin.rest.controller.JobController.list(JobController.java:86)
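A minimal sketch of the "skip it" behaviour, assuming the NPE comes from a filter predicate that dereferences a cache entry which is briefly missing. It uses plain java.util.stream rather than the Guava predicates seen in the trace, and JobOutput is a hypothetical stand-in, not Kylin's type.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class NullSafeJobFilter {

    // Hypothetical stand-in for a job's cached output.
    static final class JobOutput {
        final String state;
        JobOutput(String state) { this.state = state; }
    }

    // Keeps only the job ids whose cached output has the wanted state.
    // An id with no cache entry (e.g. during a refresh) is skipped instead
    // of causing a NullPointerException.
    static List<String> filterByState(List<String> jobIds,
                                      Map<String, JobOutput> outputCache,
                                      String wantedState) {
        return jobIds.stream()
                .filter(id -> {
                    JobOutput out = outputCache.get(id);
                    return out != null && wantedState.equals(out.state);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, JobOutput> cache = new java.util.HashMap<>();
        cache.put("a", new JobOutput("RUNNING"));
        System.out.println(filterByState(java.util.Arrays.asList("a", "b"), cache, "RUNNING"));
    }
}
```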
[jira] [Created] (KYLIN-4250) FechRunnner should skip the job to process other jobs instead of throwing exception when the job section metadata is not found
chuxiao created KYLIN-4250:
------------------------------

             Summary: FechRunnner should skip the job to process other jobs instead of throwing exception when the job section metadata is not found
                 Key: KYLIN-4250
                 URL: https://issues.apache.org/jira/browse/KYLIN-4250
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v3.0.0-alpha
            Reporter: chuxiao


Problem:

Our cluster has two nodes (build1 and build2) that build cube jobs, and it uses DistributedScheduler. A job with id 9f05b84b-cec9-81ee-9336-5a419e451a55 is shown as being built on the build1 node. The job is displayed as Error, but its first sub-task (creating the Hive flat table) is displayed as Ready, and the first task's YARN job can be seen running in the YARN UI. After the YARN job succeeds, the job re-runs the first sub-task, again and again.

Log:

In the build1 log, the job status changes from ready to running, then the first task's status changes from ready to running, then the job update is broadcast, and then that broadcast is received. But twenty seconds later, another broadcast of updated job information is received. A few minutes later the first task completes, but the log shows the job status changing from Error to ready! Then the job status changes from ready to running, the first task starts running again, and the log sequence above repeats. I suspect that another node changed the job status.

In the build2 log there are many exception entries reporting that there is no output for another job, id f1b2024a-e6ed-3dd5-5a7d-7c267ead5f1d:

{code:java}
2019-09-20 14:20:58,825 WARN [pool-10-thread-1] threadpool.DefaultFetcherRunner:90 : Job Fetcher caught a exception
java.lang.IllegalArgumentException: there is no related output for job id:f1b2024a-e6ed-3dd5-5a7d-7c267ead5f1d
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
    at org.apache.kylin.job.execution.ExecutableManager.getOutputDigest(ExecutableManager.java:184)
    at org.apache.kylin.job.impl.threadpool.DefaultFetcherRunner.run(DefaultFetcherRunner.java:67)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

In addition, each time build2 receives the broadcast of build1's job update, twenty seconds later its log shows it changing the first task's state from running to ready and broadcasting that change.

After restarting the build2 node, the "Job Fetcher caught a exception" message is no longer printed, and job 9f05b84b-cec9-81ee-9336-5a419e451a55 executes successfully.

Analysis:

This is caused by a job metadata synchronization problem that triggers a job scheduling bug. The build1 node tries to run the job, but another build node kills the job and changes its status to Error.
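The log above shows the fetcher aborting its whole pass because one job has no output record. A hedged sketch of the skip-and-continue behaviour that the summary asks for follows; it uses plain maps instead of Kylin's ExecutableManager, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SkippingFetcher {

    // Walks all job ids and returns those in state "READY".
    // A job whose output digest is missing is recorded in `skipped` and the
    // loop moves on, so one broken job cannot fail the pass for the others.
    static List<String> fetchRunnable(List<String> jobIds,
                                      Map<String, String> outputDigests,
                                      List<String> skipped) {
        List<String> runnable = new ArrayList<>();
        for (String id : jobIds) {
            String state = outputDigests.get(id);
            if (state == null) {
                skipped.add(id); // metadata not found: skip, do not throw
                continue;
            }
            if ("READY".equals(state)) {
                runnable.add(id);
            }
        }
        return runnable;
    }

    public static void main(String[] args) {
        Map<String, String> digests = new java.util.HashMap<>();
        digests.put("j1", "READY");
        List<String> skipped = new ArrayList<>();
        System.out.println(fetchRunnable(java.util.Arrays.asList("j1", "j2"), digests, skipped));
        System.out.println("skipped: " + skipped);
    }
}
```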
The build2 node may have a metadata synchronization problem: the job with id f1b2024a-e6ed-3dd5-5a7d-7c267ead5f1d exists in ExecutableDao's executableDigestMap but does not exist in ExecutableDao's executableOutputDigestMap. Each time FetchRunner iterates over the jobs, it throws an exception for this job and fetchFailed is set to true.

{code:java}
DefaultFetcherRunner:

    //throw exception
    final Output outputDigest = getExecutableManger().getOutputDigest(id);
    . . .
} catch (Throwable th) {
    fetchFailed = true; // this could happen when resource store is unavailable
    logger.warn("Job Fetcher caught a exception ", th);
}
{code}

When build2 first processes the job that build1 is running, fetchFailed is true, the job is not in build2's list of running jobs, and the job status is running, so FetchRunner.jobStateCount() kills the job: it sets the running task's status to ready, sets the job status to Error, and broadcasts the change.

{code:java}
FetchRunner.jobStateCount():

protected void jobStateCount(String id) {
    final Output outputDigest = getExecutableManger().getOutputDigest(id);
    // logger.debug("Job id:" + id + " not runnable");
    if (outputDigest.getState() == ExecutableState.SUCCEED) {
        nSUCCEED++;
    } else if (outputDigest.getState() == ExecutableState.ERROR) {
        nError++;
    } else if (outputDigest.getState() == ExecutableState.DISCARDED) {
        nDiscarded++;
    } else if (outputDigest.getStat
[jira] [Created] (KYLIN-4211) PartitionDesc support custom year、month、day partitions name
chuxiao created KYLIN-4211:
------------------------------

             Summary: PartitionDesc support custom year、month、day partitions name
                 Key: KYLIN-4211
                 URL: https://issues.apache.org/jira/browse/KYLIN-4211
             Project: Kylin
          Issue Type: Improvement
            Reporter: chuxiao


YearMonthDayPartitionConditionBuilder supports partition columns named year, month, and day, but it cannot handle other names such as Y, M, D. Because some users have fact tables partitioned by Y, M, D, add CustomYearMonthDayFieldPartitionConditionBuilder, which supports custom names for the year, month, and day partition columns.

The partition metadata in model.json looks like:

{
  "uuid" : "459d48c1-a8a6-cdf5-6ea7-e2ae48b248e9",
  "last_modified" : 1571652918478,
  "version" : "2.6.0.20500",
  "name" : "kylin_sales_ymd",
  "owner" : "admin",
  "is_draft" : false,
  "description" : "",
  "fact_table" : "BIGDATA_KYLIN.KYLIN_SALES_YMD",
  "lookups" : [ ],
  "dimensions" : [ {
    "table" : "KYLIN_SALES_YMD",
    "columns" : [ "LEAF_CATEG_ID", "TRANS_ID", "SLR_SEGMENT_CD", "SELLER_ID", "BUYER_ID", "OPS_USER_ID", "OPS_REGION", "Y", "M", "D" ]
  } ],
  "metrics" : [ "KYLIN_SALES_YMD.PRICE", "KYLIN_SALES_YMD.ITEM_COUNT" ],
  "filter_condition" : "",
  "partition_desc" : {
    "partition_date_column" : "KYLIN_SALES_YMD.Y, KYLIN_SALES_YMD.M, KYLIN_SALES_YMD.D",
    "partition_time_column" : null,
    "partition_date_start" : 0,
    "partition_date_format" : "-MM-dd",
    "partition_time_format" : "HH:mm:ss",
    "partition_type" : "APPEND",
    "partition_condition_builder" : "org.apache.kylin.metadata.model.PartitionDesc$CustomYearMonthDayFieldPartitionConditionBuilder"
  },
  "capacity" : "MEDIUM"
}

partition_date_column holds the year, month, and day partition fields separated by ','. partition_condition_builder is org.apache.kylin.metadata.model.PartitionDesc$CustomYearMonthDayFieldPartitionConditionBuilder.

To set this up, create a normal model on the Kylin web UI and then use "Edit JSON" to change these two fields.
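To illustrate what a condition builder over three separately named columns has to produce, here is a rough sketch. It is not the code of CustomYearMonthDayFieldPartitionConditionBuilder; it assumes the three partition columns hold numeric values, and the class and method names are hypothetical.

```java
import java.time.LocalDate;

final class YmdConditionSketch {

    // Compares the (year, month, day) column triple against a date.
    // strictOp is used for year and month, lastOp for the day column.
    private static String compare(String y, String m, String d, LocalDate date,
                                  String strictOp, String lastOp) {
        int yy = date.getYear();
        int mm = date.getMonthValue();
        int dd = date.getDayOfMonth();
        return "(" + y + " " + strictOp + " " + yy
                + " OR (" + y + " = " + yy + " AND " + m + " " + strictOp + " " + mm + ")"
                + " OR (" + y + " = " + yy + " AND " + m + " = " + mm
                + " AND " + d + " " + lastOp + " " + dd + "))";
    }

    // Rows whose partition date is on or after `date`.
    static String atOrAfter(String y, String m, String d, LocalDate date) {
        return compare(y, m, d, date, ">", ">=");
    }

    // Rows whose partition date is strictly before `date`.
    static String before(String y, String m, String d, LocalDate date) {
        return compare(y, m, d, date, "<", "<");
    }

    // Half-open range [start, endExclusive), as used for APPEND-style segments.
    static String range(String y, String m, String d, LocalDate start, LocalDate endExclusive) {
        return atOrAfter(y, m, d, start) + " AND " + before(y, m, d, endExclusive);
    }

    public static void main(String[] args) {
        System.out.println(range("T.Y", "T.M", "T.D",
                LocalDate.of(2019, 10, 21), LocalDate.of(2019, 10, 22)));
    }
}
```

If the columns are stored as zero-padded strings rather than numbers, the literals would need quoting and padding; that case is not covered here.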
[jira] [Created] (KYLIN-4210) Support the custom year, month, and day field name partition for the fact table
chuxiao created KYLIN-4210:
------------------------------

             Summary: Support the custom year, month, and day field name partition for the fact table
                 Key: KYLIN-4210
                 URL: https://issues.apache.org/jira/browse/KYLIN-4210
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
    Affects Versions: v3.0.0-alpha
            Reporter: chuxiao


YearMonthDayPartitionConditionBuilder supports partition columns named year, month, and day, but it cannot handle other names such as Y, M, D. Because some users have fact tables partitioned by Y, M, D, add CustomYearMonthDayFieldPartitionConditionBuilder, which supports custom names for the year, month, and day partition columns.

The partition metadata in model.json looks like:

{
  "uuid" : "459d48c1-a8a6-cdf5-6ea7-e2ae48b248e9",
  "last_modified" : 1571652918478,
  "version" : "2.6.0.20500",
  "name" : "kylin_sales_ymd",
  "owner" : "admin",
  "is_draft" : false,
  "description" : "",
  "fact_table" : "BIGDATA_KYLIN.KYLIN_SALES_YMD",
  "lookups" : [ ],
  "dimensions" : [ {
    "table" : "KYLIN_SALES_YMD",
    "columns" : [ "LEAF_CATEG_ID", "TRANS_ID", "SLR_SEGMENT_CD", "SELLER_ID", "BUYER_ID", "OPS_USER_ID", "OPS_REGION", "Y", "M", "D" ]
  } ],
  "metrics" : [ "KYLIN_SALES_YMD.PRICE", "KYLIN_SALES_YMD.ITEM_COUNT" ],
  "filter_condition" : "",
  "partition_desc" : {
    "partition_date_column" : "KYLIN_SALES_YMD.Y, KYLIN_SALES_YMD.M, KYLIN_SALES_YMD.D",
    "partition_time_column" : null,
    "partition_date_start" : 0,
    "partition_date_format" : "-MM-dd",
    "partition_time_format" : "HH:mm:ss",
    "partition_type" : "APPEND",
    "partition_condition_builder" : "org.apache.kylin.metadata.model.PartitionDesc$CustomYearMonthDayFieldPartitionConditionBuilder"
  },
  "capacity" : "MEDIUM"
}

partition_date_column holds the year, month, and day partition fields separated by ','. partition_condition_builder is org.apache.kylin.metadata.model.PartitionDesc$CustomYearMonthDayFieldPartitionConditionBuilder.
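Since partition_date_column packs three column names into one comma-separated string, a builder first has to split it. A small sketch of that step, assuming the order is always year, month, day; the class and method names are hypothetical.

```java
final class PartitionColumnsSketch {

    // Splits "T.Y, T.M, T.D" into exactly three trimmed column names,
    // in the order year, month, day. Anything else is rejected early.
    static String[] splitYmd(String partitionDateColumn) {
        String[] parts = partitionDateColumn.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "expected 3 comma-separated columns but got " + parts.length);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] ymd = splitYmd("KYLIN_SALES_YMD.Y, KYLIN_SALES_YMD.M, KYLIN_SALES_YMD.D");
        System.out.println(String.join(" | ", ymd));
    }
}
```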
[jira] [Created] (KYLIN-4205) configuration support set hive.intermediate-table-prefix
chuxiao created KYLIN-4205:
------------------------------

             Summary: configuration support set hive.intermediate-table-prefix
                 Key: KYLIN-4205
                 URL: https://issues.apache.org/jira/browse/KYLIN-4205
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
    Affects Versions: v3.0.0-alpha
            Reporter: chuxiao


There are thousands of cubes at Didi, so to keep the metadata of each cluster small we run 4 Kylin clusters. They share 1 HDFS cluster, 1 Hive database, and 1 HBase cluster.

KYLIN_INTERMEDIATE_PREFIX is currently a constant defined in MetadataConstants. So that cleanTool can correctly clean up only the unused intermediate tables of its own cluster, each cluster needs a different intermediate table prefix, set through the Kylin config.

Add the Kylin config property kylin.source.hive.intermediate-table-prefix, with default 'kylin_intermediate_'.
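A minimal sketch of reading such a property with its default and using it to decide which tables belong to a cluster. It uses java.util.Properties instead of Kylin's KylinConfig, and the class and method names are hypothetical; only the property key and default value come from this issue.

```java
import java.util.Locale;
import java.util.Properties;

final class IntermediatePrefixSketch {

    static final String KEY = "kylin.source.hive.intermediate-table-prefix";
    static final String DEFAULT_PREFIX = "kylin_intermediate_";

    // Returns the configured prefix, or the default when the key is absent.
    static String prefix(Properties config) {
        return config.getProperty(KEY, DEFAULT_PREFIX);
    }

    // True when the table name starts with this cluster's prefix.
    // Hive table names are case-insensitive, so both sides are lower-cased.
    static boolean ownedBy(String tableName, String prefix) {
        return tableName.toLowerCase(Locale.ROOT)
                .startsWith(prefix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(KEY, "kylin_c2_intermediate_");
        System.out.println(prefix(config));
        System.out.println(ownedBy("kylin_c2_intermediate_sales_cube", prefix(config)));
    }
}
```

With a plain prefix match like this, one cluster's prefix should not itself be a prefix of another cluster's, otherwise a cleanup in the first cluster would also match the second cluster's tables.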
[jira] [Created] (KYLIN-4190) hiveproducer send data excetpion because hive mertics table location path prefix is different with defaut fs when hdfs uses router-based federation
chuxiao created KYLIN-4190:
------------------------------

             Summary: hiveproducer send data excetpion because hive mertics table location path prefix is different with defaut fs when hdfs uses router-based federation
                 Key: KYLIN-4190
                 URL: https://issues.apache.org/jira/browse/KYLIN-4190
             Project: Kylin
          Issue Type: Bug
          Components: Metrics
    Affects Versions: v3.0.0-alpha
            Reporter: chuxiao


Our HDFS cluster uses router-based federation (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html).

With the system cube configuration enabled, HiveProducer's write() function throws an exception:

{code:java}
ERROR [metrics-blocking-reservoir-scheduler-0] hive.HiveReservoirReporter:119 : Wrong FS: hdfs://DClusterNmg4/user/kylin/hive/hive_metrics_job_exception_qa/kday_date=2019-09-04, expected: hdfs://difed
java.lang.IllegalArgumentException: Wrong FS: hdfs://DClusterNmg4/user/kylin/hive/hive_metrics_job_exception_qa/kday_date=2019-09-04, expected: hdfs://difed
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:717)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:109)
    at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1390)
    at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1402)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1494)
    at org.apache.kylin.metrics.lib.impl.hive.HiveProducer.write(HiveProducer.java:137)
    at org.apache.kylin.metrics.lib.impl.hive.HiveProducer.send(HiveProducer.java:122)
    at org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter$HiveReservoirListener.onRecordUpdate(HiveReservoirReporter.java:117)
    at org.apache.kylin.metrics.lib.impl.BlockingReservoir.notifyListenerOfUpdatedRecord(BlockingReservoir.java:105)
    at org.apache.kylin.metrics.lib.impl.BlockingReservoir.onRecordUpdate(BlockingReservoir.java:93)
    at org.apache.kylin.metrics.lib.impl.BlockingReservoir.access$300(BlockingReservoir.java:33)
    at org.apache.kylin.metrics.lib.impl.BlockingReservoir$ReporterRunnable.run(BlockingReservoir.java:152)
    at java.lang.Thread.run(Thread.java:745)
{code}

This happens because the default router namespace is hdfs://difed, while the actual federation namespaces are hdfs://DClusterNmg4, hdfs://DClusterNmg1, hdfs://DClusterNmg2...

So fs.defaultFS in core-site.xml is hdfs://difed, but this Hive table's location path is hdfs://DClusterNmg4/user/... . As a result, defaultFs.exists(hiveLocationPath) throws the exception.

We therefore need to check whether the prefix is the same: if the default FS is not a prefix of the Hive table location path, get a new FileSystem from the location path.
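The prefix check described above can be sketched with java.net.URI alone, so that it runs without a Hadoop dependency. The class and method names are hypothetical, and this is an illustration of the check rather than the patch itself.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

final class FsPrefixCheck {

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // True when `location` lives on the same file system as `defaultFs`,
    // i.e. scheme and authority match. A location without a scheme is a
    // plain path and resolves against the default file system.
    static boolean sameFileSystem(URI defaultFs, URI location) {
        if (location.getScheme() == null) {
            return true;
        }
        return location.getScheme().equalsIgnoreCase(defaultFs.getScheme())
                && Objects.equals(lower(location.getAuthority()), lower(defaultFs.getAuthority()));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://difed");
        URI location = URI.create(
                "hdfs://DClusterNmg4/user/kylin/hive/hive_metrics_job_exception_qa/kday_date=2019-09-04");
        // When this prints false, a Hadoop caller would obtain the file system
        // from the location itself (for example via Path.getFileSystem(conf))
        // instead of reusing the default one.
        System.out.println(sameFileSystem(defaultFs, location));
    }
}
```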
[jira] [Created] (KYLIN-4163) CreateFlatHiveTableStep has not yarn app url when hive job running
chuxiao created KYLIN-4163:
------------------------------

             Summary: CreateFlatHiveTableStep has not yarn app url when hive job running
                 Key: KYLIN-4163
                 URL: https://issues.apache.org/jira/browse/KYLIN-4163
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
    Affects Versions: v3.0.0-alpha
            Reporter: chuxiao
         Attachments: flathivetablerunning图.jpg


CreateFlatHiveTableStep shows the YARN app URL on the monitor web page only after the job has finished, whereas SparkExecutable shows the YARN app URL while the job is still running. This is because SparkExecutable's logger has a log listener:

{code:java}
final PatternedLogger patternedLogger = new PatternedLogger(logger, new PatternedLogger.ILogListener() {
    @Override
    public void onLogEvent(String infoKey, Map<String, String> info) {
        // only care three properties here
        if (ExecutableConstants.SPARK_JOB_ID.equals(infoKey)
                || ExecutableConstants.YARN_APP_ID.equals(infoKey)
                || ExecutableConstants.YARN_APP_URL.equals(infoKey)) {
            getManager().addJobInfo(getId(), info);
        }
    }
});
{code}

Creating the flat Hive table sometimes hangs, so users want to see the YARN app URL while the Hive job is running, as in the attachment.
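The listener idea can be sketched independently of Kylin: scan each log line as it arrives and report the tracking URL as soon as it appears. The sketch assumes the Hive client prints a line containing "Tracking URL = <url>"; the key name "yarn_app_url" and the class name are hypothetical, and PatternedLogger itself is not used.

```java
import java.util.function.BiConsumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrackingUrlListenerSketch {

    // Assumed log format: "... Tracking URL = http://host:port/proxy/application_x/"
    private static final Pattern TRACKING_URL = Pattern.compile("Tracking URL = (\\S+)");

    // Called for every log line while the job is running. The listener is
    // invoked immediately when a URL is seen, not after the job finishes.
    static void onLogLine(String line, BiConsumer<String, String> listener) {
        Matcher m = TRACKING_URL.matcher(line);
        if (m.find()) {
            listener.accept("yarn_app_url", m.group(1)); // hypothetical key name
        }
    }

    public static void main(String[] args) {
        onLogLine("Starting Job = job_1_0001, Tracking URL = http://rm:8088/proxy/application_1_0001/",
                (key, value) -> System.out.println(key + " -> " + value));
    }
}
```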