[jira] [Commented] (KYLIN-2565) Upgrade Kylin to Hadoop3.0

2017-10-15 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205440#comment-16205440
 ] 

SammiChen commented on KYLIN-2565:
--

Thanks, Cheng, for reporting this issue to track Apache Kylin's support for Hadoop 3.0 
and EC.

Hadoop 3.0-beta1 was recently released, and 3.0 GA, the next milestone, should follow 
soon. With HDFS-EC (erasure coding), a new Hadoop 3.0 feature, a large amount of storage 
space (for example, 50%) can be saved. Apache Kylin consumes large volumes of HDFS data 
and, in some cases, can generate 20% more data on HDFS after cube computation, so HDFS-EC 
offers good opportunities to reduce storage cost and possibly even improve performance.
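The storage saving quoted above can be checked with a little arithmetic. The sketch below assumes 3-way replication as the baseline and Hadoop 3's default Reed-Solomon policy with 6 data and 3 parity units (RS-6-3); other EC policies give different numbers, and the 20% figure for cube output is not modeled.

```java
public class EcStorageSavings {
    // Raw bytes stored per logical byte under n-way replication.
    static double replicationOverhead(int replicas) {
        return replicas;
    }

    // Raw bytes stored per logical byte under Reed-Solomon with k data and m parity units.
    static double erasureCodingOverhead(int dataUnits, int parityUnits) {
        return (double) (dataUnits + parityUnits) / dataUnits;
    }

    public static void main(String[] args) {
        double replication = replicationOverhead(3);   // assumed baseline: 3 replicas
        double ec = erasureCodingOverhead(6, 3);       // assumed policy: RS(6,3)
        double saving = 1.0 - ec / replication;
        // prints "3x replication: 3.0x, RS(6,3): 1.5x, saving: 50%"
        System.out.printf("3x replication: %.1fx, RS(6,3): %.1fx, saving: %.0f%%%n",
                replication, ec, saving * 100);
    }
}
```

Under these assumptions the raw footprint drops from 3.0x to 1.5x of the logical data, which is where the "for example, 50%" figure comes from.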

After discussing with Luke, we'd like to collaborate with his team on this 
support. Here is the rough plan:

1)  Verify that the Apache Kylin stack works with Hadoop 3.0 and EC
   Build and run functional tests. Kylin-related issues will be reported to the 
Kylin community, and all Hadoop EC-related issues will go to the Hadoop community;
2)   Benchmark and report
  Once the functional tests pass, we'll benchmark Kylin on Hadoop 3.0. Which Kylin 
workloads to use can be discussed here, and we'd also like to share the results. 

Any comments? Thanks for your thoughts!


> Upgrade Kylin to Hadoop3.0
> --
>
> Key: KYLIN-2565
> URL: https://issues.apache.org/jira/browse/KYLIN-2565
> Project: Kylin
>  Issue Type: New Feature
>Reporter: Wang Cheng
>
> Hadoop 3.0-alpha has been released, and Kylin should stay compatible with it. Below 
> are the Hadoop 3.0 component requirements:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates

2017-10-15 Thread zhengzfand (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengzfand updated KYLIN-2937:
--
Description: 
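After a non-partitioned cube is built, its intermediate data is not cleaned up.
Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
which may cause an out-of-memory error (if the dictionary files are large or numerous enough).
For a non-partitioned cube, dictionary files stored in HDFS will remain; as the 
Kylin cube building job may load all dictionary files, this may cause the JVM heap 
to run out of memory.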

  was:
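After a non-partitioned cube is built, its intermediate data is not cleaned up.
Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
which may cause an out-of-memory error (if the dictionary files are large or numerous enough).
For a non-partitioned cube, dictionary files stored in HDFS will remain, as the Kylin 
cube building job may 
load all dictionary files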


> Intermediate data of non-partitioned cubes accumulates
> ---
>
> Key: KYLIN-2937
> URL: https://issues.apache.org/jira/browse/KYLIN-2937
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhengzfand
>
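> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
> which may cause an out-of-memory error (if the dictionary files are large or numerous enough).
> For a non-partitioned cube, dictionary files stored in HDFS will remain; as the 
> Kylin cube building job may load all dictionary files, this may cause the JVM heap 
> to run out of memory.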





[jira] [Updated] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates

2017-10-15 Thread zhengzfand (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengzfand updated KYLIN-2937:
--
Description: 
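After a non-partitioned cube is built, its intermediate data is not cleaned up.
Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
which may cause an out-of-memory error (if the dictionary files are large or numerous enough).
For a non-partitioned cube, dictionary files stored in HDFS will remain, as the Kylin 
cube building job may 
load all dictionary files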

  was:
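After a non-partitioned cube is built, its intermediate data is not cleaned up.
Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
which may cause an out-of-memory error (if the dictionary files are large or numerous enough).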
For nonpartition cube will remain 


> Intermediate data of non-partitioned cubes accumulates
> ---
>
> Key: KYLIN-2937
> URL: https://issues.apache.org/jira/browse/KYLIN-2937
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhengzfand
>
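> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
> which may cause an out-of-memory error (if the dictionary files are large or numerous enough).
> For a non-partitioned cube, dictionary files stored in HDFS will remain, as the 
> Kylin cube building job may 
> load all dictionary files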





[jira] [Updated] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates

2017-10-15 Thread zhengzfand (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengzfand updated KYLIN-2937:
--
Description: 
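After a non-partitioned cube is built, its intermediate data is not cleaned up.
Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
which may cause an out-of-memory error (if the dictionary files are large or numerous enough).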
For nonpartition cube will remain 

  was:
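After a non-partitioned cube is built, its intermediate data is not cleaned up.
Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
which may cause an out-of-memory error (if the dictionary files are large or numerous enough).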


> Intermediate data of non-partitioned cubes accumulates
> ---
>
> Key: KYLIN-2937
> URL: https://issues.apache.org/jira/browse/KYLIN-2937
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhengzfand
>
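> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
> which may cause an out-of-memory error (if the dictionary files are large or numerous enough).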
> For nonpartition cube will remain 





[jira] [Updated] (KYLIN-2892) Make Kylin compile with Java 9

2017-10-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated KYLIN-2892:
--
Description: 
When I attempted building with Java 9, I got:
{code}
[ERROR] Failed to execute goal on project kylin-engine-mr: Could not resolve 
dependencies for project org.apache.kylin:kylin-engine-mr:jar:2.2.0-SNAPSHOT: 
Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path 
/jdk-9/../lib/tools.jar -> [Help 1]
{code}
The dependency seems to come from the kylin-engine-mr module.

  was:
When I attempted building with Java 9, I got:
{code}
[ERROR] Failed to execute goal on project kylin-engine-mr: Could not resolve 
dependencies for project org.apache.kylin:kylin-engine-mr:jar:2.2.0-SNAPSHOT: 
Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path 
/jdk-9/../lib/tools.jar -> [Help 1]
{code}

The dependency seems to come from the kylin-engine-mr module.


> Make Kylin compile with Java 9
> --
>
> Key: KYLIN-2892
> URL: https://issues.apache.org/jira/browse/KYLIN-2892
> Project: Kylin
>  Issue Type: Bug
>Reporter: Ted Yu
>
> When I attempted building with Java 9, I got:
> {code}
> [ERROR] Failed to execute goal on project kylin-engine-mr: Could not resolve 
> dependencies for project org.apache.kylin:kylin-engine-mr:jar:2.2.0-SNAPSHOT: 
> Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path 
> /jdk-9/../lib/tools.jar -> [Help 1]
> {code}
> The dependency seems to come from the kylin-engine-mr module.
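A small diagnostic can show whether the tools.jar path from the error exists on a given JDK. This is a sketch, not a fix: it assumes the dependency uses a systemPath of the form ${java.home}/../lib/tools.jar, which matches the /jdk-9/../lib/tools.jar path in the log. JDK 9 no longer ships tools.jar (JEP 220), so on Java 9 the file should be reported missing. ToolsJarCheck is a hypothetical name.

```java
import java.io.File;

public class ToolsJarCheck {
    // Mirrors a systemPath of the form ${java.home}/../lib/tools.jar
    // (assumed here; inferred from the path in the Maven error).
    static File expectedToolsJar(String javaHome) {
        File parent = new File(javaHome).getParentFile();
        return new File(new File(parent, "lib"), "tools.jar");
    }

    public static void main(String[] args) {
        String javaHome = System.getProperty("java.home");
        File toolsJar = expectedToolsJar(javaHome);
        System.out.println("java.specification.version=" + System.getProperty("java.specification.version"));
        // On JDK 8, java.home usually points at <jdk>/jre, so this resolves to <jdk>/lib/tools.jar.
        // JDK 9 removed tools.jar, so the file is expected to be missing there.
        System.out.println(toolsJar.getPath() + (toolsJar.isFile() ? " exists" : " is missing"));
    }
}
```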





[jira] [Commented] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates

2017-10-15 Thread zhengzfand (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205405#comment-16205405
 ] 

zhengzfand commented on KYLIN-2937:
---

The cleanup job only drops unused HTables, HDFS files and intermediate Hive tables, via 
the methods cleanUnusedHBaseTables, cleanUnusedHdfsFiles and 
cleanUnusedIntermediateHiveTable.
Dictionary records are stored in the metadata table and are not cleaned up by the 
cleanup job. For a partitioned cube, segments are merged by 7/30 days, and their 
dictionary files are also merged into one file. A non-partitioned cube, however, 
has no merge step, so its dictionary records remain until the user deletes them.
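The gap described in the comment above is that nothing removes dictionary resources once no segment uses them. A minimal sketch of how such orphans could be detected, assuming the set of stored dictionary paths and the paths referenced by each remaining segment are both available; the class, method and paths are hypothetical and not Kylin's API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: names and paths are illustrative only.
public class DictionaryOrphans {

    // Returns stored dictionary paths that no remaining segment references.
    static SortedSet<String> findUnreferenced(Collection<String> storedDictPaths,
                                              Collection<? extends Collection<String>> segmentDictRefs) {
        Set<String> referenced = new HashSet<>();
        for (Collection<String> refs : segmentDictRefs) {
            referenced.addAll(refs);
        }
        SortedSet<String> orphans = new TreeSet<>(storedDictPaths);
        orphans.removeAll(referenced);
        return orphans;
    }

    public static void main(String[] args) {
        // Assumed scenario: three builds left three dictionaries, only the latest segment remains.
        List<String> stored = Arrays.asList(
                "/dict/T/C/build1.dict", "/dict/T/C/build2.dict", "/dict/T/C/build3.dict");
        List<List<String>> segments =
                Collections.singletonList(Collections.singletonList("/dict/T/C/build3.dict"));
        // prints [/dict/T/C/build1.dict, /dict/T/C/build2.dict]
        System.out.println(findUnreferenced(stored, segments));
    }
}
```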

> Intermediate data of non-partitioned cubes accumulates
> ---
>
> Key: KYLIN-2937
> URL: https://issues.apache.org/jira/browse/KYLIN-2937
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhengzfand
>
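> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
> which may cause an out-of-memory error (if the dictionary files are large or numerous enough).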





[jira] [Commented] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2017-10-15 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205374#comment-16205374
 ] 

kangkaisen commented on KYLIN-2180:
---

Hi, Julian.

I don't see anywhere in this patch where I change the ACL. Could you point out the 
specific code?

> Add project config and make config priority become "cube > project > server"
> 
>
> Key: KYLIN-2180
> URL: https://issues.apache.org/jira/browse/KYLIN-2180
> Project: Kylin
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.0.0
>
> Attachments: KYLIN-2180-refactor-ProjectRequest.patch, 
> KYLIN-2180.patch
>
>
> There are cases where we want to override the global kylin.properties in the scope 
> of a project, e.g. the queue name of a Hadoop job.
> In the end, the config priority for Kylin should be "cube > project > server", 
> which I think is reasonable.
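The "cube > project > server" priority amounts to a layered lookup in which the first layer that defines a key wins. A minimal sketch under that assumption; LayeredConfig and the hadoop.job.queue key are hypothetical names, not Kylin's KylinConfig API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LayeredConfig {
    // Layers ordered from highest to lowest priority: cube, project, server.
    private final List<Map<String, String>> layers;

    LayeredConfig(Map<String, String> cube, Map<String, String> project, Map<String, String> server) {
        this.layers = Arrays.asList(cube, project, server);
    }

    // Returns the value from the first (highest-priority) layer that defines the key.
    Optional<String> get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "hadoop.job.queue" is a hypothetical key used only for illustration.
        Map<String, String> server = Collections.singletonMap("hadoop.job.queue", "default");
        Map<String, String> project = Collections.singletonMap("hadoop.job.queue", "etl");
        Map<String, String> cube = Collections.emptyMap();
        LayeredConfig config = new LayeredConfig(cube, project, server);
        System.out.println(config.get("hadoop.job.queue").orElse("<unset>")); // prints "etl"
    }
}
```

Because the cube layer leaves the key unset, the project value overrides the server default; a cube-level entry would in turn override both.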





[jira] [Commented] (KYLIN-1892) merge interval support

2017-10-15 Thread Yang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205366#comment-16205366
 ] 

Yang Hao commented on KYLIN-1892:
-

Is there a plan to do it?

> merge interval support
> --
>
> Key: KYLIN-1892
> URL: https://issues.apache.org/jira/browse/KYLIN-1892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.2
>Reporter: fengYu
>Assignee: fengYu
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments are 
> merged when they reach the threshold, so the next day's refresh has to refresh the 
> merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only merges 
> segments outside of the interval. 
> For example, with interval = 2 and Auto Merge Thresholds = 7, auto merge will not 
> trigger once 07-01 to 07-07 are built; when 07-09 is built successfully, auto merge 
> will trigger and merge the segments from 07-01 to 07-07.
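The proposed interval rule can be sketched as follows. Assumptions: one segment per day, the threshold counted as a number of daily segments, and the year 2017 for the dates in the example; IntervalAutoMerge and planMerge are hypothetical names, not Kylin's auto-merge code. Segments built within the last interval days are left out because they may still be refreshed.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class IntervalAutoMerge {
    // Returns the [start, end] days to merge, or empty if auto merge should not trigger.
    static Optional<LocalDate[]> planMerge(List<LocalDate> builtDays, int intervalDays, int thresholdDays) {
        if (builtDays.isEmpty()) {
            return Optional.empty();
        }
        List<LocalDate> days = new ArrayList<>(builtDays);
        Collections.sort(days);
        LocalDate latest = days.get(days.size() - 1);
        // Days within the last intervalDays may still be amended, so they are not eligible.
        LocalDate cutoff = latest.minusDays(intervalDays);
        List<LocalDate> eligible = new ArrayList<>();
        for (LocalDate day : days) {
            if (!day.isAfter(cutoff)) {
                eligible.add(day);
            }
        }
        if (eligible.size() < thresholdDays) {
            return Optional.empty();
        }
        return Optional.of(new LocalDate[] {eligible.get(0), eligible.get(thresholdDays - 1)});
    }

    public static void main(String[] args) {
        List<LocalDate> built = new ArrayList<>();
        for (int day = 1; day <= 7; day++) {
            built.add(LocalDate.of(2017, 7, day));
        }
        System.out.println(planMerge(built, 2, 7).isPresent()); // prints "false"
        built.add(LocalDate.of(2017, 7, 8));
        built.add(LocalDate.of(2017, 7, 9));
        // prints "merge 2017-07-01 to 2017-07-07"
        planMerge(built, 2, 7).ifPresent(r -> System.out.println("merge " + r[0] + " to " + r[1]));
    }
}
```

With interval = 2 and threshold = 7, building 07-01 to 07-07 leaves only 07-01 to 07-05 eligible, so nothing merges; after 07-09 is built, 07-01 to 07-07 are eligible and merged, which matches the example in the description.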





[jira] [Commented] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2017-10-15 Thread Pan, Julian (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205363#comment-16205363
 ] 

Pan, Julian commented on KYLIN-2180:


@kangkaisen I found an ACL change for project creation in your patch.
Before your refactoring, we allowed users to create projects.
Do you think we should keep the previous behavior?
Or could you share the reason for changing it?  

> Add project config and make config priority become "cube > project > server"
> 
>
> Key: KYLIN-2180
> URL: https://issues.apache.org/jira/browse/KYLIN-2180
> Project: Kylin
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: v1.5.4.1
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v2.0.0
>
> Attachments: KYLIN-2180-refactor-ProjectRequest.patch, 
> KYLIN-2180.patch
>
>
> There are cases where we want to override the global kylin.properties in the scope 
> of a project, e.g. the queue name of a Hadoop job.
> In the end, the config priority for Kylin should be "cube > project > server", 
> which I think is reasonable.





[jira] [Commented] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates

2017-10-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205150#comment-16205150
 ] 

Ted Yu commented on KYLIN-2937:
---

Please translate subject and description into English.

> Intermediate data of non-partitioned cubes accumulates
> ---
>
> Key: KYLIN-2937
> URL: https://issues.apache.org/jira/browse/KYLIN-2937
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhengzfand
>
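> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
> which may cause an out-of-memory error (if the dictionary files are large or numerous enough).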





[jira] [Commented] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates

2017-10-15 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205141#comment-16205141
 ] 

Billy Liu commented on KYLIN-2937:
--

Please check the cleanup job log; it should tell why some resources were kept 
rather than removed. 

> Intermediate data of non-partitioned cubes accumulates
> ---
>
> Key: KYLIN-2937
> URL: https://issues.apache.org/jira/browse/KYLIN-2937
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhengzfand
>
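> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> Dictionary files stored on HDFS keep accumulating, and a cube build loads all of these accumulated dictionary files,
> which may cause an out-of-memory error (if the dictionary files are large or numerous enough).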






[jira] [Commented] (KYLIN-2927) Merge Cuboid Dictionary ERROR

2017-10-15 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205140#comment-16205140
 ] 

Billy Liu commented on KYLIN-2927:
--

First, the code change is in core-dictionary, so it actually takes effect in kylin-job, 
not kylin-tool. 
Second, the proposed workaround is to build the Kylin package from the branch 
2.2.X. 

> Merge Cuboid Dictionary ERROR
> -
>
> Key: KYLIN-2927
> URL: https://issues.apache.org/jira/browse/KYLIN-2927
> Project: Kylin
>  Issue Type: Bug
>Reporter: songxiangjun
>
> When I merge the segment, I encounter an error. How can I solve it? The log is as follows:
> 2017-10-11 15:24:56,139 ERROR [pool-9-thread-10] threadpool.DefaultScheduler:145 : ExecuteException job:7508dfa0-5a89-4c3c-8685-701226628207
> org.apache.kylin.job.exception.ExecuteException: org.apache.kylin.job.exception.ExecuteException: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:135)
>   at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:141)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kylin.job.exception.ExecuteException: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:135)
>   at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:65)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
>   ... 4 more
> Caused by: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
>   at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:92)
>   at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:78)
>   at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.addValue(DictionaryGenerator.java:261)
>   at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:79)
>   at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:64)
>   at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:104)
>   at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:275)
>   at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:146)
>   at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:136)
>   at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:68)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
>   ... 6 more
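The error message suggests that the forest builder only accepts values in sorted order once it has split into more than one tree. A hypothetical illustration of that invariant, not Kylin's TrieDictionaryForestBuilder; the per-tree size limit and the sticky "unordered" flag are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "sorted input before splitting" invariant.
public class OrderedForestBuilder {
    private final int maxValuesPerTree;
    private final List<List<String>> trees = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private String previous;
    private boolean unordered;

    OrderedForestBuilder(int maxValuesPerTree) {
        this.maxValuesPerTree = maxValuesPerTree;
    }

    void addValue(String value) {
        if (previous != null && value.compareTo(previous) < 0) {
            unordered = true; // once seen, disorder makes any split invalid
        }
        if (current.size() == maxValuesPerTree) {
            // Start a new tree; each tree must cover a contiguous, sorted key range.
            trees.add(current);
            current = new ArrayList<>();
        }
        if (unordered && !trees.isEmpty()) {
            throw new IllegalStateException("Invalid input data. Unordered data cannot be split into multi trees");
        }
        current.add(value);
        previous = value;
    }

    int treeCount() {
        return trees.size() + (current.isEmpty() ? 0 : 1);
    }

    public static void main(String[] args) {
        OrderedForestBuilder sorted = new OrderedForestBuilder(2);
        for (String v : new String[] {"a", "b", "c"}) {
            sorted.addValue(v);
        }
        System.out.println(sorted.treeCount()); // prints 2
        OrderedForestBuilder unsorted = new OrderedForestBuilder(2);
        try {
            for (String v : new String[] {"b", "a", "c"}) {
                unsorted.addValue(v);
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether the merge in this report actually fed unsorted values is not established here. As a general Java note, comparing numbers as strings can produce this kind of disorder: "10".compareTo("9") is negative.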





[jira] [Commented] (KYLIN-2722) Introduce a new measure, called active reservoir, for actively pushing metrics to reporters

2017-10-15 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205083#comment-16205083
 ] 

liyang commented on KYLIN-2722:
---

Reviewing https://github.com/apache/kylin/pull/77, a few minor 
questions/comments:
* The term "active reservoir" sounds a little strange, because I don't see an 
inactive reservoir anywhere. "Active" is not meaningful without 
"inactive", just like there is no good without evil.
* Also, is the term "reservoir" perhaps a little too big here? To me, it's just a 
buffer holding events before flushing them to a reporter. I know people like fancy 
words, but the scale doesn't really fit. Feel free to make the call. Naming shouldn't 
prevent a nice feature from being committed, and we can always rename later.

> Introduce a new measure, called active reservoir, for actively pushing 
> metrics to reporters
> ---
>
> Key: KYLIN-2722
> URL: https://issues.apache.org/jira/browse/KYLIN-2722
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Attachments: APACHE-KYLIN-2722.patch
>
>
> Many existing metrics frameworks focus on maintaining metrics in memory 
> independently for each instance. However, a Kylin server may consist of 
> multiple instances. Thus we extend the existing metrics framework by introducing 
> an *active reservoir*, which actively pushes metrics to reporters; each reporter 
> reports the metrics of its instance to a unified storage. 
> Here we introduce two *active reservoirs*. One, {{BlockingReservoir}}, buffers 
> the metrics. The other, {{InstantReservoir}}, has no buffer and pushes metrics 
> directly to reporters.
> In general, one *active reservoir* can push its metrics to multiple reporters, 
> while one reporter can listen on only one *active reservoir*.
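The push model described above can be sketched with two reservoir types and a reporter that accepts a single source. Everything here is hypothetical: the class shapes, the batch-size flush policy of BlockingReservoir and the in-memory Reporter stand in for whatever the patch actually does, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the push model; not the classes in the patch.
public class ActiveReservoirSketch {

    // A reporter may listen on at most one reservoir.
    static class Reporter {
        private ActiveReservoir source;
        final List<String> received = new ArrayList<>();

        void attach(ActiveReservoir reservoir) {
            if (source != null) {
                throw new IllegalStateException("reporter already listens on a reservoir");
            }
            source = reservoir;
        }

        // A real reporter would write to the unified metrics storage instead.
        void onEvents(List<String> events) {
            received.addAll(events);
        }
    }

    // A reservoir may push to many reporters.
    abstract static class ActiveReservoir {
        private final List<Reporter> reporters = new ArrayList<>();

        void addListener(Reporter reporter) {
            reporter.attach(this);
            reporters.add(reporter);
        }

        protected void push(List<String> events) {
            for (Reporter reporter : reporters) {
                reporter.onEvents(events);
            }
        }

        abstract void update(String event);
    }

    // No buffer: every event is pushed immediately.
    static class InstantReservoir extends ActiveReservoir {
        @Override
        void update(String event) {
            push(Collections.singletonList(event));
        }
    }

    // Buffers events and pushes them once batchSize events have accumulated (assumed policy).
    static class BlockingReservoir extends ActiveReservoir {
        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
        private final int batchSize;

        BlockingReservoir(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        void update(String event) {
            buffer.add(event);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        void flush() {
            List<String> batch = new ArrayList<>();
            buffer.drainTo(batch);
            if (!batch.isEmpty()) {
                push(batch);
            }
        }
    }

    public static void main(String[] args) {
        Reporter reporter = new Reporter();
        BlockingReservoir reservoir = new BlockingReservoir(2);
        reservoir.addListener(reporter);
        reservoir.update("query-1");
        System.out.println(reporter.received); // prints [] (still buffered)
        reservoir.update("query-2");
        System.out.println(reporter.received); // prints [query-1, query-2]
    }
}
```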


