[jira] [Created] (KYLIN-1318) enable gc log for kylin server instance

2016-01-14 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-1318:
Summary: enable gc log for kylin server instance
Key: KYLIN-1318
URL: https://issues.apache.org/jira/browse/KYLIN-1318
Project: Kylin
Issue Type: Improvement

Re: Using apache reviewboard for reviewing patches

2016-01-14 Thread hongbin ma
I had the impression that ASF git is not well integrated with GitHub, so for a long time we tried not to use GitHub. By the way, why do projects like Hadoop and HBase not use GitHub for reviewing? -- Regards, *Bin Mahone | 马洪宾* Apache Kylin: http://kylin.io Github: https://github.com/binmahone

Re: Re: Using apache reviewboard for reviewing patches

2016-01-14 Thread 250635...@qq.com
No idea why the Hadoop and HBase communities do not use GitHub. But the Spark community usually uses GitHub to send PRs and patches. It may be more flexible for reviewing and merging. 250635...@qq.com From: hongbin ma Date: 2016-01-14 16:43 To: dev Subject: Re: Using apache reviewboard for reviewing patches I

Re: Re: Using apache reviewboard for reviewing patches

2016-01-14 Thread hongbin ma
Good point. In this case we should think about trying out both review approaches and pick whichever suits us :) On Thu, Jan 14, 2016 at 4:45 PM, 250635...@qq.com <250635...@qq.com> wrote: > No idea why hadoop and hbase community not utilize github. But spark > community usually use github > to send pr

Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread ShaoFeng Shi
Cube build performance is largely determined by your Hadoop cluster's capacity. You can inspect the MR jobs' statistics to analyze potential bottlenecks. 2016-01-15 7:19 GMT+08:00 zhong zhang : > Hi All, > > We are trying to build a nine-dimension

Kylin and Tableau -- Top N query

2016-01-14 Thread sdangi
Results from Kylin and Tableau on a live connection don't match. Any reason? I'm creating a custom data source (Custom SQL Query) in Tableau and adding a parameter control using a query similar to below: SELECT t2.c1 ,sum(t1.c2) AS c3 FROM t1 Inner join t2 on t1.k1 = t2.k1 group by t2.c1 order

beg suggestions to speed up the Kylin cube build

2016-01-14 Thread zhong zhang
Hi All, We are trying to build a nine-dimension cube: eight mandatory dimensions and one hierarchy dimension. The fact table is about 20G. The two lookup tables are 1.3M and 357k respectively. It takes about 3 hours to reach 30% progress, which is kind of slow. We'd like to know whether there are suggestions to

Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread Yerui Sun
hongbin, I understand how the number of reducers is determined, and it could be improved. Suppose that we got 100GB of data after cuboid building, with a setting of 10GB per region. For now, 10 split keys are calculated, 10 regions are created, and 10 reducers are used in the ‘convert to hfile’ step.
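
The arithmetic in the message above can be sketched as follows. This is only an illustration, not Kylin's actual code: the function name estimate_region_count and the ceiling rounding for a partially filled region are assumptions; the 100GB and 10GB figures and the one-reducer-per-region relationship come from the message.

```python
import math


def estimate_region_count(total_size_gb: float, region_size_gb: float) -> int:
    """Hypothetical helper: number of regions needed to hold the data.

    Assumes a partially filled region still counts as one region
    (ceiling division). This is not taken from Kylin's source.
    """
    if total_size_gb <= 0 or region_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(total_size_gb / region_size_gb)


# Figures from the message: 100GB of cuboid data, 10GB per region.
regions = estimate_region_count(100, 10)

# Per the thread, the 'convert to hfile' step uses one reducer per region.
reducers = regions

print(regions, reducers)
```

Under these assumptions the reducer count grows only with the number of regions, which is why a large per-region size keeps the 'convert to hfile' step at low parallelism.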

[jira] [Created] (KYLIN-1319) Find a better way to check hadoop job status

2016-01-14 Thread liyang (JIRA)
liyang created KYLIN-1319:
Summary: Find a better way to check hadoop job status
Key: KYLIN-1319
URL: https://issues.apache.org/jira/browse/KYLIN-1319
Project: Kylin
Issue Type: Improvement

Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread hongbin ma
Hi Yerui, the number of "convert to hfile" reducers is small because each region's output will become an HTable region. Too many regions will be a burden to the HBase cluster. In our production env we have cubes that are 10T+; guess how many regions they would populate? What's more

Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread hongbin ma
I'm not sure if it will work. Does HBase bulk load allow that? On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun wrote: > hongbin, > > I understand how the number of reducers is determined, and it could be > improved. > > Supposed that we got 100GB data after cuboid building, and

Re: Kylin and Tableau -- Top N query

2016-01-14 Thread ShaoFeng Shi
How much difference is there between Hive and Kylin? Did you check factors like: a) is there any filtering condition in the cube descriptor? b) is the cube built with the full date range of the Hive table? c) was the fact/lookup table data changed since the cube was built? Just some hints to rule out those mistakes.

Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread ShaoFeng Shi
In Meng's case, writing 5GB takes 40 minutes, which is really slow. The bottleneck is probably the HDFS write (the cuboids have already been calculated; that step just converts them to HFile format, with no other calculation). 2016-01-15 15:36 GMT+08:00 hongbin ma : > if it works I'd love to see
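
To put the figure from the message above in perspective, the average write rate can be worked out directly. This is a minimal sketch: the helper name write_throughput_mb_per_s is hypothetical, and it assumes 1GB = 1024MB and a constant rate over the 40 minutes.

```python
def write_throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Hypothetical helper: average write rate in MB/s (1GB = 1024MB)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return size_gb * 1024 / (minutes * 60)


# Figures from the message: 5GB written in 40 minutes.
rate = write_throughput_mb_per_s(5, 40)
print(f"{rate:.2f} MB/s")  # prints "2.13 MB/s"
```

An average of roughly 2 MB/s for the whole step is consistent with the view that the write path, rather than any computation, is where the time goes.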