[jira] [Created] (KYLIN-1318) enable gc log for kylin server instance

2016-01-14 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-1318:
-

 Summary: enable gc log for kylin server instance
 Key: KYLIN-1318
 URL: https://issues.apache.org/jira/browse/KYLIN-1318
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma
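
For reference, GC logging for a HotSpot JVM like the Kylin server's is
typically enabled with options along these lines (a sketch only; the log
path and rotation settings are assumptions to adapt, appended to whatever
JVM options the Kylin start script uses):

    -verbose:gc
    -Xloggc:/path/to/kylin/logs/kylin.gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=10
    -XX:GCLogFileSize=64M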






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Using apache reviewboard for reviewing patches

2016-01-14 Thread hongbin ma
I had the impression that ASF git is not well integrated with GitHub, so for
a long time we tried not to use GitHub.

btw, why don't projects like Hadoop and HBase use GitHub for reviewing?





-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone


Re: Re: Using apache reviewboard for reviewing patches

2016-01-14 Thread 250635...@qq.com
No idea why the Hadoop and HBase communities don't use GitHub. But the Spark
community usually uses GitHub
to send PRs and patches. Maybe it's more flexible for reviewing and merging.



250635...@qq.com
 
From: hongbin ma
Date: 2016-01-14 16:43
To: dev
Subject: Re: Using apache reviewboard for reviewing patches
I had the impression that ASF git is not well integrated with GitHub, so for
a long time we tried not to use GitHub.

btw, why don't projects like Hadoop and HBase use GitHub for reviewing?
 
 
 
 
-- 
Regards,
 
*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone


Re: Re: Using apache reviewboard for reviewing patches

2016-01-14 Thread hongbin ma
Good point.

In that case we should try out both review approaches, and pick
whichever suits us :)

On Thu, Jan 14, 2016 at 4:45 PM, 250635...@qq.com <250635...@qq.com> wrote:

> No idea why the Hadoop and HBase communities don't use GitHub. But the Spark
> community usually uses GitHub
> to send PRs and patches. Maybe it's more flexible for reviewing and merging.
>
>
>
> 250635...@qq.com
>
> From: hongbin ma
> Date: 2016-01-14 16:43
> To: dev
> Subject: Re: Using apache reviewboard for reviewing patches
> I had the impression that ASF git is not well integrated with GitHub, so for
> a long time we tried not to use GitHub.
>
> btw, why don't projects like Hadoop and HBase use GitHub for reviewing?
>
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone


Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread ShaoFeng Shi
Cube build performance is largely determined by your Hadoop cluster's
capacity. You can inspect the MR jobs' statistics to analyze the potential
bottlenecks.
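
For illustration, those statistics can also be pulled programmatically with
the Hadoop 2.x client API (a sketch only, not Kylin code; the job ID is
passed in as an argument):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;

    public class JobStats {
        public static void main(String[] args) throws Exception {
            // Uses the cluster settings from core-site.xml / mapred-site.xml
            Cluster cluster = new Cluster(new Configuration());
            Job job = cluster.getJob(JobID.forName(args[0])); // e.g. "job_..."
            if (job == null) {
                System.err.println("Job not found (may have been retired)");
                return;
            }
            System.out.println("State: " + job.getJobState());
            // Counters such as HDFS bytes written or CPU time often reveal
            // where a build step spends its time
            for (CounterGroup group : job.getCounters()) {
                for (Counter counter : group) {
                    System.out.println(group.getDisplayName() + " / "
                        + counter.getDisplayName() + " = " + counter.getValue());
                }
            }
        }
    }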



2016-01-15 7:19 GMT+08:00 zhong zhang :

> Hi All,
>
> We are trying to build a nine-dimension cube:
> eight mandatory dimensions and one hierarchy
> dimension. The fact table is about 20G, and the two lookup
> tables are 1.3M and 357K respectively. It takes about
> 3 hours to reach 30% progress, which is rather slow.
>
> We'd like to know whether there are suggestions to speed up
> the Kylin cube build. We got a suggestion from
> a slide that said to sort the dimensions by
> cardinality. Are there any other ways we can try?
>
> We also noticed that only half of the memory and
> half of the CPU are used during the cube build.
> Are there any ways to fully utilize the resources?
>
> Looking forward to hearing from you.
>
> Best regards,
> Zhong
>



-- 
Best regards,

Shaofeng Shi


Kylin and Tableau -- Top N query

2016-01-14 Thread sdangi
Results from Kylin and Tableau over a live connection don't match.  Any reason?
I'm creating a custom data source (Custom SQL Query) in Tableau and adding a
parameter control using a query similar to the one below:

SELECT
  t2.c1,
  SUM(t1.c2) AS c3
FROM t1
INNER JOIN t2
  ON t1.k1 = t2.k1
GROUP BY t2.c1
ORDER BY c3
LIMIT 

t1 (fact) has 130MM rows and t2 (dimension) has 1.7MM rows.

The query returns different Top N records in Tableau than in Kylin and
Hive.

Thanks,
Regards,

--
View this message in context: 
http://apache-kylin.74782.x6.nabble.com/Kylin-and-Tableau-Top-N-query-tp3250.html
Sent from the Apache Kylin mailing list archive at Nabble.com.


beg suggestions to speed up the Kylin cube build

2016-01-14 Thread zhong zhang
Hi All,

We are trying to build a nine-dimension cube:
eight mandatory dimensions and one hierarchy
dimension. The fact table is about 20G, and the two lookup
tables are 1.3M and 357K respectively. It takes about
3 hours to reach 30% progress, which is rather slow.

We'd like to know whether there are suggestions to speed up
the Kylin cube build. We got a suggestion from
a slide that said to sort the dimensions by
cardinality. Are there any other ways we can try?

We also noticed that only half of the memory and
half of the CPU are used during the cube build.
Are there any ways to fully utilize the resources?

Looking forward to hearing from you.

Best regards,
Zhong


Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread Yerui Sun
hongbin,

I understand how the number of reducers is determined, and it could be improved.

Suppose we have 100GB of data after cuboid building, with a setting of 10GB
per region. Currently, 10 split keys are calculated, 10 regions are created,
and 10 reducers are used in the 'convert to HFile' step.

With the optimization, we could calculate 100 (or more) split keys, use all of
them in the 'convert to HFile' step, but sample 10 keys from them to create
regions. The result is still 10 regions created, but 100 reducers used in the
'convert to HFile' step. Of course, 100 HFiles are created as well, with 10
files loaded per region. That should be fine and shouldn't affect query
performance dramatically.
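
A minimal sketch of that idea (assuming split keys are byte[] rowkeys; the
class and method names are hypothetical, not Kylin's actual code): keep the
full key list for reducer partitioning, and take every step-th key as a
region boundary.

    import java.util.ArrayList;
    import java.util.List;

    public class SplitKeySampler {

        // allSplitKeys: e.g. 99 keys -> 100 reducers in 'convert to HFile'.
        // step: e.g. 10 -> every 10th key becomes a region boundary, so
        // ~10 regions are created and each region bulk-loads ~10 HFiles.
        public static List<byte[]> sampleRegionKeys(List<byte[]> allSplitKeys,
                                                    int step) {
            List<byte[]> regionKeys = new ArrayList<byte[]>();
            for (int i = step - 1; i < allSplitKeys.size(); i += step) {
                regionKeys.add(allSplitKeys.get(i));
            }
            return regionKeys;
        }
    }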

> On 2016-01-15, at 13:53, hongbin ma wrote:
> 
> hi, yerui,
> 
> the number of "convert to hfile" reducers is small because each reducer's
> output becomes an HTable region, and too many regions
> would be a burden to the HBase cluster. In our production env we have cubes
> that are 10T+; guess how many regions they populate?
> 
> What's more, Kylin provides different profiles to control the expected
> region size (thus controlling the number of regions and the parallelism of
> the "create htable" reducers); you can modify it depending on your cube size.
> In 2.x it's basically 10G for small cubes, 20G for medium cubes and 100G for
> large ones. However, this is manual work when creating a cube, and I admit
> the value settings for the three profiles are still debatable.
> 
> 
> 
> 
> On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun  wrote:
> 
>> Agreed with Liang Meng (梁猛).
>> 
>> Actually we found the same issue: the number of reducers in step 'convert
>> to hfile' is too small, the same as the region count.
>> 
>> I think we could increase the number of reducers to improve performance.
>> If anyone is interested in this, we could discuss the solution further.
>> 
>>> On 2016-01-15, at 09:46, 13802880...@139.com wrote:
>>> 
>>> actually, I found the last step "convert to hfile" takes too much time:
>>> more than 40 minutes for a single region (using the "small" profile, and
>>> the result file is about 5GB)
>>> 
>>> 
>>> 
>>> Liang Meng (梁猛), Network Management Center, China Mobile Guangdong Co., Ltd.
>>> 13802880...@139.com
>>> 
>>> From: ShaoFeng Shi
>>> Date: 2016-01-15 09:40
>>> To: dev
>>> Subject: Re: beg suggestions to speed up the Kylin cube build
>>> Cube build performance is largely determined by your Hadoop cluster's
>>> capacity. You can inspect the MR jobs' statistics to analyze the
>>> potential bottlenecks.
>>> 
>>> 
>>> 
>>> 2016-01-15 7:19 GMT+08:00 zhong zhang :
>>> 
 Hi All,
 
 We are trying to build a nine-dimension cube:
 eight mandatory dimensions and one hierarchy
 dimension. The fact table is about 20G, and the two lookup
 tables are 1.3M and 357K respectively. It takes about
 3 hours to reach 30% progress, which is rather slow.
 
 We'd like to know whether there are suggestions to speed up
 the Kylin cube build. We got a suggestion from
 a slide that said to sort the dimensions by
 cardinality. Are there any other ways we can try?
 
 We also noticed that only half of the memory and
 half of the CPU are used during the cube build.
 Are there any ways to fully utilize the resources?
 
 Looking forward to hearing from you.
 
 Best regards,
 Zhong
 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>> Shaofeng Shi
>> 
>> 
> 
> 
> -- 
> Regards,
> 
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone



[jira] [Created] (KYLIN-1319) Find a better way to check hadoop job status

2016-01-14 Thread liyang (JIRA)
liyang created KYLIN-1319:
-

 Summary: Find a better way to check hadoop job status
 Key: KYLIN-1319
 URL: https://issues.apache.org/jira/browse/KYLIN-1319
 Project: Kylin
  Issue Type: Improvement
Reporter: liyang


Currently Kylin retrieves job status via a resource manager web service like 
"https://:/ws/v1/cluster/apps/${job_id}?anonymous=true".

This is not the most robust approach. Some users do not have 
"yarn.resourcemanager.webapp.address" set in yarn-site.xml, so getting the 
status fails out of the box. They have to set the Kylin property 
"kylin.job.yarn.app.rest.check.status.url" to work around it, which is not 
user-friendly.

Kerberos authentication might cause problems too if security is enabled.

Is there a more robust way to check job status? Via the Job API?
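
For comparison, a status check through the YARN client API might look roughly
like this (a sketch under the assumption that the Hadoop 2.x YarnClient is
available; it talks to the RM over RPC, picks up Kerberos credentials from the
current UGI, and does not depend on yarn.resourcemanager.webapp.address):

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class AppStatusCheck {
        public static YarnApplicationState check(String appId) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();
            try {
                // RPC call to the RM; no web UI involved
                ApplicationReport report = yarnClient.getApplicationReport(
                    ConverterUtils.toApplicationId(appId));
                return report.getYarnApplicationState();
            } finally {
                yarnClient.stop();
            }
        }
    }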



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread hongbin ma
hi, yerui,

the number of "convert to hfile" reducers is small because each reducer's
output becomes an HTable region, and too many regions
would be a burden to the HBase cluster. In our production env we have cubes
that are 10T+; guess how many regions they populate?

What's more, Kylin provides different profiles to control the expected
region size (thus controlling the number of regions and the parallelism of
the "create htable" reducers); you can modify it depending on your cube size.
In 2.x it's basically 10G for small cubes, 20G for medium cubes and 100G for
large ones. However, this is manual work when creating a cube, and I admit
the value settings for the three profiles are still debatable.
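
For illustration, the profiles referred to are kylin.properties settings
along these lines (the exact property names differ between Kylin versions,
so treat these as assumptions to verify against your installation):

    # Expected region size in GB per cube-size profile (hypothetical names)
    kylin.hbase.region.cut.small=10
    kylin.hbase.region.cut.medium=20
    kylin.hbase.region.cut.large=100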




On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun  wrote:

> Agreed with Liang Meng (梁猛).
>
> Actually we found the same issue: the number of reducers in step 'convert
> to hfile' is too small, the same as the region count.
>
> I think we could increase the number of reducers to improve performance.
> If anyone is interested in this, we could discuss the solution further.
>
> > On 2016-01-15, at 09:46, 13802880...@139.com wrote:
> >
> > actually, I found the last step "convert to hfile" takes too much time:
> > more than 40 minutes for a single region (using the "small" profile, and
> > the result file is about 5GB)
> >
> >
> >
> > Liang Meng (梁猛), Network Management Center, China Mobile Guangdong Co., Ltd.
> > 13802880...@139.com
> >
> > From: ShaoFeng Shi
> > Date: 2016-01-15 09:40
> > To: dev
> > Subject: Re: beg suggestions to speed up the Kylin cube build
> > Cube build performance is largely determined by your Hadoop cluster's
> > capacity. You can inspect the MR jobs' statistics to analyze the
> > potential bottlenecks.
> >
> >
> >
> > 2016-01-15 7:19 GMT+08:00 zhong zhang :
> >
> >> Hi All,
> >>
> >> We are trying to build a nine-dimension cube:
> >> eight mandatory dimensions and one hierarchy
> >> dimension. The fact table is about 20G, and the two lookup
> >> tables are 1.3M and 357K respectively. It takes about
> >> 3 hours to reach 30% progress, which is rather slow.
> >>
> >> We'd like to know whether there are suggestions to speed up
> >> the Kylin cube build. We got a suggestion from
> >> a slide that said to sort the dimensions by
> >> cardinality. Are there any other ways we can try?
> >>
> >> We also noticed that only half of the memory and
> >> half of the CPU are used during the cube build.
> >> Are there any ways to fully utilize the resources?
> >>
> >> Looking forward to hearing from you.
> >>
> >> Best regards,
> >> Zhong
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone


Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread hongbin ma
I'm not sure if it will work; does HBase bulk load allow that?

On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun  wrote:

> hongbin,
>
> I understand how the number of reducers is determined, and it could be
> improved.
>
> Suppose we have 100GB of data after cuboid building, with a setting of
> 10GB per region. Currently, 10 split keys are calculated, 10 regions are
> created, and 10 reducers are used in the 'convert to HFile' step.
>
> With the optimization, we could calculate 100 (or more) split keys, use
> all of them in the 'convert to HFile' step, but sample 10 keys from them to
> create regions. The result is still 10 regions created, but 100 reducers
> used in the 'convert to HFile' step. Of course, 100 HFiles are created as
> well, with 10 files loaded per region. That should be fine and shouldn't
> affect query performance dramatically.
>
> > On 2016-01-15, at 13:53, hongbin ma wrote:
> >
> > hi, yerui,
> >
> > the number of "convert to hfile" reducers is small because each
> > reducer's output becomes an HTable region, and too many regions
> > would be a burden to the HBase cluster. In our production env we have
> > cubes that are 10T+; guess how many regions they populate?
> >
> > What's more, Kylin provides different profiles to control the expected
> > region size (thus controlling the number of regions and the parallelism
> > of the "create htable" reducers); you can modify it depending on your
> > cube size. In 2.x it's basically 10G for small cubes, 20G for medium
> > cubes and 100G for large ones. However, this is manual work when creating
> > a cube, and I admit the value settings for the three profiles are still
> > debatable.
> >
> >
> >
> >
> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun  wrote:
> >
> >> Agreed with Liang Meng (梁猛).
> >>
> >> Actually we found the same issue: the number of reducers in step
> >> 'convert to hfile' is too small, the same as the region count.
> >>
> >> I think we could increase the number of reducers to improve performance.
> >> If anyone is interested in this, we could discuss the solution further.
> >>
> >>> On 2016-01-15, at 09:46, 13802880...@139.com wrote:
> >>>
> >>> actually, I found the last step "convert to hfile" takes too much
> >>> time: more than 40 minutes for a single region (using the "small"
> >>> profile, and the result file is about 5GB)
> >>>
> >>>
> >>>
> >>> Liang Meng (梁猛), Network Management Center, China Mobile Guangdong Co., Ltd.
> >>> 13802880...@139.com
> >>>
> >>> From: ShaoFeng Shi
> >>> Date: 2016-01-15 09:40
> >>> To: dev
> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
> >>> Cube build performance is largely determined by your Hadoop cluster's
> >>> capacity. You can inspect the MR jobs' statistics to analyze the
> >>> potential bottlenecks.
> >>>
> >>>
> >>>
> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang :
> >>>
>  Hi All,
> 
>  We are trying to build a nine-dimension cube:
>  eight mandatory dimensions and one hierarchy
>  dimension. The fact table is about 20G, and the two lookup
>  tables are 1.3M and 357K respectively. It takes about
>  3 hours to reach 30% progress, which is rather slow.
> 
>  We'd like to know whether there are suggestions to speed up
>  the Kylin cube build. We got a suggestion from
>  a slide that said to sort the dimensions by
>  cardinality. Are there any other ways we can try?
> 
>  We also noticed that only half of the memory and
>  half of the CPU are used during the cube build.
>  Are there any ways to fully utilize the resources?
> 
>  Looking forward to hearing from you.
> 
>  Best regards,
>  Zhong
> 
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> Shaofeng Shi
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone


Re: Kylin and Tableau -- Top N query

2016-01-14 Thread ShaoFeng Shi
How much difference is there between Hive and Kylin? Did you check factors
like: a) any filter condition in the cube descriptor? b) was the cube built
with the full date range of the Hive table? c) has the fact/lookup table data
changed since the cube was built? Just some hints to rule out those mistakes.
Besides, you can run the SQL from the Kylin UI to rule out the ODBC driver as
a factor.

2016-01-15 7:50 GMT+08:00 sdangi :

> Results from Kylin and Tableau over a live connection don't match.  Any
> reason?
> I'm creating a custom data source (Custom SQL Query) in Tableau and adding a
> parameter control using a query similar to the one below:
>
> SELECT
>   t2.c1,
>   SUM(t1.c2) AS c3
> FROM t1
> INNER JOIN t2
>   ON t1.k1 = t2.k1
> GROUP BY t2.c1
> ORDER BY c3
> LIMIT 
>
> t1 (fact) has 130MM rows and t2 (dimension) has 1.7MM rows.
>
> The query returns different Top N records in Tableau than in Kylin and
> Hive.
>
> Thanks,
> Regards,
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/Kylin-and-Tableau-Top-N-query-tp3250.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>



-- 
Best regards,

Shaofeng Shi


Re: beg suggestions to speed up the Kylin cube build

2016-01-14 Thread ShaoFeng Shi
For Meng's case, writing 5GB takes 40 minutes; that's really slow. The
bottleneck should be HDFS write (the cuboids have already been calculated;
that step just converts them to HFile format, with no further computation).
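
As a rough way to confirm that, one could time a raw HDFS write from the same
node (a sketch; the path and 1GB size are arbitrary placeholders, and real
HFile writing adds sorting and compression on top of this):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteProbe {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path("/tmp/hdfs-write-probe"); // placeholder path
            byte[] buf = new byte[1 << 20];                // 1MB of zeros
            long start = System.currentTimeMillis();
            FSDataOutputStream out = fs.create(path, true);
            try {
                for (int i = 0; i < 1024; i++) {           // 1GB total
                    out.write(buf);
                }
            } finally {
                out.close();
            }
            long ms = System.currentTimeMillis() - start;
            System.out.println("Wrote 1GB in " + ms + " ms ("
                + (1024.0 * 1000 / ms) + " MB/s)");
            fs.delete(path, false);
        }
    }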

2016-01-15 15:36 GMT+08:00 hongbin ma :

> if it works I'd love to see the change
>
> On Fri, Jan 15, 2016 at 3:35 PM, hongbin ma  wrote:
>
> > I'm not sure if it will work; does HBase bulk load allow that?
> >
> > On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun  wrote:
> >
> >> hongbin,
> >>
> >> I understand how the number of reducers is determined, and it could be
> >> improved.
> >>
> >> Suppose we have 100GB of data after cuboid building, with a setting of
> >> 10GB per region. Currently, 10 split keys are calculated, 10 regions are
> >> created, and 10 reducers are used in the 'convert to HFile' step.
> >>
> >> With the optimization, we could calculate 100 (or more) split keys, use
> >> all of them in the 'convert to HFile' step, but sample 10 keys from them
> >> to create regions. The result is still 10 regions created, but 100
> >> reducers used in the 'convert to HFile' step. Of course, 100 HFiles are
> >> created as well, with 10 files loaded per region. That should be fine and
> >> shouldn't affect query performance dramatically.
> >>
> >> > On 2016-01-15, at 13:53, hongbin ma wrote:
> >> >
> >> > hi, yerui,
> >> >
> >> > the number of "convert to hfile" reducers is small because each
> >> > reducer's output becomes an HTable region, and too many regions
> >> > would be a burden to the HBase cluster. In our production env we have
> >> > cubes that are 10T+; guess how many regions they populate?
> >> >
> >> > What's more, Kylin provides different profiles to control the expected
> >> > region size (thus controlling the number of regions and the
> >> > parallelism of the "create htable" reducers); you can modify it
> >> > depending on your cube size. In 2.x it's basically 10G for small
> >> > cubes, 20G for medium cubes and 100G for large ones. However, this is
> >> > manual work when creating a cube, and I admit the value settings for
> >> > the three profiles are still debatable.
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun wrote:
> >> >
> >> >> Agreed with Liang Meng (梁猛).
> >> >>
> >> >> Actually we found the same issue: the number of reducers in step
> >> >> 'convert to hfile' is too small, the same as the region count.
> >> >>
> >> >> I think we could increase the number of reducers to improve
> >> >> performance. If anyone is interested in this, we could discuss the
> >> >> solution further.
> >> >>
> >> >>> On 2016-01-15, at 09:46, 13802880...@139.com wrote:
> >> >>>
> >> >>> actually, I found the last step "convert to hfile" takes too much
> >> >>> time: more than 40 minutes for a single region (using the "small"
> >> >>> profile, and the result file is about 5GB)
> >> >>>
> >> >>>
> >> >>>
> >> >>> Liang Meng (梁猛), Network Management Center, China Mobile Guangdong Co., Ltd.
> >> >>> 13802880...@139.com
> >> >>>
> >> >>> From: ShaoFeng Shi
> >> >>> Date: 2016-01-15 09:40
> >> >>> To: dev
> >> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
> >> >>> Cube build performance is largely determined by your Hadoop
> >> >>> cluster's capacity. You can inspect the MR jobs' statistics to
> >> >>> analyze the potential bottlenecks.
> >> >>>
> >> >>>
> >> >>>
> >> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang :
> >> >>>
> >>  Hi All,
> >> 
> >>  We are trying to build a nine-dimension cube:
> >>  eight mandatory dimensions and one hierarchy
> >>  dimension. The fact table is about 20G, and the two lookup
> >>  tables are 1.3M and 357K respectively. It takes about
> >>  3 hours to reach 30% progress, which is rather slow.
> >> 
> >>  We'd like to know whether there are suggestions to speed up
> >>  the Kylin cube build. We got a suggestion from
> >>  a slide that said to sort the dimensions by
> >>  cardinality. Are there any other ways we can try?
> >> 
> >>  We also noticed that only half of the memory and
> >>  half of the CPU are used during the cube build.
> >>  Are there any ways to fully utilize the resources?
> >> 
> >>  Looking forward to hearing from you.
> >> 
> >>  Best regards,
> >>  Zhong
> >> 
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best regards,
> >> >>>
> >> >>> Shaofeng Shi
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > *Bin Mahone | 马洪宾*
> >> > Apache Kylin: http://kylin.io
> >> > Github: https://github.com/binmahone
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



-- 
Best regards,

Shaofeng Shi