[jira] [Created] (KYLIN-1318) enable gc log for kylin server instance
hongbin ma created KYLIN-1318: - Summary: enable gc log for kylin server instance Key: KYLIN-1318 URL: https://issues.apache.org/jira/browse/KYLIN-1318 Project: Kylin Issue Type: Improvement Reporter: hongbin ma Assignee: hongbin ma -- This message was sent by Atlassian JIRA (v6.3.4#6332)
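For reference, a minimal sketch of what enabling GC logging could look like: these are standard HotSpot GC-logging flags appended to the JVM settings the start script passes to the server. The KYLIN_JVM_SETTINGS variable name and the setenv-style hook are assumptions for illustration, not confirmed Kylin script details.

```shell
# Sketch for a setenv-style script: enable GC logging with rotation.
# The flags are standard HotSpot options; the variable name and script
# location are assumptions about how kylin.sh builds its JVM settings.
export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:${KYLIN_HOME}/logs/kylin.gc.log \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
```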
Re: Using apache reviewboard for reviewing patches
I had the impression that ASF git is not well integrated with GitHub, so for a long time we tried not to use GitHub. BTW, why do projects like Hadoop and HBase not use GitHub for reviewing? -- Regards, *Bin Mahone | 马洪宾* Apache Kylin: http://kylin.io Github: https://github.com/binmahone
Re: Re: Using apache reviewboard for reviewing patches
No idea why the Hadoop and HBase communities don't use GitHub. But the Spark community usually uses GitHub to send PRs and patches; maybe it's more flexible for review and merge. 250635...@qq.com
Re: Re: Using apache reviewboard for reviewing patches
Good point. In that case we should think about trying out both review approaches, and pick whichever suits us :) -- Regards, *Bin Mahone | 马洪宾* Apache Kylin: http://kylin.io Github: https://github.com/binmahone
Re: beg suggestions to speed up the Kylin cube build
The cube build performance is largely determined by your Hadoop cluster's capacity. You can inspect the MR jobs' statistics to analyze the potential bottlenecks. -- Best regards, Shaofeng Shi
Kylin and Tableau -- Top N query
Results from Kylin and Tableau on a live connection don't match. Any reason? I'm creating a custom data source (Custom SQL Query) in Tableau and adding a parameter control using a query similar to the one below:

SELECT t2.c1, sum(t1.c2) AS c3
FROM t1
INNER JOIN t2 ON t1.k1 = t2.k1
GROUP BY t2.c1
ORDER BY c3
LIMIT

t1 (fact) has 130MM rows and t2 (dimension) has 1.7MM. The query shows different Top N records in Tableau as compared to Kylin and Hive. Thanks, Regards, -- View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-and-Tableau-Top-N-query-tp3250.html Sent from the Apache Kylin mailing list archive at Nabble.com.
beg suggestions to speed up the Kylin cube build
Hi All, We are trying to build a nine-dimension cube: eight mandatory dimensions and one hierarchy dimension. The fact table is about 20G, and the two lookup tables are 1.3M and 357k respectively. It takes about 3 hours to reach 30% progress, which is kind of slow. We'd like to know whether there are any suggestions to speed up the Kylin cube build. We got a suggestion from a slide that said to sort the dimensions by cardinality. Are there any other ways we can try? We also noticed that only half of the memory and half of the CPU are used during the cube build. Are there any ways to fully utilize the resources? Looking forward to hearing from you. Best regards, Zhong
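For intuition on why the mandatory dimensions matter here: in a simplified model (ignoring aggregation-group details), every free dimension doubles the cuboid count, a mandatory dimension appears in every cuboid, and a hierarchy of n levels contributes n+1 choices instead of 2**n. A rough sketch, purely illustrative and not Kylin's actual planner:

```python
def cuboid_count(total_dims, mandatory_dims, hierarchy_levels=0):
    """Rough cuboid count under a simplified model: mandatory dims are in
    every cuboid, a hierarchy of n levels gives n+1 choices (absent,
    level 1, levels 1+2, ...), and remaining dims vary freely."""
    free = total_dims - mandatory_dims - hierarchy_levels
    hier = hierarchy_levels + 1 if hierarchy_levels else 1
    return (2 ** free) * hier

# Nine dimensions, eight mandatory, one single-level hierarchy dimension:
print(cuboid_count(9, 8, hierarchy_levels=1))  # 2
```

Under this model the cube above is already heavily pruned (very few cuboids), so build time is likely dominated by raw data volume and cluster capacity rather than cuboid explosion.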
Re: beg suggestions to speed up the Kylin cube build
hongbin,

I understand how the number of reducers is determined, and it could be improved.

Suppose we got 100GB of data after cuboid building, with a setting of 10GB per region. For now, 10 split keys are calculated, 10 regions are created, and 10 reducers are used in the ‘convert to hfile’ step.

With the optimization, we could calculate 100 (or more) split keys and use all of them in the ‘convert to hfile’ step, but sample 10 keys from them to create regions. The result is still 10 regions created, but 100 reducers used in the ‘convert to hfile’ step. Of course, 100 hfiles are created as well, and 10 files are loaded per region. That should be fine; it shouldn't affect query performance dramatically.

> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
>
> actually, I found the last step "convert to hfile" takes too much time, more than 40 minutes for a single region (using the small profile, and the result file is about 5GB)
>
> China Mobile Guangdong, Network Management Center, 梁猛 (Liang Meng)
> 13802880...@139.com
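Yerui's proposal above can be sketched as simple key-list arithmetic. This is an illustrative model only, not Kylin code, and all names are invented:

```python
def plan_splits(sorted_keys, num_reducers, num_regions):
    """Sketch of the proposed split: pick num_reducers - 1 evenly spaced
    boundary keys for the 'convert to hfile' reducers, then subsample
    every (num_reducers // num_regions)-th of them as region split keys,
    so each region receives num_reducers // num_regions HFiles."""
    step = len(sorted_keys) / num_reducers
    reducer_keys = [sorted_keys[int(step * i)] for i in range(1, num_reducers)]
    stride = num_reducers // num_regions
    region_keys = reducer_keys[stride - 1::stride]  # subset of reducer keys
    return reducer_keys, region_keys

# 100GB of sampled rowkeys, 100 reducers, 10 regions:
reducers, regions = plan_splits(list(range(1000)), num_reducers=100, num_regions=10)
print(len(reducers), len(regions))  # 99 9
```

Because the region boundaries are a subset of the reducer boundaries, no HFile ever spans two regions, which is the precondition for the bulk load to accept 10 files per region.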
[jira] [Created] (KYLIN-1319) Find a better way to check hadoop job status
liyang created KYLIN-1319: - Summary: Find a better way to check hadoop job status Key: KYLIN-1319 URL: https://issues.apache.org/jira/browse/KYLIN-1319 Project: Kylin Issue Type: Improvement Reporter: liyang Currently Kylin retrieves job status via a resource manager web service like "https://:/ws/v1/cluster/apps/${job_id}?anonymous=true". This is not the most robust approach. Some users do not have "yarn.resourcemanager.webapp.address" set in yarn-site.xml, so getting the status fails out of the box. They have to set the Kylin property "kylin.job.yarn.app.rest.check.status.url" to work around it, which is not user-friendly. Kerberos authentication might cause problems too if security is enabled. Is there a more robust way to check job status? Via the Job API? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
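To make the described failure mode concrete, here is an illustrative sketch of the REST-URL-based check. This is not Kylin's actual code; the function name, the dict-based config, and the override parameter (standing in for "kylin.job.yarn.app.rest.check.status.url") are all invented for illustration:

```python
def rm_status_url(yarn_conf, job_id, override_url=None):
    """Build the ResourceManager REST URL for a job-status check.
    yarn_conf models yarn-site.xml properties as a dict; override_url
    models the Kylin override property. All names are illustrative."""
    if override_url:
        return override_url.replace("${job_id}", job_id)
    addr = yarn_conf.get("yarn.resourcemanager.webapp.address")
    if addr is None:
        # The out-of-the-box failure mode the issue describes.
        raise KeyError("yarn.resourcemanager.webapp.address is not set")
    return "http://%s/ws/v1/cluster/apps/%s?anonymous=true" % (addr, job_id)

print(rm_status_url({"yarn.resourcemanager.webapp.address": "rm-host:8088"},
                    "application_1452000000000_0001"))
```

A client-API-based check (e.g. YARN's application report API) would let the client library resolve the RM address and handle security, instead of requiring users to hand-configure a URL.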
Re: beg suggestions to speed up the Kylin cube build
hi, yerui,

the reason why the number of "convert to hfile" reducers is small is that each reducer's output becomes an HTable region, and too many regions will be a burden to the HBase cluster. In our production env we have cubes that are 10T+; guess how many regions they populate?

What's more, Kylin provides different profiles to control the expected region size (thus controlling the number of regions & the parallelism of the "create htable" reducer); you can modify it depending on your cube size. In 2.x it's basically 10G for small cubes, 20G for medium cubes and 100G for large cubes. However this is manual work when creating a cube, and I admit the value settings for the three profiles are still debatable.

On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun wrote:
> Agreed with 梁猛.
>
> Actually we found the same issue: the number of reducers is too small in step ‘convert to hfile’, which is the same as the region count.
>
> I think we could increase the number of reducers to improve performance. If anyone is interested in this, we could discuss the solution further.

-- Regards, *Bin Mahone | 马洪宾* Apache Kylin: http://kylin.io Github: https://github.com/binmahone
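The three size profiles mentioned above would map to kylin.properties entries along these lines. The exact property names are an assumption from memory of 1.x/2.x configs, so verify them against your Kylin version before use:

```properties
# Expected cut size per HBase region, in GB, per cube-size profile.
# Property names are an assumption; check your version's kylin.properties.
kylin.hbase.region.cut.small=10
kylin.hbase.region.cut.medium=20
kylin.hbase.region.cut.large=100
```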
Re: beg suggestions to speed up the Kylin cube build
I'm not sure if it will work; does HBase bulk load allow that?

-- Regards, *Bin Mahone | 马洪宾* Apache Kylin: http://kylin.io Github: https://github.com/binmahone
Re: Kylin and Tableau -- Top N query
How much difference is there between Hive and Kylin? Did you check factors like: a) any filter condition in the cube descriptor? b) is the cube built with the full date range of the Hive table? c) was the fact/lookup table data changed since the cube was built? Just some hints to rule out those mistakes. Besides, you can run the SQL from the Kylin UI to eliminate the possibility of an ODBC driver issue.

-- Best regards, Shaofeng Shi
Re: beg suggestions to speed up the Kylin cube build
For Meng's case, writing 5GB takes 40 minutes; that's really slow. The bottleneck should be on the HDFS write (the cuboid has already been calculated; that step just converts it to HFile format, with no other computation).

2016-01-15 15:36 GMT+08:00 hongbin ma:
> if it works I'd love to see the change

-- Best regards, Shaofeng Shi