Re: [Discuss] Apache Kylin Component Owner Plan

2018-02-04 Thread yu feng
May I take the JDBC component, with @Dong Li as backup? Thanks.

2018-02-05 9:11 GMT+08:00 Jianhua Peng :

> +1.
>
> I would like to take ownership of the "Security" component.
> I could also be a backup owner for the "Job Engine" component.
>
> Best regards,
> Jianhua Peng
>
> On 2018/02/02 02:39:38, ShaoFeng Shi  wrote:
> > Hello, Kylin community,
> >
> > In the past, we didn't have a clear rule on the ownership of each Kylin
> > component, which left many external patches pending with no reviewer to
> > pick them up.
> >
> > Now we plan to make the process and responsibilities clearer. The main
> > idea is to identify the owners of each Apache Kylin component.
> >
> > - Component owners will be listed in the description field on this Apache
> > Kylin JIRA components page [1]. The owners are listed in the
> 'Description'
> > field rather than in the 'Component Lead' field because the latter only
> > allows us to list one individual whereas it is encouraged that components
> > have multiple owners.
> >
> > - Component owners are volunteers who are experts in their component's
> > domain and may have an agenda for how they think their Apache Kylin
> > component should evolve. An owner needs to be an Apache Kylin committer
> > at this moment.
> >
> > - Owners will try and review patches that land within their component’s
> > scope.
> >
> > - Owners can rotate, based on their own aspirations.
> >
> > - When nominating or voting on a new committer, the nominator needs to
> > state which component the candidate can own.
> >
> > - If you're already an Apache Kylin committer and would like to be a
> > volunteer as a component owner, just write to the dev list and we’ll sign
> > you up.
> >
> > - If you think the component list needs to be updated (add, remove,
> > rename, etc.), write to the dev list and we'll review that.
> >
> > Below is the component list with the old component leads, which is
> > expected to be updated soon.
> >
> > [1]
> > https://issues.apache.org/jira/projects/KYLIN?
> selectedItem=com.atlassian.jira.jira-projects-plugin:components-page
> >
> > Please comment on this plan; if there is no objection, we will run it
> > for some time to see the effect. Thanks for your input!
> >
> > And thanks to the Apache HBase community, from which I learned this.
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
>


Re: kylin job block hbase

2017-10-15 Thread yu feng
Yes, we have configured it. The problem is that the BulkLoad job (an MR job)
writes too much data to HBase's HDFS, which affects HBase's normal use. It
would be great if there were some way to limit the bandwidth of the BulkLoad
job. Do you have any good ideas?

2017-10-16 11:32 GMT+08:00 ShaoFeng Shi :

> Did you configure "kylin.hbase.cluster.fs", pointing to your HBase HDFS?
>
> Check this blog for more:
> https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>
> 2017-10-16 9:51 GMT+08:00 yu feng :
>
> > Yes, HBase is running on another HDFS, and during a very big BulkLoad
> > that HDFS is saturated (network or disk I/O), which blocks HBase.
> >
> > 2017-10-15 9:38 GMT+08:00 ShaoFeng Shi :
> >
> > > The generation of HFiles happens in the "Convert to HFile" step, which
> > > is an MR job, so it won't block normal HBase tasks.
> > >
> > > The HBase BulkLoad on HDFS should be very fast (at the second level),
> > > as it is just a move operation.
> > >
> > > For your case, is your HBase running with another HDFS other than the
> > > default HDFS?
> > >
> > >
> > > 2017-10-13 16:16 GMT+08:00 yu feng :
> > >
> > > > A very big cube, say one bigger than 1TB, will block HBase's normal
> > > > operation (such as Kylin metadata operations and queries) while the
> > > > BulkLoad job runs, because the job writes too much data to HDFS. This
> > > > is especially true when the cube's merge job may write N TB to HBase
> > > > in a single MR job.
> > > >
> > > > Has anyone met the problem?
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi 史少锋
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
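For readers hitting the same issue: the setting discussed above lives in kylin.properties. A minimal sketch, where the NameNode hostname and port below are hypothetical placeholders (see the blog post linked above for the full setup):

```properties
# Write cube HFiles directly to the standalone HBase cluster's HDFS,
# so the final BulkLoad step is just a fast move/rename on that filesystem
# (hostname/port are placeholder values, not from this thread).
kylin.hbase.cluster.fs=hdfs://hbase-cluster-nn:8020
```

With this set, the heavy write happens during the "Convert to HFile" MR job rather than during BulkLoad itself, which is the behavior described in the replies above.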


Re: kylin job block hbase

2017-10-15 Thread yu feng
Yes, HBase is running on another HDFS, and during a very big BulkLoad that
HDFS is saturated (network or disk I/O), which blocks HBase.

2017-10-15 9:38 GMT+08:00 ShaoFeng Shi :

> The generation of HFiles happens in the "Convert to HFile" step, which
> is an MR job, so it won't block normal HBase tasks.
>
> The HBase BulkLoad on HDFS should be very fast (at the second level), as
> it is just a move operation.
>
> For your case, is your HBase running with another HDFS other than the
> default HDFS?
>
>
> 2017-10-13 16:16 GMT+08:00 yu feng :
>
> > A very big cube, say one bigger than 1TB, will block HBase's normal
> > operation (such as Kylin metadata operations and queries) while the
> > BulkLoad job runs, because the job writes too much data to HDFS. This is
> > especially true when the cube's merge job may write N TB to HBase in a
> > single MR job.
> >
> > Has anyone met the problem?
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>


kylin job block hbase

2017-10-13 Thread yu feng
A very big cube, say one bigger than 1TB, will block HBase's normal
operation (such as Kylin metadata operations and queries) while the BulkLoad
job runs, because the job writes too much data to HDFS. This is especially
true when the cube's merge job may write N TB to HBase in a single MR job.

Has anyone met the problem?


Re: different mr config for different project or cube

2017-09-05 Thread yu feng
How do I do it in the GUI? Through cube-level configuration? Please show me
an example.

2017-09-06 10:18 GMT+08:00 Billy Liu :

> You could override the default MR config on the project level or cube level
> through the GUI, not the file.
>
> 2017-09-06 10:11 GMT+08:00 yu feng :
>
> > I remember Kylin supports using a different MR config file for different
> > projects, like KYLIN-1706
> > <https://issues.apache.org/jira/browse/KYLIN-1706> and KYLIN-1706
> > <https://issues.apache.org/jira/browse/KYLIN-1706>. However, I do not
> > know how to use it in kylin-2.0.0.
> >
> > It would be appreciated if anyone could show me how to do it. Thanks a
> > lot.
> >
>
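As a sketch of what the GUI-level override amounts to: in Kylin 2.x the cube designer has a configuration-overwrites page that accepts key/value pairs applied on top of kylin.properties. The key prefix and values below are from memory and should be checked against the docs for your Kylin version:

```properties
# Hypothetical cube-level overrides (entered in the GUI, not a file);
# keys with this prefix are assumed to be passed through to the cube's
# MR jobs -- verify the exact prefix in your version's documentation.
kylin.engine.mr.config-override.mapreduce.map.memory.mb=4096
kylin.engine.mr.config-override.mapreduce.job.queuename=my_project_queue
```

The same kind of override can reportedly be set at the project level as well, per the reply above.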


different mr config for different project or cube

2017-09-05 Thread yu feng
I remember Kylin supports using a different MR config file for different
projects, like KYLIN-1706 and KYLIN-1706. However, I do not know how to use
it in kylin-2.0.0.

It would be appreciated if anyone could show me how to do it. Thanks a lot.


Re: About kylin ACL

2017-09-01 Thread yu feng
Got it, thanks a lot.

2017-09-01 10:34 GMT+08:00 Joanna He (Jingke He) :

> When you log in to Kylin, there is a button in the top left corner,
> “Manage project”; click it.
> Then you will be redirected to the project settings page.
> Expand the project you want to grant permission on.
> Click Access; then you can grant access at the project level.
>
>
> 何京珂
> Joanna He
>
>
> On 01/09/2017, 9:29 AM, "yu feng"  wrote:
>
How to grant permission at the project level? I have only found the way
> to grant it on a cube.
>
>
> 2017-08-31 10:24 GMT+08:00 Li Yang :
>
> > The ADMIN can grant permission to the MODELER at the project level,
> > after creating the new project.
> >
> > On Tue, Aug 22, 2017 at 4:26 PM, yu feng 
> wrote:
> >
> > > In Kylin, we have three roles: ADMIN, MODELER, and ANALYST.
> > >
> > > I find that only ADMIN has permission to create a new project. But as
> > > a new normal MODELER user, what can I do in Kylin?
> > >
> > > If ADMIN creates a project for me, how do I grant it to a MODELER
> > > user?
> > >
> >
>
>
>


Re: About kylin ACL

2017-08-31 Thread yu feng
How to grant permission at the project level? I have only found the way to
grant it on a cube.


2017-08-31 10:24 GMT+08:00 Li Yang :

> The ADMIN can grant permission to the MODELER at the project level, after
> creating the new project.
>
> On Tue, Aug 22, 2017 at 4:26 PM, yu feng  wrote:
>
> > In Kylin, we have three roles: ADMIN, MODELER, and ANALYST.
> >
> > I find that only ADMIN has permission to create a new project. But as a
> > new normal MODELER user, what can I do in Kylin?
> >
> > If ADMIN creates a project for me, how do I grant it to a MODELER user?
> >
>


About kylin ACL

2017-08-22 Thread yu feng
In Kylin, we have three roles: ADMIN, MODELER, and ANALYST.

I find that only ADMIN has permission to create a new project. But as a new
normal MODELER user, what can I do in Kylin?

If ADMIN creates a project for me, how do I grant it to a MODELER user?


Kylin query return empty after upgrade to 2.0

2017-08-21 Thread yu feng
Hi, after I upgraded our environment from 1.5.2.1 to 2.0.0, builds go well.
However, every query against older segments (generated before the upgrade)
returns empty (0 results), while newly built segments return results
successfully.

I added some debug logging and found that in GTFilterScanner I get 0 input
rows from inputIterator and return 0 rows to GTAggregateScanner. Here is my
kylin log:

2017-08-22 11:46:07,308 INFO  [kylin-coproc--pool157-t3958]
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable NEW_KYLIN_JDVOZ5IBWC Shard
\x4E\x45\x57\x5F\x4B\x59\x4C\x49\x4E\x5F\x4A\x44\x56\x4F\x5A\x35\x49\x42\x57\x43\x2C\x00\x01\x2C\
x31\x35\x30\x33\x31\x38\x31\x39\x34\x38\x33\x33\x33\x2E\x62\x37\x35\x31\x38\x32\x35\x32\x32\x63\x36\x64\x61\x63\x36\x39\x32\x33\x33\x63\x33\x32\x30\x38\x36\x34\x62\x
39\x38\x31\x37\x62\x2E on host: hz-hbase2.photo.163.org.Total scanned row:
0. Total scanned bytes: 0. Total filtered/aggred row: 0. Time elapsed in
EP: 2(ms). Server
 CPU usage: 0.13636363636363635, server physical mem left: 4.35449856E8,
server swap mem left:8.589930496E9.Etc message: start latency: 6@0,agg
done@1,compress done@
1,server stats done@2,
debugGitTag:997bac07cd74f9a3ac9d50714e8740ddc2e5c1c7;.Normal Complete:
true.Compressed row size: 8

2017-08-22 11:46:07,561 INFO  [kylin-coproc--pool157-t3961]
v2.CubeHBaseEndpointRPC:200 : Endpoint RPC returned from HTable KYLIN_0H7MAKPJQL Shard
\x4B\x59\x4C\x49\x4E\x5F\x30\x48\x37\x4D\x41\x4B\x50\x4A\x51\x4C\x2C\x2C\x31\x35\x30\x33\x33\x36\x38\
x31\x36\x34\x37\x32\x32\x2E\x30\x63\x37\x39\x64\x34\x63\x36\x62\x63\x61\x62\x64\x39\x36\x64\x61\x61\x34\x31\x34\x32\x64\x64\x35\x64\x37\x64\x31\x34\x65\x34\x2E
on ho
st: hz-hmaster0.xs.163.org.Total scanned row: 87935. Total scanned bytes:
4752238. Total filtered/aggred row: 87934. Time elapsed in EP: 254(ms).
Server CPU usage: 0
.10531496062992125, server physical mem left: 4.39107584E8, server swap mem
left:2.553663488E9.Etc message: start latency: 2@1,agg done@253,compress
done@253,server
stats done@254,
debugGitTag:997bac07cd74f9a3ac9d50714e8740ddc2e5c1c7;.Normal Complete:
true.Compressed row size: 13

The first is the older segment and the latter is the newly built segment;
their scan ranges are the same.

Here is my HBase log for the empty result:

2017-08-22 11:04:30,378 DEBUG [RpcServer.reader=9,port=60020]
security.HBaseSaslRpcServer: SASL server GSSAPI callback: setting
canonicalized client ID: nrpt/d...@hadoop.hz.netease.com
2017-08-22 11:04:31,714 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-226] gridtable.GTScanRequest: pre
aggregating results before returning
2017-08-22 11:04:31,714 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-226] gridtable.GTAggregateScanner:
GTAggregateScanner input rows: 0
2017-08-22 11:04:31,714 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-226] endpoint.CubeVisitService: Total
scanned 0 rows and 0 bytes
2017-08-22 11:04:31,714 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-226] endpoint.CubeVisitService: Size
of final result = 8 (0 before compressing)
2017-08-22 11:04:31,726 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-142] gridtable.GTScanRequest: pre
aggregating results before returning
2017-08-22 11:04:31,727 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-142] gridtable.GTAggregateScanner:
GTAggregateScanner input rows: 0
2017-08-22 11:04:31,727 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-142] endpoint.CubeVisitService: Total
scanned 0 rows and 0 bytes
2017-08-22 11:04:31,727 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-142] endpoint.CubeVisitService: Size
of final result = 8 (0 before compressing)
2017-08-22 11:04:31,730 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-221] gridtable.GTScanRequest: pre
aggregating results before returning
2017-08-22 11:04:31,731 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-221] gridtable.GTAggregateScanner:
GTAggregateScanner input rows: 0
2017-08-22 11:04:31,731 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-221] endpoint.CubeVisitService: Total
scanned 0 rows and 0 bytes
2017-08-22 11:04:31,731 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-221] endpoint.CubeVisitService: Size
of final result = 8 (0 before compressing)
2017-08-22 11:04:31,740 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-194] gridtable.GTScanRequest: pre
aggregating results before returning
2017-08-22 11:04:31,740 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-194] gridtable.GTAggregateScanner:
GTAggregateScanner input rows: 0
2017-08-22 11:04:31,740 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-194] endpoint.CubeVisitService: Total
scanned 0 rows and 0 bytes
2017-08-22 11:04:31,740 INFO  [Query
25ed5aec-6a9b-4507-bed4-198812cc25f0-194] endpoint.CubeVisitService: Size
of final result = 8 (0 before compressing)

I guess maybe some rowkey rule changed?

Has anyone met this problem? Thanks a lot.
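When comparing many endpoint log lines like those quoted above, a small script can pull out the scanned-row counts per segment. A sketch; the log format is taken from the lines in this message, and the sample text is trimmed from them:

```python
import re

# Matches the "Total scanned row: N" figure printed by the
# CubeHBaseEndpointRPC / CubeVisitService log lines quoted above.
SCANNED_RE = re.compile(r"Total scanned row:\s*(\d+)")

def scanned_rows(log_text):
    """Return every scanned-row count found in a block of Kylin log text."""
    return [int(m) for m in SCANNED_RE.findall(log_text)]

sample = (
    "on host: hz-hbase2.photo.163.org.Total scanned row: 0. "
    "Total scanned bytes: 0.\n"
    "on host: hz-hmaster0.xs.163.org.Total scanned row: 87935. "
    "Total scanned bytes: 4752238.\n"
)
print(scanned_rows(sample))  # -> [0, 87935]
```

A run of zeros for every shard of one HTable (as in the older segment above) against non-zero counts for another is exactly the symptom described in this thread.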


Re: [VOTE] Release apache-kylin-2.1.0 (RC2)

2017-08-16 Thread yu feng
+1 (non-binding)
build succeeded
md5 & sha1 verified
Best regards,

2017-08-15 23:57 GMT+08:00 Luke Han :

> +1 (binding)
>
> mvn test passed
> gpg/md5/sha1 verified
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Tue, Aug 15, 2017 at 9:13 PM, 康凯森  wrote:
>
> > +1.
> >
> >
> > Thanks Shaofeng.
> >
> >
> > -- Original Message --
> > From: "ShaoFeng Shi";;
> > Sent: Sunday, Aug 13, 2017, 2:45 PM
> > To: "dev";
> >
> > Subject: [VOTE] Release apache-kylin-2.1.0 (RC2)
> >
> >
> >
> > Hi all,
> >
> > I have created a build for Apache Kylin 2.1.0, release candidate 2.
> >
> > Changes highlights:
> > KYLIN-2506 - Refactor global dictionary
> > KYLIN-2515 - Route unsupported query back to query its source directly
> > KYLIN-2579 KYLIN-2580  - Improvement on subqueries
> > KYLIN-2633 - Upgrade Spark to 2.1
> > KYLIN-2646 - Project level query authorization
> >
> > And more than 100 bug fixes and enhancements.
> >
> > Thanks to everyone who has contributed to this release. Here’s release
> > notes:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > projectId=12316121&version=12340443
> >
> > The commit to be voted upon:
> >
> > https://github.com/apache/kylin/commit/562dd173aaf6b398be8e053f896755
> > b3afe8137f
> >
> > Its hash is 562dd173aaf6b398be8e053f896755b3afe8137f.
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.1.0-rc2/
> >
> > The hashes of the artifacts are as follows:
> > apache-kylin-2.1.0-src.tar.gz.md5 44cab3240772dd1b2e717b48105b416c
> > apache-kylin-2.1.0-src.tar.gz.sha1 a3470589523cfa9046d70123d78059
> > b913f31b9f
> >
> > (The binary packages for HBase 1.x and CDH 5.7 are also provided for
> > testing)
> >
> > A staged Maven repository is available for review at:
> > https://repository.apache.org/content/repositories/orgapachekylin-1043/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/shaofengshi.asc
> >
> > Please vote on releasing this package as Apache Kylin 2.1.0.
> >
> > The vote is open for the next 72 hours and passes if a majority of
> > at least three +1 PPMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Kylin 2.1.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > Here is my vote:
> >
> > +1 (binding)
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
>
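The md5/sha1 checks that voters report above can be scripted. A minimal sketch that creates a stand-in artifact so it runs anywhere; for a real release candidate, point the paths at the downloaded tarball and its published .md5 file:

```python
import hashlib

def md5_matches(artifact_path, md5_path):
    """Recompute the artifact's md5 and compare it to the published hash,
    mirroring the "md5 verified" step in the votes above."""
    with open(artifact_path, "rb") as f:
        computed = hashlib.md5(f.read()).hexdigest()
    with open(md5_path) as f:
        published = f.read().split()[0]  # the .md5 file starts with the hash
    return computed == published

# Stand-in files so the sketch is self-contained (hypothetical names).
with open("artifact.tar.gz", "wb") as f:
    f.write(b"release-candidate-bytes")
with open("artifact.tar.gz.md5", "w") as f:
    f.write(hashlib.md5(b"release-candidate-bytes").hexdigest())

print(md5_matches("artifact.tar.gz", "artifact.tar.gz.md5"))  # -> True
```

The sha1 check is the same pattern with hashlib.sha1; the gpg signature is verified separately against the signing key linked in the vote email.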


Re: [VOTE] Release apache-kylin-2.0.0 (RC3)

2017-04-28 Thread yu feng
+1 (non-binding)
mvn test passed
build (mvn clean install -DskipTests) succeeded
md5 & sha1 verified

2017-04-29 0:30 GMT+08:00 Dong Li :

> +1 binding
>
> mvn test passed
>
> Thanks,
> Dong Li
>
> 2017-04-28 22:12 GMT+08:00 ShaoFeng Shi :
>
> > +1 binding;
> >
> > Verified mvn test, md5, sha1 and gpg signature, all good.
> >
> > 2017-04-28 15:11 GMT+08:00 hongbin ma :
> >
> > > +1
> > >
> > > mvn test passed
> > > mvn integration test passed
> > >
> > > On Thu, Apr 27, 2017 at 8:38 AM, 《秦殇》!健  wrote:
> > >
> > > > +1
> > > > It's great!!!
> > > > -- Original Message --
> > > > From: "Li Yang";;
> > > > Sent: Thursday, Apr 27, 2017, 7:22 AM
> > > > To: "dev";
> > > >
> > > > Subject: [VOTE] Release apache-kylin-2.0.0 (RC3)
> > > >
> > > >
> > > >
> > > > Hi all,
> > > >
> > > > I have created a build for Apache Kylin 2.0.0, release candidate 3.
> > > >
> > > > Changes highlights:
> > > >
> > > > Support snowflake data model (KYLIN-1875)
> > > > Support TPC-H queries (KYLIN-2467)
> > > > Spark cubing engine (KYLIN-2331)
> > > > Job engine HA (KYLIN-2006)
> > > > Percentile measure (KYLIN-2396)
> > > > Cloud tested (KYLIN-2351)
> > > >
> > > >
> > > > Thanks to everyone who has contributed to this release. Here is
> release
> > > > notes:
> > > > http://kylin.apache.org/docs20/release_notes.html
> > > >
> > > > The commit to be voted upon (375fd807c281d8c5deff0620747c80
> > 6be2019782):
> > > > https://github.com/apache/kylin/tree/kylin-2.0.0
> > > >
> > > > The artifacts to be voted on are located here:
> > > > https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.0.0-rc3/
> > > >
> > > > A staged Maven repository is available for review at:
> > > > https://repository.apache.org/content/repositories/
> > orgapachekylin-1041/
> > > >
> > > > Release artifacts are signed with the following key:
> > > > https://people.apache.org/keys/committer/liyang.asc
> > > >
> > > > Please vote on releasing this package as Apache Kylin 2.0.0.
> > > >
> > > > The vote is open for the next 72 hours and passes if a majority of
> > > > at least three +1 PPMC votes are cast.
> > > >
> > > > [ ] +1 Release this package as Apache Kylin 2.0.0
> > > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > > [ ] -1 Do not release this package because...
> > > >
> > > >
> > > > Here is my vote:
> > > >
> > > > +1 (binding)
> > > >
> > > >
> > > > Cheers
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
>


Re: [VOTE] Release apache-kylin-2.0.0 (RC2)

2017-04-25 Thread yu feng
+1 (non-binding)
mvn test passed
build (mvn clean install -DskipTests) succeeded
md5 & sha1 verified

2017-04-26 6:32 GMT+08:00 Li Yang :

> Hi all,
>
> I have created a build for Apache Kylin 2.0.0, release candidate 2.
>
> Changes highlights:
>
> Support snowflake data model (KYLIN-1875)
> Support TPC-H queries (KYLIN-2467)
> Spark cubing engine (KYLIN-2331)
> Job engine HA (KYLIN-2006)
> Percentile measure (KYLIN-2396)
> Cloud tested (KYLIN-2351)
>
>
> Thanks to everyone who has contributed to this release. Here is release
> notes:
> http://kylin.apache.org/docs20/release_notes.html
>
> The commit to be voted upon (1412e954f912201dfb298f086b1eb88dc57a386d):
> https://github.com/apache/kylin/tree/kylin-2.0.0
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.0.0-rc2/
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1040/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/liyang.asc
>
> Please vote on releasing this package as Apache Kylin 2.0.0.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PPMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.0.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
>
> Cheers
>


Re: Unable to connect to Kylin Web UI

2016-10-30 Thread yu feng
We always use the Kylin index page: http://hostname:7070/kylin/login

2016-10-30 16:36 GMT+08:00 BigdataGR :

> Hi,
>
> Can anyone please help me move forward? I am stuck accessing the Kylin
> web UI even though Kylin started successfully, as shown in the enclosed
> logs.
>
> Appreciate your help!!!
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Unable-to-connect-to-Kylin-Web-UI-tp6036p6126.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>


Re: [VOTE] Release apache-kylin-1.5.4.1 (release candidate 2)

2016-09-26 Thread yu feng
+1 (binding)
mvn test passed
md5 and sha1 verification passed

2016-09-26 21:35 GMT+08:00 Xiaoyu Wang :

> +1 (binding)
> mvn test passed
> signature&md5&sha1 verified
>
>
>
> On 2016-09-26 20:16, Li Yang wrote:
>
>> +1 (binding)
>>
>> checked release commit
>>
>> mvn test passed on
>> java version "1.7.0_71"
>> OpenJDK Runtime Environment (rhel-2.5.3.1.el6-x86_64 u71-b14)
>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>
>>
>> Yang
>>
>> On Mon, Sep 26, 2016 at 7:34 PM, Luke Han  wrote:
>>
>> +1 (binding)
>>>
>>>
>>> mvn test passed
>>>
>>>
>>> Best Regards!
>>> -
>>>
>>> Luke Han
>>>
>>> On Sun, Sep 25, 2016 at 10:06 PM, Dong Li  wrote:
>>>
>>> +1 (binding)


 mvn test passed
 gpg signature verification passed
 md5 & sha1 verification passed


 Thanks,
 Dong Li


 Original Message
 Sender: ShaoFeng shishaofeng...@apache.org
 Recipient: dev...@kylin.apache.org
 Date: Saturday, Sep 24, 2016 15:34
 Subject: [VOTE] Release apache-kylin-1.5.4.1 (release candidate 2)


 Hi all, I have created a build for Apache Kylin 1.5.4.1, release
 candidate 2.

 Changes:
 [KYLIN-2026] - NPE occurs when building a cube without partition column
 [KYLIN-2032] - Cube build failed when partition column isn't in dimension
 list

 Thanks to everyone who has contributed to this release.
 Here’s release notes:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12338305

 The commit to be voted upon:
 https://github.com/apache/kylin/commit/895a91b1eff68e4bb9b7183664e71b400c6a3d3c
 Its hash is 895a91b1eff68e4bb9b7183664e71b400c6a3d3c.

 The artifacts to be voted on are located here:
 https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-1.5.4.1-rc2/

 The hashes of the artifacts are as follows:
 apache-kylin-1.5.4.1-src.tar.gz.md5 4b0768bc17e85d8598bcbe2c226d7adb
 apache-kylin-1.5.4.1-src.tar.gz.sha1 400f854efdff5cf0f48c6996aa1ccdacc606a884

 A staged Maven repository is available for review at:
 https://repository.apache.org/content/repositories/orgapachekylin-1036/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/shaofengshi.asc

 Please vote on releasing this package as Apache Kylin 1.5.4.1.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 PPMC votes are cast.

 [ ] +1 Release this package as Apache Kylin 1.5.4.1
 [ ] 0 I don't feel strongly about it, but I'm okay with the release
 [ ] -1 Do not release this package because...

 Here is my vote: +1 (binding)

 -- Best regards, Shaofeng Shi 史少锋

>>>
> --
> Wang Xiaoyu (王晓雨)
> Cloud Platform Division -- Data Cloud Department
> -
> Mobile: 18600049984
> Email: wangxiao...@jd.com
> Postal code: 100101
> Address: 5F, Tower A, North Star Century Center, 8 Beichen West Road,
> Chaoyang District, Beijing
> -
>
>


normal user can not load table

2016-06-29 Thread yu feng
When loading a table into a project, the user must have the ADMIN role to
syncTableToProject. Is there something wrong with this logic?


Re: id-name

2016-06-14 Thread yu feng
A derived dimension is a better choice.

2016-06-14 23:00 GMT+08:00 耳东 <775620...@qq.com>:

> Hi all, when I build the cube, there are always dimensions like
> (city_id, city_name) or (terminal_id, terminal_name) and so on, which are
> one-to-one relations. I don't want both the id and the name to be
> dimensions. How can I make the name a measure rather than a dimension?
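The derived-dimension suggestion can be illustrated with the same dimension JSON shape that appears in the cube definitions later in this digest: the name column is declared as derived from the lookup table's FK, so it never enters the rowkey but can still be queried. Table and column names here are hypothetical:

```json
{
  "name": "CITY",
  "table": "DEFAULT.DIM_CITY",
  "column": "{FK}",
  "derived": [
    "CITY_NAME"
  ]
}
```

Compare the CAL_DT dimension in the cube definitions below, which derives WEEK_BEG_DT the same way.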


TOPN group by column can not be defined in dimension

2016-06-03 Thread yu feng
I create a cube with the Kylin sample data; the cube definition is below:

{
  "uuid": "e6cf2ccc-edca-41c6-b637-b3bc50894b5e",
  "version": "1.5.1",
  "name": "kylin_sales_cube_desc_2_clone",
  "description": null,
  "dimensions": [
{
  "name": "CAL_DT",
  "table": "DEFAULT.KYLIN_CAL_DT",
  "column": "{FK}",
  "derived": [
"WEEK_BEG_DT"
  ]
},
{
  "name": "CATEGORY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "{FK}",
  "derived": [
"USER_DEFINED_FIELD1",
"USER_DEFINED_FIELD3",
"UPD_DATE",
"UPD_USER"
  ]
},
{
  "name": "CATEGORY_HIERARCHY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "META_CATEG_NAME",
  "derived": null
},
{
  "name": "CATEGORY_HIERARCHY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "CATEG_LVL2_NAME",
  "derived": null
},
{
  "name": "CATEGORY_HIERARCHY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "CATEG_LVL3_NAME",
  "derived": null
},
{
  "name": "LSTG_FORMAT_NAME",
  "table": "DEFAULT.KYLIN_SALES",
  "column": "LSTG_FORMAT_NAME",
  "derived": null
}
  ],
  "measures": [
{
  "name": "TRANS_CNT",
  "function": {
"expression": "COUNT",
"parameter": {
  "type": "constant",
  "value": "1",
  "next_parameter": null
},
"returntype": "bigint"
  },
  "dependent_measure_ref": null
},
{
  "name": "SELLER_CNT_HLL",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "SELLER_ID",
  "next_parameter": null
},
"returntype": "hllc16"
  },
  "dependent_measure_ref": null
},
{
  "name": "SELLER_FORMAT_CNT",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "LSTG_FORMAT_NAME",
  "next_parameter": null
},
"returntype": "hllc12"
  },
  "dependent_measure_ref": null
},
{
  "name": "ITEM_COUNT_DISTINCT_COUNT",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "ITEM_COUNT",
  "next_parameter": null
},
"returntype": "bitmap"
  },
  "dependent_measure_ref": null
},
{
  "name": "TOP",
  "function": {
"expression": "TOP_N",
"parameter": {
  "type": "column",
  "value": "PRICE",
  "next_parameter": {
"type": "column",
"value": "META_CATEG_NAME",
"next_parameter": null
  }
},
"returntype": "topn(100)"
  },
  "dependent_measure_ref": null
},
{
  "name": "SOURCE",
  "function": {
"expression": "RAW",
"parameter": {
  "type": "column",
  "value": "PRICE",
  "next_parameter": null
},
"returntype": "raw"
  },
  "dependent_measure_ref": null
},
{
  "name": "TOPP",
  "function": {
"expression": "TOP_N",
"parameter": {
  "type": "column",
  "value": "PRICE",
  "next_parameter": {
"type": "column",
"value": "ITEM_COUNT",
"next_parameter": null
  }
},
"returntype": "topn(100)"
  },
  "dependent_measure_ref": null
}
  ],
  "rowkey": {
"rowkey_columns": [
  {
"column": "PART_DT",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "LEAF_CATEG_ID",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "META_CATEG_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "CATEG_LVL2_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "CATEG_LVL3_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "LSTG_FORMAT_NAME",
"encoding": "fixed_length:16",
"isShardBy": false
  },
  {
"column": "LSTG_SITE_ID",
"encoding": "dict",
"isShardBy": false
  }
]
  },
  "signature": "RU4IejPOo8asXrxnelDHSw==",
  "last_modified": 1464771861329,
  "model_name": "kylin_sales_model",
  "null_string": null,
  "hbase_mapping": {
"column_family": [
  {
"name": "F1",
"columns": [
  {
"qualifier": "M",
"measure_refs": [
  "TRANS_CNT",
  "TOP",
  "SOURCE",
  "TOPP"
]
  }
]
  },
  {
"name": "F2",
"columns": [
  {
"qualifier": "M",
"measure_refs": [
  "SELLER_CNT_HLL",
  "SELLER_F

TOPN group by column can not be defined in dimension

2016-06-01 Thread yu feng
I create a cube with the Kylin sample data; the cube definition is below:

{
  "uuid": "e6cf2ccc-edca-41c6-b637-b3bc50894b5e",
  "version": "1.5.1",
  "name": "kylin_sales_cube_desc_2_clone",
  "description": null,
  "dimensions": [
{
  "name": "CAL_DT",
  "table": "DEFAULT.KYLIN_CAL_DT",
  "column": "{FK}",
  "derived": [
"WEEK_BEG_DT"
  ]
},
{
  "name": "CATEGORY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "{FK}",
  "derived": [
"USER_DEFINED_FIELD1",
"USER_DEFINED_FIELD3",
"UPD_DATE",
"UPD_USER"
  ]
},
{
  "name": "CATEGORY_HIERARCHY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "META_CATEG_NAME",
  "derived": null
},
{
  "name": "CATEGORY_HIERARCHY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "CATEG_LVL2_NAME",
  "derived": null
},
{
  "name": "CATEGORY_HIERARCHY",
  "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
  "column": "CATEG_LVL3_NAME",
  "derived": null
},
{
  "name": "LSTG_FORMAT_NAME",
  "table": "DEFAULT.KYLIN_SALES",
  "column": "LSTG_FORMAT_NAME",
  "derived": null
}
  ],
  "measures": [
{
  "name": "TRANS_CNT",
  "function": {
"expression": "COUNT",
"parameter": {
  "type": "constant",
  "value": "1",
  "next_parameter": null
},
"returntype": "bigint"
  },
  "dependent_measure_ref": null
},
{
  "name": "SELLER_CNT_HLL",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "SELLER_ID",
  "next_parameter": null
},
"returntype": "hllc16"
  },
  "dependent_measure_ref": null
},
{
  "name": "SELLER_FORMAT_CNT",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "LSTG_FORMAT_NAME",
  "next_parameter": null
},
"returntype": "hllc12"
  },
  "dependent_measure_ref": null
},
{
  "name": "ITEM_COUNT_DISTINCT_COUNT",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "ITEM_COUNT",
  "next_parameter": null
},
"returntype": "bitmap"
  },
  "dependent_measure_ref": null
},
{
  "name": "TOP",
  "function": {
"expression": "TOP_N",
"parameter": {
  "type": "column",
  "value": "PRICE",
  "next_parameter": {
"type": "column",
"value": "META_CATEG_NAME",
"next_parameter": null
  }
},
"returntype": "topn(100)"
  },
  "dependent_measure_ref": null
},
{
  "name": "SOURCE",
  "function": {
"expression": "RAW",
"parameter": {
  "type": "column",
  "value": "PRICE",
  "next_parameter": null
},
"returntype": "raw"
  },
  "dependent_measure_ref": null
},
{
  "name": "TOPP",
  "function": {
"expression": "TOP_N",
"parameter": {
  "type": "column",
  "value": "PRICE",
  "next_parameter": {
"type": "column",
"value": "ITEM_COUNT",
"next_parameter": null
  }
},
"returntype": "topn(100)"
  },
  "dependent_measure_ref": null
}
  ],
  "rowkey": {
"rowkey_columns": [
  {
"column": "PART_DT",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "LEAF_CATEG_ID",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "META_CATEG_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "CATEG_LVL2_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "CATEG_LVL3_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "LSTG_FORMAT_NAME",
"encoding": "fixed_length:16",
"isShardBy": false
  },
  {
"column": "LSTG_SITE_ID",
"encoding": "dict",
"isShardBy": false
  }
]
  },
  "signature": "RU4IejPOo8asXrxnelDHSw==",
  "last_modified": 1464771861329,
  "model_name": "kylin_sales_model",
  "null_string": null,
  "hbase_mapping": {
"column_family": [
  {
"name": "F1",
"columns": [
  {
"qualifier": "M",
"measure_refs": [
  "TRANS_CNT",
  "TOP",
  "SOURCE",
  "TOPP"
]
  }
]
  },
  {
"name": "F2",
"columns": [
  {
"qualifier": "M",
"measure_refs": [
  "SELLER_CNT_HLL",
  "SELLER_F

Re: How to Build Binary Package in 1.5.x

2016-05-04 Thread yu feng
The script has moved to ${source_code}/build/script/package.sh.

2016-05-05 11:12 GMT+08:00 zeLiu :

> 1.5.x does not have script/package.sh. How do I build the binary package? Thanks.
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/How-to-Build-Binary-Package-in-1-5-x-tp4410.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>


Re: [VOTE] Release apache-kylin-1.5.1 (release candidate 1)

2016-04-10 Thread yu feng
+1 (binding)

signature&md5&sha1 verified

mvn test passed
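For reference, the "signature & md5 & sha1 verified" checks in these votes boil down to commands like the following sketch. It assumes the release artifacts from the RC email have already been downloaded into the current directory, and that the unpacked directory name matches the tarball name:

```shell
# Verify the PGP signature (import the release manager's key first).
gpg --verify apache-kylin-1.5.1-src.tar.gz.asc apache-kylin-1.5.1-src.tar.gz

# Check the published hashes against the tarball. If the .md5/.sha1 files
# contain only the bare hash, compare the values by eye instead of -c:
md5sum apache-kylin-1.5.1-src.tar.gz    # expect 75df97f689d81f58eff47d1f51cdd45d
sha1sum apache-kylin-1.5.1-src.tar.gz   # expect 8c8266f8fe96665f8520108b75a4491246615ce8

# Then run the test suite from the unpacked source tree.
tar xzf apache-kylin-1.5.1-src.tar.gz
cd apache-kylin-1.5.1 && mvn test
```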

2016-04-09 17:38 GMT+08:00 Luke Han :

> +1 (binding)
>
> mvn test passed
> signature&md5&sha1 verified
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Sat, Apr 9, 2016 at 5:15 PM, 周千昊  wrote:
>
> > +1 (binding)
> > md5 passed
> > sha1 passed
> >
> > > ShaoFeng Shi wrote on Sat, 2016-04-09 at 16:18:
> >
> > > +1 (binding)
> > >
> > > verified signature, md5 and sha hash; mvn test also passed;
> > >
> > > 2016-04-09 16:01 GMT+08:00 王晓雨 :
> > >
> > > > +1 (binding)
> > > >
> > > > mvn test passed
> > > > signature&md5&sha1 verified
> > > >
> > > >
> > > > > On 2016-04-09, at 15:03, Li Yang wrote:
> > > > >
> > > > > +1 binding
> > > > >
> > > > > mvn test pass
> > > > >
> > > > > java version "1.7.0_71"
> > > > > OpenJDK Runtime Environment (rhel-2.5.3.1.el6-x86_64 u71-b14)
> > > > > OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
> > > > >
> > > > >
> > > > > On Fri, Apr 8, 2016 at 2:43 PM, Dong Li  wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >>
> > > > >> I have created a build for Apache Kylin 1.5.1, release candidate
> 1.
> > > > >>
> > > > >>
> > > > >> Changes highlights:
> > > > >> [KYLIN-1122] - Kylin support detail data query from fact table
> > > > >> [KYLIN-1492] - Custom dimension encoding
> > > > >> [KYLIN-1495] - Metadata upgrade from 1.0~1.3 to 1.5, including
> > > > >> metadata correction, relevant tools, etc.
> > > > >> [KYLIN-1534] - Cube specific config, override global kylin.properties
> > > > >> [KYLIN-1546] - Tool to dump information for diagnosis
> > > > >>
> > > > >>
> > > > >> Thanks to everyone who has contributed to this release.
> > > > >> Here’s release notes:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12335346
> > > > >>
> > > > >>
> > > > >> The commit to be voted upon:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/kylin/commit/aa98875c1b603e79b866b5e91bc3288e61a0b679
> > > > >>
> > > > >>
> > > > >> Its hash is aa98875c1b603e79b866b5e91bc3288e61a0b679.
> > > > >>
> > > > >>
> > > > >> The artifacts to be voted on are located here:
> > > > >>
> > https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-1.5.1-rc1/
> > > > >>
> > > > >>
> > > > >> The hashes of the artifacts are as follows:
> > > > >> apache-kylin-1.5.1-src.tar.gz.md575df97f689d81f58eff47d1f51cdd45d
> > > > >> apache-kylin-1.5.1-src.tar.gz.sha1
> > > > 8c8266f8fe96665f8520108b75a4491246615ce8
> > > > >>
> > > > >>
> > > > >> A staged Maven repository is available for review at:
> > > > >>
> > > https://repository.apache.org/content/repositories/orgapachekylin-1024
> > > > >>
> > > > >>
> > > > >> Release artifacts are signed with the following key:
> > > > >> https://people.apache.org/keys/committer/lidong.asc
> > > > >>
> > > > >>
> > > > >> Please vote on releasing this package as Apache Kylin 1.5.1.
> > > > >>
> > > > >>
> > > > >> The vote is open for the next 72 hours and passes if a majority of
> > > > >> at least three +1 PPMC votes are cast.
> > > > >>
> > > > >>
> > > > >> [ ] +1 Release this package as Apache Kylin 1.5.1
> > > > >> [ ] 0 I don't feel strongly about it, but I'm okay with the
> release
> > > > >> [ ] -1 Do not release this package because...
> > > > >>
> > > > >>
> > > > >> Here is my vote:
> > > > >>
> > > > >>
> > > > >> +1 (binding)
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> Dong Li
> > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi
> > >
> >
>


Re: [VOTE] Release apache-kylin-1.3.0 (release candidate 2)

2016-03-09 Thread yu feng
+1 (non-binding)
md5 & sha1 signatures verified

build success.

mvn test passed.

2016-03-10 11:27 GMT+08:00 Li Yang :

> +1 binding
>
> `mvn test` passed on java version "1.7.0_79", OpenJDK Runtime Environment
> (rhel-2.5.5.1.el6_6-x86_64 u79-b14)
>
> On Thu, Mar 10, 2016 at 11:19 AM, nichunen 
> wrote:
>
> > +1(no binding)
> > build success
> > mvn test passed
> >
> > md5&sha1 verified
> >
> >
> >
> >   George/倪春恩
> >
> > Mobile:+86-13501723787| WeChat:nceecn
> >
> > 北京明略软件系统有限公司(MiningLamp.COM
> > )
> >
> >
> >
> >
> >
> > 上海市浦东新区晨晖路258号G座iDream张江科创中心C125
> >
> >
> >
> >
> > Room C125#,Intelligent Industrial Park Building G,258#Chenhui Road,
> Pudong
> > District,Shanghai,201203
> >
> >
> >
> >
> > > On Mar 10, 2016, at 10:12 AM, Xiaoyu Wang  wrote:
> > >
> > >
> > > +1 (no binding)
> > >
> > >
> > > Verify signature,md5,sha1 is passed.
> > > mvn test is passed.
> > >
> > >
> > > On 2016-03-09 at 16:56, hongbin ma wrote:
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >> I have created a build for Apache Kylin 1.3.0, release candidate 2.
> > >>
> > >>
> > >> Changes highlights:
> > >>
> > >>
> > >> [KYLIN-1323] - Improve performance of converting data to hfile
> > >> [KYLIN-1186] - Support precise Count Distinct using bitmap (under
> > limited
> > >> conditions)
> > >> [KYLIN-976] - Support Custom Aggregation Types
> > >> [KYLIN-1054] - Support Hive client Beeline
> > >> [KYLIN-1128] - Clone Cube Metadata
> > >>
> > >>
> > >> Thanks to everyone who has contributed to this release.
> > >> Here’s the full release notes:
> > >>
> > >>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=1265
> > >>
> > >>
> > >> The commit to be voted upon:
> > >>
> > >>
> > >>
> >
> https://github.com/apache/kylin/commit/b95e47c4dde42ec752916013f67ed1221f092cb7
> > >>
> > >>
> > >> Its hash is b95e47c4dde42ec752916013f67ed1221f092cb7.
> > >>
> > >>
> > >> The artifacts to be voted on are located here:
> > >> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-1.3.0-rc2/
> > >>
> > >>
> > >> The hashes of the artifacts are as follows:
> > >> apache-kylin-1.3.0-src.tar.gz.md5 5fa34cf18eb4a689b469b6926f8b7cda
> > >> apache-kylin-1.3.0-src.tar.gz.sha1
> > 1a355f9c53efe632f77687dab050fed4e94b78c2
> > >>
> > >>
> > >> A staged Maven repository is available for review at:
> > >>
> https://repository.apache.org/content/repositories/orgapachekylin-1022/
> > >>
> > >>
> > >> Release artifacts are signed with the following key:
> > >> https://people.apache.org/keys/committer/mahongbin.asc
> > >>
> > >>
> > >> Please vote on releasing this package as Apache Kylin 1.3.0.
> > >>
> > >>
> > >> The vote is open for the next 72 hours and passes if a majority of
> > >> at least three +1 PPMC votes are cast.
> > >>
> > >>
> > >> [ ] +1 Release this package as Apache Kylin 1.3.0
> > >> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> > >> [ ] -1 Do not release this package because...
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Here is my vote:
> > >>
> > >>
> > >> +1 (binding)
> > >>
> > >
> >
>


Re: cube build failed if the fact_table is not in default database

2016-03-09 Thread yu feng
FAILED: SemanticException 1:567 AS clause has an invalid number of aliases.

I don't think this is a Hive error; please post your Hive command here and
let's discuss it.

2016-03-10 1:07 GMT+08:00 Sarnath :

> Its hard to believe that select * from db.table does not work... Have used
> it many times in hive
>
> Are you in a secure envmt? Like sentry, ranger guarding the databases and
> connected to active directory for authentication?
>
> Best,
> Sarnath
>


Re: building cube error in step 2 (StandbyException)

2016-02-28 Thread yu feng
You can deploy Kylin on any node; you just need to ensure that on that node
you can run the 'hadoop/hive/hbase' commands in a shell and get correct
output. You can try moving Kylin to another node and configuring your Hadoop
client configuration for HA mode, so that you can access HDFS via its
nameservice.

2016-02-26 14:16 GMT+08:00 edison...@idreamsky.com 
:

> Hi all
>
> I deploy kylin 1.2 in cdh hadoop HA cluster . the cluster has 2 namenodes
> one active and one standby.
>
> the scenario is if I setup kylin in a standby namenode machine, it will
> shows the error in building cube step 2:
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1775)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1402)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4221)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:881)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1468)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy25.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy26.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
>   at 
> org.apache.kylin.job.hadoop.AbstractHadoopJob.deletePath(AbstractHadoopJob.java:360)
>   at 
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.setupReducer(FactDistinctColumnsJob.java:121)
>   at 
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:78)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.job.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:

Re: when i use curl with Restful Api , HTTP Status 500 - Cannot create a session after the response has been committed

2016-02-28 Thread yu feng
Hi, I have encountered this problem. It is caused by the header
"Authorization: Basic QURNSU46S1lMSU4K" being incorrect (the token encodes a
trailing newline). You should use "echo -n 'ADMIN:KYLIN' | base64" to
generate the Base64 encoding, whose output is "QURNSU46S1lMSU4=", and then
try again. Hope this helps ~
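To make the fix concrete: the broken token ends in "K" because plain `echo` appends a newline before encoding, so the credential becomes `ADMIN:KYLIN\n`. A sketch of the corrected call, reusing the host, port, and cookie path from the original mail:

```shell
# Wrong: echo appends '\n', so 'ADMIN:KYLIN\n' gets encoded.
echo 'ADMIN:KYLIN' | base64      # -> QURNSU46S1lMSU4K

# Right: suppress the trailing newline before encoding.
TOKEN=$(echo -n 'ADMIN:KYLIN' | base64)
echo "$TOKEN"                    # -> QURNSU46S1lMSU4=

# Authenticate with the corrected token.
curl -c /tmp/cookiefile.txt -X POST \
  -H "Authorization: Basic $TOKEN" \
  -H 'Content-Type: application/json' \
  http://127.0.0.1:7070/kylin/api/user/authentication
```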

2016-02-26 20:47 GMT+08:00 潘志伟 <31402...@qq.com>:

> Hi All & 热爱大发挥!
>
> I tried 'Accept: */*', but it still does not work. Is there any other
> mistake in my command or parameters?
>
>
> --
>
>  curl -c -X POST -H "Authorization: Basic QURNSU46S1lMSU4K" -H
> 'Content-Type: application/json' -H 'Accept: */*'
> http://127.0.0.1:7070/kylin/api/user/authentication
> Apache Tomcat/7.0.59 - Error report
> HTTP Status 500 - Cannot create a session after the response has been committed
> type: Exception report
> message: Cannot create a session after the response has been committed
> description: The server encountered an internal error that prevented it
> from fulfilling this request.
> exception: java.lang.IllegalStateException: Cannot create a session after
> the response has been committed
>
> org.apache.catalina.connector.Request.doGetSession(Request.java:3002)
> org.apache.catalina.connector.Request.getSession(Request.java:2378)
>
> org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:897)
>
> javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:229)
>
> org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:80)
>
> org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:69)
>
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
>
> com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:205)
>
> com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
> note: The full stack trace of the root cause is available in the Apache
> Tomcat/7.0.59 logs.
> [hdfs@n01 bin]$
>
> ———
>
>
>
> > On Feb 26, 2016, at 20:19, 热爱大发挥 <385639...@qq.com> wrote:
> >
> > try it with -H 'Accept: */*'
> >
> >
> >
> >
> > -- Original --
> > From: "潘志伟"<31402...@qq.com>;
> > Date: Friday, Feb 26, 2016, 8:02 PM
> > To: "dev";
> > Subject: when i use curl with  Restful Api , HTTP Status 500 - Cannot
> create a session after the response has been committed
> >
> >
> >
> > Hi,All!
> >
> >   when  I run :
> >
> > [hdfs@n01 bin]$ curl -c /tmp/cookiefile.txt -X POST -H "Authorization:
> Basic QURNSU46S1lMSU4K" -H 'Content-Type: application/json'
> http://127.0.0.1:7070/kylin/api/user/authentication
> >
> > error:
> >
> > Apache Tomcat/7.0.59 - Error report
> > HTTP Status 500 - Cannot create a session after the response has been committed
> > type: Exception report
> > message: Cannot create a session after the response has been committed
> > description: The server encountered an internal error that prevented it
> > from fulfilling this request.
> > exception: java.lang.IllegalStateException: Cannot create a session after
> > the response has been committed
> >
>  org.apache.catalina.connector.Request.doGetSession(Request.java:3002)
> >   org.apache.catalina.connector.Request.getSession(Request.java:2378)
> >
>  
> org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:897)
> >
>  
> javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:229)
> >
>  
> org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:80)
> >
>  
> org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:69)
> >
>  
> org.

How to use Hybrid model in Kylin

2016-02-23 Thread yu feng
Hi All :
    I see that Kylin 1.0 can generate a hybrid model composed of multiple
cubes; the JIRA is here:
https://issues.apache.org/jira/browse/KYLIN-867

My question is how to use this feature in the web UI.

BTW, when I drop some segments, the cuboid files in HDFS and the HTables in
HBase for those segments are not deleted. Is there any tool to clean up that
invalid data?

Thanks a lot.


Re: How to use kylin with high cardinality dimensions.

2016-02-18 Thread yu feng
Thanks a lot. We find that Kylin 1.x has difficulty supporting this
requirement; currently we can only support SQL where the filter restricts
the date to a single day and the LIKE clause filters out most values, such
as: select url, count(1), count(distinct col) from table where day =
'2016-02-17' and url like '%xxx' group by url;

I will try these approaches:
1. Decrease the dictionary size loaded into memory: cut the full dictionary
into several piece dictionaries, each storing some of the distinct column
values (500K) with a different base ID; this avoids OOM while generating
and loading the dictionary.
2. Can we push more filtering down to the coprocessor? For example, do not
translate LIKE into IN, and instead evaluate the LIKE filter in the
coprocessor.
3. Do not execute aggregation in the coprocessor if a query exactly matches
one cuboid; in that case the coprocessor need not hold all tuples in memory.
4. Make MEM_BUDGET_PER_QUERY a config value, so users can increase it when
querying count(distinct) values.

Any suggestions are welcome.

2016-02-18 14:58 GMT+08:00 Li Yang :

> Better support of UHC (ultra high cardinality) columns is on dev plan. I'm
> thinking add custom encoding for dimension.
>
> However, even with those done, filtering URL using like will be still very
> slow because Kylin cannot pre-process and get prepared for such filtering.
>
> Alternatively, I'd suggest talk to the user to understand what they want by
> matching URL using like. Ideally you can extract "features" from URL during
> ETL process and store the features in cube instead of a long URL. E.g.
> maybe what user want is to know if the URL is from a search engine
> (contains google, baidu, yahoo...). Then a new column
> "IS_FROM_SEARCH_ENGINE" could be enriched during ETL and be stored in cube.
> Not only this is more practical, it is also more flexible and extensible.
> Sql like can only do substring matching, while your ETL process can handle
> very complex biz logic.
>
> On Thu, Feb 18, 2016 at 1:11 PM, hongbin ma  wrote:
>
> > First of all, using high card dimension(especially space consuming
> > dimension like URL) is not a really good idea. Even if the cube is built
> > successfully, the expansion ratio tends to be unacceptable. Besides, for
> > like functions, kylin basically treat it as another groupby dimension, so
> > the performance will be really bad.
> >
> > When high cardinality dimension has to be included, *we have limited
> > solutions now. *In 2.x-staging branch, which is not officially released
> > yet, we're trying to address the issue by 1. Use new aggregation group
> > techniques to reduce the number of cuboids to compute. (
> > http://kylin.apache.org/blog/2016/02/18/new-agg) 2. Use short fixed
> length
> > dimension (like url_id) to derive long length strings(like url). check
> > https://issues.apache.org/jira/browse/KYLIN-1313 for more details. 3.
> > Adopt
> > scalable dictionary solutions to replace current in-memory dictionary
> > building (TBD)
> >
> > I'm also answering your questions inline with this pen
> >
> > On Thu, Feb 18, 2016 at 11:03 AM, yu feng  wrote:
> >
> > > Hi All:
> > > We are encounting some problems while supporting a demand that a
> cube
> > > with some high cardinality dimensions, those dimensions are URLs and
> user
> > > want to use those dimensions in where clause and filter with like
> > function.
> > > besides, the cube has one distinct count measure.
> > >
> > > We has such problems :
> > > 1、for one URL dimension, cardinality is about 50W one day, and the size
> > > of fact_distinct_columns file is about 500M+, so when we build the cube
> > > with more day, the job will failed in 'Build Dimension Dictionary'
> > step(one
> > > dimension file is about 3GB)
> > >
> > ​
> > Currently ​Build Dimension Dictionary step will build a dictionary of
> > dimension in memory. There're too many URLs, and each URL is too long,
> > in-memory dictionary building will fail due to OOM.
> >
> > >
> > > 2、after building segment of one day, we find like filter is so slow
> > > to convert to in filter, and the filter is so big that buffer will out
> of
> > > bounds.
> > >
> >
> > for like functions, kylin basically treat it as another groupby
> dimension,
> > so the performance will be really bad.
> >
> > >
> > > 3、while executing sql with count(distinct col), the coprocossor will be
> > > disable(why ?), and scanner will return more tuple so that exceed the
> > > context threadhold and query will fail.
> > >
> > ​coprocessor is not enabled to protect region server from OOM.​
> >
> >
> > >
> > > Does anyone excounter such problem and how to solve such problems in
> the
> > > sence that creating a cube with high cardinality dimensions such as
> URLs.
> > >
> > > Any suggestions are welcome, Thanks a lot.
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>


How to use kylin with high cardinality dimensions.

2016-02-17 Thread yu feng
Hi All:
    We are encountering some problems while supporting a requirement: a cube
with some high-cardinality dimensions. Those dimensions are URLs, and the
user wants to use them in the where clause and filter with the LIKE
function; besides, the cube has one distinct-count measure.

We have the following problems:
1. For one URL dimension, the cardinality is about 500K per day and the
fact_distinct_columns file is about 500MB+, so when we build the cube over
more days the job fails at the 'Build Dimension Dictionary' step (one
dimension file is about 3GB).

2. After building a one-day segment, we find the LIKE filter is very slow
to convert into an IN filter, and the resulting filter is so big that the
buffer goes out of bounds.

3. While executing SQL with count(distinct col), the coprocessor is
disabled (why?), and the scanner returns so many tuples that it exceeds the
context threshold and the query fails.

Has anyone encountered such problems, and how do you solve them in a
scenario where a cube has high-cardinality dimensions such as URLs?

Any suggestions are welcome, Thanks a lot.


Re: only one reducer in job

2016-02-02 Thread yu feng
Hi, as far as I know, KYLIN-1066 is not about what you are describing. The
"Kylin Hive Column Cardinality Job" is submitted after you load a table;
its function is to calculate the cardinality of every column using
HyperLogLog, so it has to use a single reducer (similar to removing
duplicate values). You can look at other aspects of this job to increase
its execution speed.

2016-02-03 8:11 GMT+08:00 greg gu :

> By the way, the job step that uses 1 reducer is "Kylin Hive Column
> Cardinality Job ", is this expected?
>
> > From: gug...@hotmail.com
> > To: dev@kylin.apache.org
> > Subject: only one reducer in job
> > Date: Tue, 2 Feb 2016 11:31:37 -0800
> >
> > When I process the cube, I found there is only one reducer, which causes
> > the job to run for a very long time.
> > I found this https://issues.apache.org/jira/browse/KYLIN-1066, it
> mentioned the issue is fixed.
> >
> > If there a way to change the number of reducer?
> >
> > Thanks,
> >
> >
> >
>
>


Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

2016-02-02 Thread yu feng
We are looking forward to the release of 2.0; the fast cubing algorithm,
Spark support, and streaming cubes are very useful to us.
I have tested 2.0-rc in our environment and it works fine; I hope the
release comes soon.

2016-02-02 18:02 GMT+08:00 Yerui Sun :

> We’ve been looking forward the release of 2.0 for a long time.
> We also have tested the 2.0-rc internally for a quite while, and proved
> it’s stable.
>
> We’re confident the release for now.
>
> > On 2016-02-02, at 17:22, 杨海乐 wrote:
> >
> > hello all,
> >   As users of Kylin, we all hope Kylin releases version 2.0 as soon as
> > possible in order to get better performance. As a member of the Kylin
> > community, I sincerely hope Kylin will become more powerful.
> >
> > --
> > View this message in context:
> http://apache-kylin.74782.x6.nabble.com/DISCUSS-Apache-Kylin-2-0-Release-Features-Criteria-tp3524p3555.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
>
>


Re: kylin concurrency test

2016-01-27 Thread yu feng
Yes, it is QPS. This result comes from page 36 of "Apache Kylin - a
large-scale OLAP platform on Hadoop"
<http://events.linuxfoundation.org/sites/events/files/slides/Apache%20Kylin%202014%20Dec.pdf>.
We ran the same test with one and two Kylin query nodes, got similar
results, and so reused that picture for convenience. The bottleneck of
Kylin's query throughput is HBase scan performance, which is related to the
number of region servers, machine configuration, network, etc.


2016-01-28 10:08 GMT+08:00 Luke Han :

> It's QPS, please contact Yu Feng (kylin committer) from NetEase for more
> detail.
>
> Thanks.
> Luke
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Thu, Jan 28, 2016 at 9:43 AM, hongbin ma  wrote:
>
> > i think by default it is QPS (queries per second)
> >
> > On Thu, Jan 28, 2016 at 7:34 AM, zhong zhang  wrote:
> >
> > > Hi All,
> > >
> > > There is an article <http://www.bitstech.net/2016/01/04/kylin-olap/>
> > > posted by @Hu Wei at NetEase which introduces the concurrency test
> > > results. In the article, there is a throughput result graph. Please see
> > > the attached.
> > > Based on my understanding, the x-axis is the number of Kylin server.
> > > What's the y-axis? Is it the requests at the same time?
> > >
> > > Best regards,
> > > Zhong
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>


Re: Convert Cuboid Data to HFile failed when hbase in different HDFS

2016-01-22 Thread yu feng
This problem is caused by deploying HBase separately from Hadoop; our
Hadoop cluster cannot recognize the nameservice of the HDFS that HBase
depends on.

2016-01-22 15:55 GMT+08:00 Li Yang :

> Sounds the same as https://issues.apache.org/jira/browse/KYLIN-957
>
>
> On Tue, Jan 19, 2016 at 10:23 PM, yu feng  wrote:
>
> > In the step 'Convert Cuboid Data to HFile' execute failed, error log is :
> >
> >
> > java.io.IOException: Failed to run job : Unable to map logical
> nameservice
> > URI 'hdfs://A' to a NameNode. Local configuration does not have a
> failover
> > proxy provide
> > r configured.
> > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
> > at
> >
> >
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> > at
> >
> >
> org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:129)
> > at
> >
> org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:93)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > at
> >
> >
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:119)
> > at
> >
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > at
> >
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > at
> >
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > at
> >
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > I think it is because node manager in hadoop cluster can not recognition
> > hdfs://A in they config. So, I have to tranform the path
> > hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile before
> > execute this step. and it works for me.
> >
> > I have create a jira here :
> > https://issues.apache.org/jira/browse/KYLIN-1280
> >
> > If you have a better solution, reply me please.
> >
>


Re: internal hive table and build the cube backward

2016-01-19 Thread yu feng
ShaoFeng Shi is right. Kylin uses a Hive command to generate the
intermediate table (taking it as the source data), and uses HCatalog to get
data from Hive in step 2. Hive performance does have an impact on Kylin's
build performance, so a newer Hive version is recommended.

2016-01-20 8:05 GMT+08:00 ShaoFeng Shi :

> Only the first step actually, Kylin runs "hive -e" command to create an
> intermediate table; The following steps are running MR over the files under
> that table.
>
> 2016-01-20 4:18 GMT+08:00 zhong zhang :
>
> > Hi Yu and Everyone,
> >
> > Just a little bit supplement, Hive definitely involves in the step of
> > Create
> > Intermediate Flat Hive Table and Build Dimension Dictionary. The question
> > is that does Hive involve in the following steps of building cuboids?
> >
> > Best regards,
> > Zhong
> >
> > On Sun, Jan 17, 2016 at 10:35 PM, yu feng  wrote:
> >
> > > Firstly, kylin do not distinguish which kind table in hive,  if only
> you
> > > can query it in hive, so the table can be normal table, external table,
> > > view or table with some serdes.
> > > then I think it is hard to build cube backward along the time in kylin.
> > > maybe someone has some good ideas at this point.
> > >
> > > 2016-01-18 11:04 GMT+08:00 zhong zhang :
> > >
> > > > Hi All,
> > > >
> > > > I'm wondering can I build the Kylin cube backward along the time.
> More
> > > > specifically, can I build the cube from the current time to six
> months
> > > ago
> > > > and then from six months ago to 12 months ago and go on? In this
> way, I
> > > can
> > > > have the latest six months' cube result first.
> > > >
> > > > It's well known that the input of Kylin cube is hive table. Does it
> > make
> > > > any difference
> > > > between using internal hive table and external hive table when
> building
> > > the
> > > > cube?
> > > >
> > > > Best regards,
> > > > Zhong
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>


Convert Cuboid Data to HFile failed when hbase in different HDFS

2016-01-19 Thread yu feng
The step 'Convert Cuboid Data to HFile' failed to execute; the error log is:


java.io.IOException: Failed to run job : Unable to map logical nameservice
URI 'hdfs://A' to a NameNode. Local configuration does not have a failover
proxy provide
r configured.
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
at
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at
org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:129)
at
org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:93)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at
org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:119)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I think it is because the node managers in the Hadoop cluster cannot
recognize hdfs://A in their configuration. So I had to transform the path
hdfs://A/path/to/hfile to hdfs://namenode_ip:port/path/to/hfile before
executing this step, and it works for me.

I have created a JIRA here: https://issues.apache.org/jira/browse/KYLIN-1280

If you have a better solution, please reply.
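A minimal sketch of the workaround described above. The namenode host and port are placeholders, not values from this thread; in practice the active namenode address would have to be resolved from the HBase cluster's configuration:

```java
public class HFilePathRewrite {
    public static void main(String[] args) {
        // Logical nameservice URI that the NodeManagers cannot resolve.
        String hfilePath = "hdfs://A/path/to/hfile";
        // Assumption: the active namenode address is known from the HBase cluster config.
        String nameNode = "hdfs://namenode_ip:8020";
        // Swap the logical nameservice prefix for the explicit namenode address.
        String rewritten = hfilePath.replaceFirst("^hdfs://A", nameNode);
        System.out.println(rewritten); // hdfs://namenode_ip:8020/path/to/hfile
    }
}
```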


Re: internal hive table and build the cube backward

2016-01-17 Thread yu feng
Firstly, Kylin does not distinguish between kinds of Hive tables; as long as
you can query it in Hive, the table can be a normal table, an external
table, a view, or a table with some SerDes.
As for building a cube backward along the time axis, I think that is hard to
do in Kylin; maybe someone has some good ideas on this point.

2016-01-18 11:04 GMT+08:00 zhong zhang :

> Hi All,
>
> I'm wondering can I build the Kylin cube backward along the time. More
> specifically, can I build the cube from the current time to six months ago
> and then from six months ago to 12 months ago and go on? In this way, I can
> have the latest six months' cube result first.
>
> It's well known that the input of Kylin cube is hive table. Does it make
> any difference
> between using internal hive table and external hive table when building the
> cube?
>
> Best regards,
> Zhong
>


Re: Kylin capabilities

2016-01-15 Thread yu feng
OK, I will try to do it.

2016-01-16 10:22 GMT+08:00 Luke Han :

> Hive view is good idea for this, even could handle for UDF.
>
> @Yu, maybe you could help to draft some tutorial and commit to website
> about this.
>
>
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Sat, Jan 16, 2016 at 8:58 AM, yu feng  wrote:
>
> > Does Kylin offer nesting dimensions or hierarchies?
> >
> > > Kylin does not support Snowflake schema, It is currently restricted to
> > >Star.
> >
> > Generally speaking, we always transform snowflake schema to star schema
> by
> > creating some view.
> >
> > 2016-01-16 3:31 GMT+08:00 Adunuthula, Seshu :
> >
> > > Does Kylin provide slicing and dicing capabilities on cubes?
> > >
> > > > Not sure what you mean, We can issue SQL on the cubes.
> > >
> > > Does Kylin offer nesting dimensions or hierarchies?
> > >
> > > > Kylin does not support Snowflake schema, It is currently restricted
> to
> > > >Star.
> > >
> > > Other than SQL, can users use Kylin cube with drag and drop
> functionality
> > > (something similar to the OLAP cubes in the traditional world)?
> > > > No it is not possible. It has to be a tool built on Kylin.
> > >
> > >
> > >
> > > Is MDX (Multi-dimensional functions) supported in Kylin? (This is a
> > > standard
> > > functionality of exiting OLAP tools).
> > > > MDX was wrapper was built but was not built by the Kylin team. Not
> sure
> > > >of how mature the capability is or if it works out of the box.
> > >
> > >
> > > Does Kylin provide role based security (who sees what data) and object
> > > level
> > > security (who sees which dimensions)
> > > > It provides basic security, you can grant permissions at the cube
> > level,
> > > >but cannot do row level or column level security.
> > >
> > > Can existing HBase table be directly leveraged by Kylin for cubes and
> > > suppose the HBase table was not created from any Hive Star schema.
> > > > No, We define the table row format, so you cannot use existing
> tables.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 1/15/16, 7:56 AM, "KylinPOC"  wrote:
> > >
> > > >Hi Kylin Team,
> > > >
> > > >I have a few questions around capabilities of Kylin.
> > > >
> > > >Does Kylin provide slicing and dicing capabilities on cubes?
> > > >
> > > >Does Kylin offer nesting dimensions or hierarchies?
> > > >
> > > >Other than SQL, can users use Kylin cube with drag and drop
> > functionality
> > > >(something similar to the OLAP cubes in the traditional world)?
> > > >
> > > >Is MDX (Multi-dimensional functions) supported in Kylin? (This is a
> > > >standard
> > > >functionality of exiting OLAP tools).
> > > >
> > > >Does Kylin provide role based security (who sees what data) and object
> > > >level
> > > >security (who sees which dimensions)
> > > >
> > > >Can existing HBase table be directly leveraged by Kylin for cubes and
> > > >suppose the HBase table was not created from any Hive Star schema.
> > > >
> > > >Thanks.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >--
> > > >View this message in context:
> > > >
> http://apache-kylin.74782.x6.nabble.com/Kylin-capabilities-tp3270.html
> > > >Sent from the Apache Kylin mailing list archive at Nabble.com.
> > >
> > >
> >
>


Re: Kylin capabilities

2016-01-15 Thread yu feng
Does Kylin offer nesting dimensions or hierarchies?

> Kylin does not support Snowflake schema, It is currently restricted to
>Star.

Generally speaking, we always transform a snowflake schema into a star
schema by creating views.

2016-01-16 3:31 GMT+08:00 Adunuthula, Seshu :

> Does Kylin provide slicing and dicing capabilities on cubes?
>
> > Not sure what you mean, We can issue SQL on the cubes.
>
> Does Kylin offer nesting dimensions or hierarchies?
>
> > Kylin does not support Snowflake schema, It is currently restricted to
> >Star.
>
> Other than SQL, can users use Kylin cube with drag and drop functionality
> (something similar to the OLAP cubes in the traditional world)?
> > No it is not possible. It has to be a tool built on Kylin.
>
>
>
> Is MDX (Multi-dimensional functions) supported in Kylin? (This is a
> standard
> functionality of exiting OLAP tools).
> > MDX was wrapper was built but was not built by the Kylin team. Not sure
> >of how mature the capability is or if it works out of the box.
>
>
> Does Kylin provide role based security (who sees what data) and object
> level
> security (who sees which dimensions)
> > It provides basic security, you can grant permissions at the cube level,
> >but cannot do row level or column level security.
>
> Can existing HBase table be directly leveraged by Kylin for cubes and
> suppose the HBase table was not created from any Hive Star schema.
> > No, We define the table row format, so you cannot use existing tables.
>
>
>
>
>
>
> On 1/15/16, 7:56 AM, "KylinPOC"  wrote:
>
> >Hi Kylin Team,
> >
> >I have a few questions around capabilities of Kylin.
> >
> >Does Kylin provide slicing and dicing capabilities on cubes?
> >
> >Does Kylin offer nesting dimensions or hierarchies?
> >
> >Other than SQL, can users use Kylin cube with drag and drop functionality
> >(something similar to the OLAP cubes in the traditional world)?
> >
> >Is MDX (Multi-dimensional functions) supported in Kylin? (This is a
> >standard
> >functionality of exiting OLAP tools).
> >
> >Does Kylin provide role based security (who sees what data) and object
> >level
> >security (who sees which dimensions)
> >
> >Can existing HBase table be directly leveraged by Kylin for cubes and
> >suppose the HBase table was not created from any Hive Star schema.
> >
> >Thanks.
> >
> >
> >
> >
> >
> >--
> >View this message in context:
> >http://apache-kylin.74782.x6.nabble.com/Kylin-capabilities-tp3270.html
> >Sent from the Apache Kylin mailing list archive at Nabble.com.
>
>


Re: Exception on building cube at 2nd step (IncompatibleClassChangeError)

2016-01-15 Thread yu feng
I see there are some Hadoop 2.2.0 jars in
/home/hadoop/hbase/hbase-0.98.15-hadoop2/lib, while your Hadoop is 2.5.2;
this will cause conflicts. Please check your environment again.

2016-01-16 4:26 GMT+08:00 Eric Engstfeld :

> Hi,
>
> I have this exception on the 2nd step of cube build. I think its a
> compatibility issue but i couldn’t find the way to solve it. Everything you
> can tell me would be really helpful.
>
> ——
>
> This is the stack trace:
>
> [pool-5-thread-2]:[2016-01-15
> 22:49:30,032][ERROR][org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:134)]
> - ExecuteException job:e2f2f094-01df-4d9a-9d53-e873d0891094
> org.apache.kylin.job.exception.ExecuteException:
> org.apache.kylin.job.exception.ExecuteException:
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:111)
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kylin.job.exception.ExecuteException:
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:111)
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> ... 4 more
> Caused by: java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
> at
> org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:102)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
> at
> org.apache.kylin.job.hadoop.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:121)
> at
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:83)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at
> org.apache.kylin.job.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> ... 6 more
>
> -
>
> Environment:
>
> Ubuntu 14.04 LTE
> Hadoop 2.5.2
> Hive 1.0.1
> hbase-0.98.15-hadoop2
> Kylin 1.2
>
> -
>
> I share with you the the kylin logging on startup:
>
> KYLIN_HOME is set to /home/hadoop/kylin/apache-kylin-1.2-bin
>
> Logging initialized using configuration in
> jar:file:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/hive-common-1.0.1.jar!/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/hadoop/hadoop/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/hive-jdbc-1.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> HCAT_HOME is set to: /home/hadoop/hive/apache-hive-1.0.1-bin/hcatalog, use
> it to find hcatalog path:
> hive dependency:
> /home/hadoop/hive/apache-hive-1.0.1-bin/conf:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/velocity-1.5.jar:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/commons-compress-1.4.1.jar:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/curator-client-2.6.0.jar:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/groovy-all-2.1.6.jar:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/hive-exec-1.0.1.jar:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hive/apache-hive-1.0.1-bin/lib/hi

Re: can we support adding mapping cube columns to hive table columns

2016-01-12 Thread yu feng
My suggestion is to add this mapping by creating a view. Then you can change
the column name in the Hive table and recreate the view, which will not have
any effect on the cube, and you need not sync the Hive table. The view is
only used in the first step, so it can be recreated at any time except while
that step is running.

2016-01-12 16:13 GMT+08:00 dong wang :

> sometimes, we have to change column names of the source hive table AFTER
> the cube is built successfully, actually, in most cases, we may just change
> the column name without changing the column type, but now, we have to
> refresh the cube again when we change the column name of the source hive
> table and sync the hive table to kylin meta. in common sense, we want that
> the cube data may be avoid being calculated again since just changing the
> column name.
>


Re: Relationship between rowkey column length and cube size

2016-01-09 Thread yu feng
Let me try to explain it.

Cube size determines how regions are split for the HBase table after all
cuboid files are generated. For example, if your total cuboid file size is
100GB and the cube size is set to "SMALL" (the property for SMALL being
10GB), Kylin will create the HBase table with 10 regions. It calculates the
start rowkey and end rowkey of every region before creating the HTable, then
creates the table with those split points.

Rowkey column length is another matter: for each dimension you can either
use a dictionary or set a rowkey column length. If you use a dictionary,
Kylin builds a dictionary for that column (a trie tree), meaning every value
of the dimension is encoded as a unique number; because the dimension value
is part of the HBase rowkey, the dictionary reduces the HBase table size.
However, Kylin keeps the dictionary in memory, so if the dimension
cardinality is large this becomes a problem. If you instead set the rowkey
column length to N for a dimension, Kylin will not build a dictionary for
it, and every value will be truncated to an N-length string; there is no
dictionary in memory, but the rowkeys in the HBase table will be longer.

Hope this is helpful to you.

2016-01-09 13:00 GMT+08:00 Kiriti Sai :

> Hi,
> When using an UHC dimension, I've disabled the dictionary for that
> dimension in the advanced settings and set the rowkey column length as 100
> since it's something like a text description. The data has around 6.6
> billion rows and I guess the cardinality is nearly 1 billion for this row.
> I know Kylin is not suitable to be used in such scenario, but can someone
> please explain me the relationship between the cube size and the rowkey
> column length. I'm asking this question just out of curiosity, since I
> haven't found any explanation relating these two.
>
> Thank You.
>


Re: build a cube with two ultra high cardinality columns

2016-01-08 Thread yu feng
Assume the average size of this column is 32 bytes; a cardinality of 50
million then means about 1.5GB. In the 'Extract Fact Table Distinct Columns'
step, mappers read from the intermediate table and remove duplicate values
(in the combiner). However, this job starts more than one mapper but only
one reducer, so the reducer's input is more than 1.5GB, and in the reduce
function Kylin creates a new Set to hold all the unique values, which is
another 1.5GB.

I have encountered this problem, and I had to change the MR config
properties for every job. I modified these properties:

<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6000M</value>
  <description>Larger heap-size for child jvms of reduces.</description>
</property>

<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8000</value>
  <description>Larger resource limit for reduces.</description>
</property>

You can check the values of these properties currently in use and increase
them.

Lastly, ask yourself whether you really need all the detailed values of
those two columns. If not, you can create a view to change the source data,
or simply not use a dictionary when creating the cube and instead set a
fixed length for them in the 'Advanced Setting' step.

Hope this is helpful to you.

2016-01-09 6:17 GMT+08:00 zhong zhang :

> Hi All,
>
> There are two ultra high carnality columns in our cube. Both of them are
> over 50 million cardinality. When building the cube, it keeps giving us the
> error: Error: GC overhead limit exceeded for the reduce jobs at the
> step Extract
> Fact Table Distinct Columns.
>
> We've just updated to version1.2.
>
> Can anyone give some ideas to solve this issue?
>
> Best regards,
> Zhong
>


Re: Welcome new Apache Kylin committer: Yu Feng

2016-01-08 Thread yu feng
Thanks to all of you. I am very glad to become a committer of Apache Kylin,
and I thank the community for the recognition.

I work at the Hangzhou Research Institute of NetEase (www.163.com), one of
the earliest Internet companies in China. I have focused on Kylin for the
past six months and made some changes to adapt it to our environment; I have
contributed some patches and a new feature to the Kylin community. In my
spare time, I like to discuss Kylin with others on the mailing list. Besides
OLAP and Big Data, I'm interested in distributed storage and NoSQL systems.

Currently we use Kylin to provide fast and stable OLAP analysis services to
multiple products in our company. We chose Kylin because it has a lot of
advantages, including ease of use, low query latency, and support for
standard SQL. At present, our users are very satisfied with Kylin's
performance too.

I am very proud to be a committer of Apache Kylin, and I will always do my
best to contribute to the Kylin community!


2016-01-08 23:46 GMT+08:00 Dong Li :

> Welcome!
>
> Thanks,
> Dong Li
>
> 2016-01-08 23:28 GMT+08:00 Luke Han :
>
> > I am very pleased to announce that the Project Management Committee
> > (PMC) of Apache Kylin has asked Yu Feng to becomeApache Kylin committer,
> > and she has already accepted.
> >
> > Yu has already made many contributions to Kylin community, to answer
> > others questions activity, submit patches for bug fixes and contributing
> a
> > great
> > feature about multi-hive source from different cluster.
> >
> > Please join me to welcome Yu.
> >
> > @Yu, please share with us a little about yourself.
> >
> > Luke Han
> >
> > On behalf of the Apache Kylin PPMC
> >
>
>
>
> --
> Thanks,
> Dong
>


Re: restful interface

2016-01-08 Thread yu feng
>  18:43:03 Wire - http-outgoing-0 >> "User-Agent: Apache-HttpClient/4.3.3
> (java 1.5)[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 >> "[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "HTTP/1.1 405 Method Not
> Allowed[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Server: Apache-Coyote/1.1[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Set-Cookie:
> JSESSIONID=1B48E4EE1F327464EA6B32449346BBB7; Path=/kylin/; HttpOnly[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Allow: GET[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Content-Type:
> text/html;charset=utf-8[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Content-Language: en[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Content-Length: 1047[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Date: Fri, 08 Jan 2016 10:41:50
> GMT[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "[\r][\n]"
>  18:43:03 Wire - http-outgoing-0 << "Apache
> Tomcat/7.0.59 - Error report<!--H1
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
> H2
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
> H3
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
> BODY
> {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
> P
> {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A
> {color : black;}A.name {color : black;}HR {color : #525D76;}-->
> HTTP Status 405 - Request method 'POST' not
> supportedtype Status
> reportmessage Request method 'POST' not
> supporteddescription The specified HTTP method is not
> allowed for the requested resource. noshade="noshade">Apache Tomcat/7.0.59"
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 << HTTP/1.1
> 405 Method Not Allowed
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 << Server:
> Apache-Coyote/1.1
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 <<
> Set-Cookie: JSESSIONID=1B48E4EE1F327464EA6B32449346BBB7; Path=/kylin/;
> HttpOnly
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 << Allow:
> GET
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 <<
> Content-Type: text/html;charset=utf-8
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 <<
> Content-Language: en
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 <<
> Content-Length: 1047
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0 << Date:
> Fri, 08 Jan 2016 10:41:50 GMT
>  18:43:03 MainClientExec - Connection can be kept alive indefinitely
>  18:43:03 ResponseProcessCookies - Cookie accepted
> [JSESSIONID="1B48E4EE1F327464EA6B32449346BBB7", version:0,
> domain:192.168.1.12, path:/kylin/, expiry:null]
>  response.getStatusLine().getStatusCode():405
> java.lang.RuntimeException: Failed : HTTP error code : 40518:43:03
> ConnectionHolder - Cancelling request execution
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0: Shutdown
> connection
>
> at
> com.britecloud.spark.jobs.KylinJobUtils.listAllJobs(KylinJobUtils.java:63)
> at
> com.britecloud.spark.jobs.KylinJobUtils.main(KylinJobUtils.java:89)
> 18:43:03 ConnectionHolder - Connection discarded
>  18:43:03 LoggingManagedHttpClientConnection - http-outgoing-0: Close
> connection
>  18:43:03 PoolingHttpClientConnectionManager - Connection released: [id:
> 0][route: {}->http://192.168.1.12:7070][total kept alive: 0; route
> allocated: 0 of 2; total allocated: 0 of 20]
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> -- Original --
> From:  "yu feng";
> Date:  Fri, Jan 8, 2016 06:12 PM
> To:  "dev";
>
> Subject:  Re: restful interface
>
>
> You can add authentication information like this :
>
> protected void addHttpHeaders(HttpMethodBase method) {
> method.addRequestHeader("Accept", "application/json, text/plain,
> */*");
> method.addRequestHeader("Content-Type", "application/json");
>
> String basicAuth =
> DatatypeConverter.printBase64Binary((this.username + ":" +
> this.password).getBytes());
> method.addRequestHeader("Authorization", "Basic " + basicAuth);
> }
>
> In a curl command, you can execute it like this : curl -H
> "Authorization:Basic auth-information-content"  -H "Content-Type:
> application/json" "http://localhost:7070/kylin/api/xxx", where
> auth-information-content
> is the value of username + ":" + password encoded in Base64. For example,
> "ADMIN:KYLIN" encodes to "QURNSU46S1lMSU4=".
>
> 2016-01-08 17:37 GMT+08:00 王琳 :
>
> > hi
> >  I have a restful interface problems need to consult about:
> >
> >  Java call restful interface for user authentication how to control
> > this piece?
> >
> >
> > Thanks
>


Re: restful interface

2016-01-08 Thread yu feng
You can add authentication information like this:

protected void addHttpHeaders(HttpMethodBase method) {
method.addRequestHeader("Accept", "application/json, text/plain,
*/*");
method.addRequestHeader("Content-Type", "application/json");

String basicAuth =
DatatypeConverter.printBase64Binary((this.username + ":" +
this.password).getBytes());
method.addRequestHeader("Authorization", "Basic " + basicAuth);
}

In a curl command, you can execute it like this: curl -H
"Authorization:Basic auth-information-content" -H "Content-Type:
application/json" "http://localhost:7070/kylin/api/xxx", where
auth-information-content
is the value of username + ":" + password encoded in Base64. For example,
"ADMIN:KYLIN" encodes to "QURNSU46S1lMSU4=".
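A self-contained check of the encoding described above, using the JDK's java.util.Base64 just for illustration (the snippet in the mail uses DatatypeConverter instead):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    public static void main(String[] args) {
        // Encode "username:password" exactly as HTTP Basic auth expects.
        String credentials = "ADMIN" + ":" + "KYLIN";
        String basicAuth = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        System.out.println("Authorization: Basic " + basicAuth);
        // Prints: Authorization: Basic QURNSU46S1lMSU4=
    }
}
```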

2016-01-08 17:37 GMT+08:00 王琳 :

> hi
>  I have a restful interface problems need to consult about:
>
>  Java call restful interface for user authentication how to control
> this piece?
>
>
> Thanks


Re: Re: How to improve the performance of job!

2016-01-07 Thread yu feng
It expects a long value (in bytes); you should set the value to 67108864.
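For reference, 67108864 is simply 64 MB expressed in bytes, which is what the property expects:

```java
public class SplitSizeBytes {
    public static void main(String[] args) {
        // mapreduce.input.fileinputformat.split.maxsize takes bytes, not "64MB".
        long sixtyFourMb = 64L * 1024 * 1024;
        System.out.println(sixtyFourMb); // 67108864
    }
}
```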

2016-01-08 14:17 GMT+08:00 wenye...@163.com :

> I modified the mapreduce.input.fileinputformat.split.maxsize parameter
> according to your proposal, and now it's wrong:
>
> Query returned non-zero code: 1, cause: 'SET
> mapreduce.input.fileinputformat.split.maxsize=64MB' FAILED because
> mapreduce.input.fileinputformat.split.maxsize expects LONG type value.
>
> at
> org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:90)
> at
> org.apache.kylin.job.common.ShellExecutable.doWork(ShellExecutable.java:52)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> My profile kylin_job_conf.xml:
> <property>
>   <name>mapreduce.input.fileinputformat.split.maxsize</name>
>   <value>64MB</value>
>   <description>Hive concurrency lock</description>
> </property>
>
>
>
> wenye...@163.com
>
> From: yu feng
> Sent: 2016-01-08 13:21
> To: dev
> Subject: Re: How to improve the performance of job!
> According to our experience, you can try the following:
> 1. Use a newer Hive to speed up the first step.
> 2. Start more mappers and reducers for every MR job: you can reduce the
> value of 'kylin.job.mapreduce.default.reduce.input.mb' in kylin.properties,
> which is the input size per reducer in the N-D cuboid calculation steps;
> a smaller value means more reducers.
> 3. You can set the property
> 'mapreduce.input.fileinputformat.split.maxsize' ('mapred.max.split.size' in
> older Hadoop versions) in kylin_job_conf.xml, which is the max split size
> per mapper; we set the value to less than the block size of the Hadoop
> cluster, such as 64MB.
> 4. Try setting the cube size to SMALL when creating the cube, which
> increases the reducer count when generating HFiles.
>
> Hope it is helpful to you.
>
> 2016-01-08 13:00 GMT+08:00 wenye...@163.com :
>
> > I have five machines (8 core, 32g MEM), using HDP 2.3 building cluster
> > environment, version of the kyling Kyline
> > apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin, HBase for Version 1.1.1, hive
> > table data is now 3000 ,but now job running the one hour, job
> schedule
> > is about 10%, view the task of MR found that job is not running to MR Do
> > you have any way to improve the performance of the job:
> > this is my configure:
> > 1.kylin.properties
> > #
> > # Licensed to the Apache Software Foundation (ASF) under one or more
> > # contributor license agreements.  See the NOTICE file distributed with
> > # this work for additional information regarding copyright ownership.
> > # The ASF licenses this file to You under the Apache License, Version 2.0
> > # (the "License"); you may not use this file except in compliance with
> > # the License.  You may obtain a copy of the License at
> > #
> > #http://www.apache.org/licenses/LICENSE-2.0
> > #
> > # Unless required by applicable law or agreed to in writing, software
> > # distributed under the License is distributed on an "AS IS" BASIS,
> > # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > # See the License for the specific language governing permissions and
> > # limitations under the License.
> > #
> >
> > ## Config for Kylin Engine ##
> >
> > # List of web servers in use, this enables one web server instance to
> sync
> > up with other servers.
> > kylin.rest.servers=192.168.1.40:7070
> >
> > #set display timezone on UI,format like[GMT+N or GMT-N]
> > kylin.rest.timezone=GMT-8
> > kylin.query.cache.enabled=true
> > # The metadata store in hbase
> > kylin.metadata.url=kylin_metadata@hbase
> >
> > # The storage for final cube file in hbase
> > kylin.storage.url=hbase
> > kylin.job.yarn.app.rest.check.status.url=
> > http://192.168.1.40:8088/ws/v1/cluster/apps/${job_id}?
> > kylin.job.yarn.app.rest.check.interval.seconds=20
> > kylin.query.security.enabled=false
> > # Temp folder in hdfs, make sure user has the right access to the hdfs
> > directory
> > kylin.

Re: How to improve the performance of job!

2016-01-07 Thread yu feng
According to our experience, you can try the following:
1. Use a newer Hive to speed up the first step.
2. Start more mappers and reducers for every MR job: you can reduce the
value of 'kylin.job.mapreduce.default.reduce.input.mb' in kylin.properties,
which is the input size per reducer in the N-D cuboid calculation steps;
a smaller value means more reducers.
3. You can set the property
'mapreduce.input.fileinputformat.split.maxsize' ('mapred.max.split.size' in
older Hadoop versions) in kylin_job_conf.xml, which is the max split size
per mapper; we set the value to less than the block size of the Hadoop
cluster, such as 64MB.
4. Try setting the cube size to SMALL when creating the cube, which
increases the reducer count when generating HFiles.

Hope it is helpful to you.

2016-01-08 13:00 GMT+08:00 wenye...@163.com :

> I have five machines (8 core, 32g MEM), using HDP 2.3 building cluster
> environment, version of the kyling Kyline
> apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin, HBase for Version 1.1.1, hive
> table data is now 3000 ,but now job running the one hour, job schedule
> is about 10%, view the task of MR found that job is not running to MR Do
> you have any way to improve the performance of the job:
> this is my configure:
> 1.kylin.properties
> #
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> ## Config for Kylin Engine ##
>
> # List of web servers in use, this enables one web server instance to sync
> up with other servers.
> kylin.rest.servers=192.168.1.40:7070
>
> #set display timezone on UI,format like[GMT+N or GMT-N]
> kylin.rest.timezone=GMT-8
> kylin.query.cache.enabled=true
> # The metadata store in hbase
> kylin.metadata.url=kylin_metadata@hbase
>
> # The storage for final cube file in hbase
> kylin.storage.url=hbase
> kylin.job.yarn.app.rest.check.status.url=
> http://192.168.1.40:8088/ws/v1/cluster/apps/${job_id}?
> kylin.job.yarn.app.rest.check.interval.seconds=20
> kylin.query.security.enabled=false
> # Temp folder in hdfs, make sure user has the right access to the hdfs
> directory
> kylin.hdfs.working.dir=/kylin
>
> # HBase Cluster FileSystem, which serving hbase, format as
> hdfs://hbase-cluster:8020
> # leave empty if hbase running on same cluster with hive and mapreduce
> kylin.hbase.cluster.fs=hdfs://mycluster/apps/hbase/data
> kylin.route.hive.enabled=true
> kylin.route.hive.url=jdbc:hive2://192.168.1.50:1
>
> kylin.job.mapreduce.default.reduce.input.mb=500
>
> kylin.server.mode=all
>
> # If true, job engine will not assume that hadoop CLI reside on the same
> server as it self
> # you will have to specify kylin.job.remote.cli.hostname,
> kylin.job.remote.cli.username and kylin.job.remote.cli.password
> # It should not be set to "true" unless you're NOT running Kylin.sh on a
> hadoop client machine
> # (Thus kylin instance has to ssh to another real hadoop client machine to
> execute hbase,hive,hadoop commands)
> kylin.job.run.as.remote.cmd=false
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.hostname=
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.username=
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.password=
>
> # Used by test cases to prepare synthetic data for sample cube
> kylin.job.remote.cli.working.dir=/tmp/kylin
>
> # Max count of concurrent jobs running
> kylin.job.concurrent.max.limit=10
>
> # Time interval to check hadoop job status
> kylin.job.yarn.app.rest.check.interval.seconds=10
>
> # Hive database name for putting the intermediate flat tables
> #kylin.job.hive.database.for.intermediatetable=kylin
>
> #default compression codec for htable,snappy,lzo,gzip,lz4
> kylin.hbase.default.compression.codec=snappy
>
> # The cut size for hbase region, in GB.
> # E.g, for cube whose capacity be marked as "SMALL", split region per 10GB
> by default
> kylin.hbase.region.cut.small=10
> kylin.hbase.region.cut.medium=20
> kylin.hbase.region.cut.large=100
>
> # HBase min and max region count
> kylin.hbase.region.count.min=1
> kylin.hbase.region.count.max=500
>
> ## Config for Restful APP ##
> # database connection settings:
> ldap.server=
> ldap.username=
> ldap.password=
> ldap.user.searchBase=
> ldap.user.searchPattern=
> ldap.user.groupSearchBa

Re: java.io.IOException: NoSuchObjectException(message:default.kylin_intermediate_learn_kylin

2016-01-06 Thread yu feng
I am using Hive 0.14.0 and have not configured "hive.metastore.uris"; maybe
you need to start the Hive metastore service in Hive 1.2.1.
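If that is the cause, a minimal sketch of the fix looks like the following; the host name is a placeholder for your environment, and 9083 is Hive's default metastore thrift port:

```shell
# Print the hive-site.xml snippet to add on the Kylin node; adjust the host.
cat <<'EOF'
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://your-metastore-host:9083</value>
</property>
EOF
# Then, on the metastore host, start the service (commented out here):
#   hive --service metastore &
```

After restarting Kylin, HCatalog should resolve tables through the remote metastore instead of a local one.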

2016-01-07 13:28 GMT+08:00 Xiaoyu Wang :

> Hi,
> I think this exception may be caused by the "hive.metastore.uris" property
> not being set in hive-site.xml.
> Kylin uses HCatalog to read Hive tables. HCatalog uses the
> "hive.metastore.uris" property to create a HiveMetaStoreClient and fetch
> the table metadata.
> If it is not set, HCatalog uses a local metastore, so it throws a
> NoSuchObjectException.
>
> You can configure the "hive.metastore.uris" property in hive-site.xml and
> start the Hive metastore so HCatalog can connect to it.
>
>
>
> On 2016-01-07 12:52, yu feng wrote:
>
>> You can check whether the table "default.kylin_intermediate_
>> learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_
>> c8d9323ae277" exists in your Hive, and whether any other hive-site.xml
>> exists in your classpath. It is strange, because you were able to load
>> the Hive table (before creating and building the cube).
>>
>> 2016-01-07 11:47 GMT+08:00 和风 <363938...@qq.com>:
>>
>>> my env: Hadoop 2.7.1, Kylin 1.2, Hive 1.2.1, HBase 0.98.
>>>
>>>
>>> my hive config :
>>>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>   <value>jdbc:mysql://10.24.248.196:3306/hive?characterEncoding=UTF-8</value>
>>>   <description>JDBC connect string for a JDBC metastore</description>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>   <value>com.mysql.jdbc.Driver</value>
>>>   <description>Driver class name for a JDBC metastore</description>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>   <value>root</value>
>>>   <description>Username to use against metastore database</description>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>   <value>root</value>
>>>   <description>password to use against metastore database</description>
>>> </property>
>>>
>>> <property>
>>>   <name>datanucleus.transactionIsolation</name>
>>>   <value>repeatable-read</value>
>>> </property>
>>>
>>> <property>
>>>   <name>datanucleus.valuegeneration.transactionIsolation</name>
>>>   <value>repeatable-read</value>
>>> </property>
>>>
>>> <property>
>>>   <name>hive.aux.jars.path</name>
>>>   <value>file:///usr/local/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar,file:///usr/local/hive/lib/gson-2.2.4.jar,file:///usr/local/hive/lib/data-hive-udf.jar</value>
>>>   <description>The location of the plugin jars that contain
>>>   implementations of user defined functions and serdes.</description>
>>> </property>
>>>
>>>
>>>
>>>
>>> -- Original Message --
>>> From: "yu feng";;
>>> Date: Thursday, January 7, 2016, 11:25
>>> To: "dev";
>>>
>>> Subject: Re: java.io.IOException:
>>> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin
>>>
>>>
>>>
>>> I have encountered this problem; it is most likely because your current
>>> Hive metastore config is wrong. Could you give some more detailed
>>> information about your env?
>>>
>>> 2016-01-07 10:23 GMT+08:00 和风 <363938...@qq.com>:
>>>
>>> hi:
>>>>    I built a cube and got an error.
>>>> logs:
>>>> java.io.IOException:
>>>>
>>>>
>>> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_c8d9323ae277
>>>
>>>> table not found)
>>>>  at
>>>>
>>>>
>>> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
>>>
>>>>  at
>>>>
>>>>
>>> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
>>>
>>>>  at
>>>>
>>>>
>>> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:101)
>>>
>>>>  at
>>>>
>>>>
>>> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:77)
>>>
>>>>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>>>  at
>>>>
>

Re: java.io.IOException: NoSuchObjectException(message:default.kylin_intermediate_learn_kylin

2016-01-06 Thread yu feng
You can check whether the table "default.kylin_intermediate_
learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_
c8d9323ae277" exists in your Hive, and whether any other hive-site.xml
exists in your classpath. It is strange, because you were able to load the
Hive table (before creating and building the cube).

2016-01-07 11:47 GMT+08:00 和风 <363938...@qq.com>:

> my env: Hadoop 2.7.1, Kylin 1.2, Hive 1.2.1, HBase 0.98.
>
>
> my hive config :
>
>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:mysql://10.24.248.196:3306/hive?characterEncoding=UTF-8</value>
>   <description>JDBC connect string for a JDBC metastore</description>
> </property>
>
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>com.mysql.jdbc.Driver</value>
>   <description>Driver class name for a JDBC metastore</description>
> </property>
>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>root</value>
>   <description>Username to use against metastore database</description>
> </property>
>
> <property>
>   <name>javax.jdo.option.ConnectionPassword</name>
>   <value>root</value>
>   <description>password to use against metastore database</description>
> </property>
>
> <property>
>   <name>datanucleus.transactionIsolation</name>
>   <value>repeatable-read</value>
> </property>
>
> <property>
>   <name>datanucleus.valuegeneration.transactionIsolation</name>
>   <value>repeatable-read</value>
> </property>
>
> <property>
>   <name>hive.aux.jars.path</name>
>   <value>file:///usr/local/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar,file:///usr/local/hive/lib/gson-2.2.4.jar,file:///usr/local/hive/lib/data-hive-udf.jar</value>
>   <description>The location of the plugin jars that contain
>   implementations of user defined functions and serdes.</description>
> </property>
>
>
>
>
>
> -- Original Message --
> From: "yu feng";;
> Date: Thursday, January 7, 2016, 11:25
> To: "dev";
>
> Subject: Re: java.io.IOException:
> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin
>
>
>
> I have encountered this problem; it is most likely because your current
> Hive metastore config is wrong. Could you give some more detailed
> information about your env?
>
> 2016-01-07 10:23 GMT+08:00 和风 <363938...@qq.com>:
>
> > hi:
> >   I built a cube and got an error.
> > logs:
> > java.io.IOException:
> >
> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_c8d9323ae277
> > table not found)
> > at
> >
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
> > at
> >
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
> > at
> >
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:101)
> > at
> >
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:77)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > at
> >
> org.apache.kylin.job.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
> > at
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > at
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
> > at
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > at
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by:
> >
> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_c8d9323ae277
> > table not found)
> > at
> >
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1808)
> > at
> >
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1778)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> >
> org.apache.hadoop.hive.metastore.Retryin

Re: java.io.IOException: NoSuchObjectException(message:default.kylin_intermediate_learn_kylin

2016-01-06 Thread yu feng
I have encountered this problem; it is most likely because your current
Hive metastore config is wrong. Could you give some more detailed
information about your env?

2016-01-07 10:23 GMT+08:00 和风 <363938...@qq.com>:

> hi:
>   I built a cube and got an error.
> logs:
> java.io.IOException:
> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_c8d9323ae277
> table not found)
> at
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
> at
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
> at
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:101)
> at
> org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:77)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at
> org.apache.kylin.job.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by:
> NoSuchObjectException(message:default.kylin_intermediate_learn_kylin_four_2015020100_2015123000_8d26cc4b_e012_4414_a89b_c8d9323ae277
> table not found)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1808)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1778)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy47.get_table(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1208)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
> at com.sun.proxy.$Proxy48.getTable(Unknown Source)
> at
> org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
> at
> org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
> at
> org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
> at
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
> ... 13 more


Re: Error when scan from lower key

2016-01-06 Thread yu feng
This means one of your HBase scanners failed to execute; you can check the
regionserver logs for more detailed information.

2016-01-07 3:46 GMT+08:00 sdangi :

> Hi Kylin Team --
>
>
>
> I'm hitting the below error when executing a query with the time dimension
> in the where clause against a 35GB cube (no issues running otherwise). The
> same exact query works fine against Hive with the WHERE clause.
>
> Any ideas?
>
> Thanks,
> Regards,
>
> Error when scan from lower key  � ;8 to upper key  � on
> table KYLIN_PWNXRLXEZ7. while executing SQL: "SELECT
> CUSTOMER_DIM_T.PRN_CST_NM ,CURRENCY_DIM_T.SHRT_NM ,BRANCH_DIM_T.BR_NM
> ,COUNTRY_DIM_T.CTY_NM ,SUM(AML_TXN_FCT_CUB_T.USD_TXN_AMT)
> ,SUM(AML_TXN_FCT_CUB_T.AC_CCY_TXN_AMT) FROM
> DL_FINSERV_DEMO.AML_TXN_FCT_CUB_T as AML_TXN_FCT_CUB_T INNER JOIN
> DL_FINSERV_DEMO.CUSTOMER_DIM_T as CUSTOMER_DIM_T ON
> AML_TXN_FCT_CUB_T.FIRM_CST_KEY = CUSTOMER_DIM_T.KEY INNER JOIN
> DL_FINSERV_DEMO.BRANCH_DIM_T as BRANCH_DIM_T ON
> AML_TXN_FCT_CUB_T.TXN_BR_KEY
> = BRANCH_DIM_T.BR_KEY INNER JOIN DL_FINSERV_DEMO.COUNTRY_DIM_T as
> COUNTRY_DIM_T ON AML_TXN_FCT_CUB_T.RMTR_CTY_KEY = COUNTRY_DIM_T.CTY_KEY
> INNER JOIN DL_FINSERV_DEMO.CURRENCY_DIM_T as CURRENCY_DIM_T ON
> AML_TXN_FCT_CUB_T.RMTR_CCY_KEY = CURRENCY_DIM_T.KEY INNER JOIN
> DL_FINSERV_DEMO.DATE_DIM_T as DATE_DIM_T ON
> AML_TXN_FCT_CUB_T.TXN_BOOK_DT_KEY = DATE_DIM_T.DT_KEY
>
> WHERE DATE_DIM_T.DT_KEY >= date'2015-04-01'
>
> GROUP BY CUSTOMER_DIM_T.PRN_CST_NM ,CURRENCY_DIM_T.SHRT_NM
> ,BRANCH_DIM_T.BR_NM ,COUNTRY_DIM_T.CTY_NM LIMIT 5000"
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/Error-when-scan-from-lower-key-tp3083.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>


Re: about the parameter 'acceptPartial'

2016-01-04 Thread yu feng
In my opinion, acceptPartial means whether to return a partial result if
your query result has more rows than the limit value, and it is always set
to true whether you query from the web UI or JDBC.
However, if you query from the web UI you can set the limit (the default
is 5); if you query from JDBC, the default limit is set to 100 if
your SQL does not contain 'limit'.
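Through JDBC I know of no knob for this, but if you go through the REST query API you can pass the flag yourself. A sketch follows; the endpoint path, the field names, and the ADMIN:KYLIN default credential are my assumptions about the REST API of that era, not something stated in this thread:

```shell
# Build the query payload by hand; acceptPartial=false asks the server not
# to silently return a truncated result, and the limit is set explicitly.
PAYLOAD='{"sql": "select part_dt, count(*) from kylin_sales group by part_dt",
 "acceptPartial": false, "limit": 100, "project": "learn_kylin"}'
echo "$PAYLOAD"
# Send it (commented out; requires a running Kylin server):
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -u ADMIN:KYLIN -d "$PAYLOAD" http://localhost:7070/kylin/api/query
```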

2016-01-04 18:15 GMT+08:00 wangsh...@sinoaudit.cn :

> Hi all:
>  Can anybody tell me what the query parameter 'acceptPartial' means? And
> I wonder how I can set up this parameter in JDBC.
>
>
>
> wangsh...@sinoaudit.cn
>


Feature about taking more than one hive source as kylin input in kylin-1.x

2016-01-01 Thread yu feng
Hi all:
    I have submitted a patch on kylin-1.0 (KYLIN-1172); it lets Kylin take
more than one Hive source as input, even Hive sources based on different
Hadoop clusters. However, after consulting with @Luke Han and @Shi,
Shaofeng, they do not plan to add big new features to kylin-1.x, which is
in a stable state.

    I think maybe we can create a new branch for this feature and offer an
option for those who really need it, instead of deploying many Kylin
environments for different Hive sources.

    What is more, I am getting familiar with kylin-2.x, which is more
extensible in architecture, and I will add an input named "other hives"
that realizes the same feature; I think it will be easier to add this
feature to kylin-2.x.

    I want to get some advice from the Kylin community on what my next
step is to create a new branch. Thanks for any suggestion.


Re: FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ConditionalWork cannot be cast to org.apache.hadoop.hive.ql.plan.MapredWork

2015-12-28 Thread yu feng
Execute this SQL with hive -e 'sql' and check the result. I think there is
something wrong with your Hive env, maybe a Hive version problem...

2015-12-28 18:51 GMT+08:00 taylor zhang :

> hi all
>
> I got the below error when running the example cube 'kylin_sales_cube'.
> It seems that Hive cannot insert data into the external table? I have no
> idea and cannot find help on the internet. Can you please help?
>
>
> hive> INSERT OVERWRITE TABLE
>
> kylin_intermediate_kylin_sales_cube_desc_1970010100_2014010100_abd5658e_303f_4d5f_b2f6_61cac37fc782
> SELECT
> > KYLIN_SALES.PART_DT
> > ,KYLIN_SALES.LEAF_CATEG_ID
> > ,KYLIN_SALES.LSTG_SITE_ID
> > ,KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME
> > ,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
> > ,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME
> > ,KYLIN_SALES.LSTG_FORMAT_NAME
> > ,KYLIN_SALES.PRICE
> > ,KYLIN_SALES.SELLER_ID
> > FROM DEFAULT.KYLIN_SALES as KYLIN_SALES
> > INNER JOIN DEFAULT.KYLIN_CAL_DT as KYLIN_CAL_DT
> > ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT
> > INNER JOIN DEFAULT.KYLIN_CATEGORY_GROUPINGS as
> KYLIN_CATEGORY_GROUPINGS
> > ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
> AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID;
>
> FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ConditionalWork
> cannot be cast to org.apache.hadoop.hive.ql.plan.MapredWork
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/FAILED-ClassCastException-org-apache-hadoop-hive-ql-plan-ConditionalWork-cannot-be-cast-to-org-apachk-tp2944.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>


Re: cube model will be overridden while creating a new cube with the same name

2015-12-25 Thread yu feng
Yes, that solves this problem in the web UI. However, if I do not create
the cube in the web UI (e.g. via a curl command), the problem happens
again. I think you should check whether the cube name exists in
CubeController (server side) before updating the cube model...

2015-12-25 17:38 GMT+08:00 Jian Zhong :

> We added a CUBE_NAME check when creating a cube in the UI, released in V1.2
>
> https://issues.apache.org/jira/browse/KYLIN-966?filter=-1
>
>
> On Fri, Dec 25, 2015 at 3:49 PM, Jian Zhong 
> wrote:
>
> > Good catch,
> >
> > I'll fix this.
> >
> > https://issues.apache.org/jira/browse/KYLIN-1254
> >
> > On Fri, Dec 25, 2015 at 3:17 PM, yu feng  wrote:
> >
> >> Hi, I found a bug like this in kylin-1.0:
> >> I built a cube named TEST successfully, and then built another cube
> >> named TEST too. The two cubes have different fact tables; the second
> >> cube fails because "The cube named TEST already exists", but the model
> >> of the first cube is overridden anyway.
> >>
> >> This is because Kylin saves or updates the cube model before saving a
> >> cube.
> >>
> >
> >
>


cube model will be overridden while creating a new cube with the same name

2015-12-24 Thread yu feng
Hi, I found a bug like this in kylin-1.0:
I built a cube named TEST successfully, and then built another cube also
named TEST. The two cubes have different fact tables; the second cube fails
with "The cube named TEST already exists", but the model of the first cube
is overridden anyway.

This is because Kylin saves or updates the cube model before saving a cube.
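A hedged sketch of the guard being asked for: check the name before touching the model. `cube_exists` below is a stub standing in for whatever lookup CubeController would actually do (e.g. a GET on the cube resource); it is not Kylin code:

```shell
# Stub lookup standing in for the real "does this cube exist" check.
cube_exists() { [ "$1" = "TEST" ]; }   # pretend a cube named TEST exists

create_cube() {
  if cube_exists "$1"; then
    echo "refusing to create '$1': cube name already exists"
    return 1
  fi
  # Only now would the model be saved/updated, then the cube itself.
  echo "creating cube '$1'"
}

create_cube TEST || true   # prints: refusing to create 'TEST': cube name already exists
create_cube TEST2          # prints: creating cube 'TEST2'
```

The point is ordering: the existence check must come before the model save, so a duplicate name can never clobber the first cube's model.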


Re: Error when calculating cardinality for a view.

2015-12-23 Thread yu feng
Is there no plan to fix it? (Resolution: Won't Fix)
If you calculate cardinality for the underlying table, it cannot reflect
the real cardinality of the view; maybe we need some other solution.
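One alternative, assuming the view's columns are known: compute the cardinality directly in Hive, bypassing HCatalog entirely. The database, view, and column names below are placeholders:

```shell
# Assemble a distinct-count statement against the view itself; placeholders
# DB/VIEW/COL must be replaced with real names from your env.
DB=your_db; VIEW=your_view; COL=your_col
SQL="SELECT COUNT(DISTINCT $COL) FROM $DB.$VIEW"
echo "$SQL"
# Run it against Hive (commented out; needs a Hive CLI on this machine):
#   hive -e "$SQL"
```

This reflects the view's real cardinality, at the cost of a full Hive query per column.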

2015-12-24 15:17 GMT+08:00 Shi, Shaofeng :

> This is an known issue of Hive, already recorded in
> https://issues.apache.org/jira/browse/KYLIN-916
> The workaround is to load and then calculate cardinality for the
> underlying table which is not a view;
>
> On 12/24/15, 3:11 PM, "yu feng"  wrote:
>
> >I loaded a view in one project; however, calculating cardinality always
> >errors like this:
> >
> >java.io.IOException: java.lang.NullPointerException
> >at
> >org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputForma
> >t.java:97)
> >at
> >org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputForma
> >t.java:51)
> >at
> >org.apache.kylin.job.hadoop.cardinality.HiveColumnCardinalityJob.run(HiveC
> >olumnCardinalityJob.java:79)
> >at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >at
> >org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecut
> >able.java:62)
> >at
> >org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutab
> >le.java:107)
> >at
> >org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChai
> >nedExecutable.java:51)
> >at
> >org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutab
> >le.java:107)
> >at
> >org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defaul
> >tScheduler.java:130)
> >at
> >java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
> >1145)
> >at
> >java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
> >:615)
> >at java.lang.Thread.run(Thread.java:745)
> >Caused by: java.lang.NullPointerException
> >at java.lang.Class.forName0(Native Method)
> >at java.lang.Class.forName(Class.java:191)
> >at
> >org.apache.hive.hcatalog.mapreduce.FosterStorageHandler.(FosterStora
> >geHandler.java:59)
> >at
> >org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:4
> >17)
> >at
> >org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:3
> >80)
> >at
> >org.apache.hive.hcatalog.mapreduce.InitializeInput.extractPartInfo(Initial
> >izeInput.java:158)
> >at
> >org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(Initial
> >izeInput.java:137)
> >at
> >org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInpu
> >t.java:86)
> >at
> >org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputForma
> >t.java:95)
> >... 12 more
> >
> >
> >This is because it cannot find the SerDe jar for this view (table). Does
> >someone have an idea about it?
>
>


Error when calculating cardinality for a view.

2015-12-23 Thread yu feng
I loaded a view in one project; however, calculating cardinality always
errors like this:

java.io.IOException: java.lang.NullPointerException
at
org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
at
org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at
org.apache.kylin.job.hadoop.cardinality.HiveColumnCardinalityJob.run(HiveColumnCardinalityJob.java:79)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at
org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at
org.apache.hive.hcatalog.mapreduce.FosterStorageHandler.(FosterStorageHandler.java:59)
at
org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:417)
at
org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:380)
at
org.apache.hive.hcatalog.mapreduce.InitializeInput.extractPartInfo(InitializeInput.java:158)
at
org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:137)
at
org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at
org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
... 12 more


This is because it cannot find the SerDe jar for this view (table). Does
someone have an idea about it?


Re: ClassNotFoundException: org.apache.kylin.common.mr.KylinMapper

2015-12-23 Thread yu feng
In my current version (1.0), for every MR job Kylin finds the jar based on
the 'kylin.job.jar' property in the config; if that is not set, it checks
for a file located at $kylin_home/lib/kylin-job-(.+)\\.jar. So, is your
config right, or have you renamed the kylin-job-xxx.jar?
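Those two checks can be scripted; this sketch assumes a default-style install under $KYLIN_HOME (the /usr/local/kylin fallback is just an example) and only inspects, never modifies:

```shell
KYLIN_HOME=${KYLIN_HOME:-/usr/local/kylin}   # adjust to your install
# 1) Is kylin.job.jar overridden in the config?
grep '^kylin.job.jar=' "$KYLIN_HOME/conf/kylin.properties" 2>/dev/null \
  || echo "kylin.job.jar not set; Kylin will look under \$KYLIN_HOME/lib"
# 2) Does a kylin-job jar exist with its expected name, and does it
#    actually contain the class the MR task cannot find?
JAR=$(ls "$KYLIN_HOME"/lib/kylin-job-*.jar 2>/dev/null | head -n1)
if [ -z "$JAR" ]; then
  echo "no kylin-job-*.jar under $KYLIN_HOME/lib (renamed or missing?)"
else
  unzip -l "$JAR" | grep -q 'org/apache/kylin/common/mr/KylinMapper' \
    && echo "KylinMapper present in $JAR" \
    || echo "KylinMapper MISSING from $JAR"
fi
```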

2015-12-24 10:03 GMT+08:00 杨海乐 :

> Hello all,
> In the second step I get the message below. I then downloaded the job.jar
> and found the org.apache.kylin.common.mr.KylinMapper class; the class also
> exists in $kylin_home/lib and $kylin_home/tomcat/webapp/kylin/WEB-INF/lib.
> 2015-12-24 09:39:38,977 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating
> schedule, headroom=
> 2015-12-24 09:39:38,977 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow
> start threshold not met. completedMapsForReduceSlowstart 13
> 2015-12-24 09:39:39,156 FATAL [IPC Server handler 11 on 24813]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
> attempt_1448955124847_0374_m_000100_0 - exited :
> java.lang.ClassNotFoundException: org.apache.kylin.common.mr.KylinMapper
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> at
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:274)
> at
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013)
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978)
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
> at
> org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>


Re: Questions about query: cache, offset and order by desc

2015-12-21 Thread yu feng
In my opinion, maybe you can increase your Ehcache size, which is set in
ehcache.xml or ehcache-test.xml depending on your "profile" setting, but I
see the memoryStoreEvictionPolicy is "LRU"; I am not clear about how it
works and hope someone can explain it...
You can disable the auto-added limit if you query with JDBC instead of the
web UI, and I do not find any cache in the Kylin storage engine, so it
always fetches results from HBase; maybe that is an optimization point.
I checked my result with 'select xxx from xxx order by desc limit 10' and
got the right result, and I think Kylin takes out all data from the storage
engine in a join or table scan. So it is strange that "order by desc limit"
gets a wrong result...
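On the paging question quoted below, the offset arithmetic itself is simple; a sketch that builds the paged statement (table and column names are illustrative, and whether the server accepts plain LIMIT/OFFSET versus the OFFSET ... FETCH form depends on the Calcite version in use):

```shell
PAGE=3; SIZE=10                      # page number (1-based) and page size
OFFSET=$(( (PAGE - 1) * SIZE ))      # rows to skip before this page
echo "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt ORDER BY part_dt LIMIT $SIZE OFFSET $OFFSET"
# → ... LIMIT 10 OFFSET 20
```

Keeping the explicit LIMIT also sidesteps the server's auto-added default limit mentioned in the question.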

2015-12-21 19:44 GMT+08:00 Yerui Sun :

> Hi,all
>We met a scenario with strictly performance requirements. With some
> code reading of query service and engine, here’s some questions:
>
>1. The cache doesn’t work.
>Assuming a query with 11000 result rows, the first execution
> consumes 5+ seconds, and execute it immediately, still consumes 5 seconds.
> Looking into the query log, both the two queries has 'Hit Cache: false'.
> The duration threshold has been set to 3000, and the result count threshold
> is 1. With additional log, the two queries has same hash code, and both
> been put into cache. But there’s a warning log as followed, which not sure
> be the reason:
> [WARN][net.sf.ehcache.pool.sizeof.ObjectGraphWalker.checkMaxDepth(ObjectGraphWalker.java:209)]
> - The configured limit of 1,000 object references was reached while
> attempting to calculate the size of the object graph. Severe performance
> degradation could occur if the sizing operation continues. This can be
> avoided by setting the CacheManger or Cache  elements
> maxDepthExceededBehavior to "abort" or adding stop points with
> @IgnoreSizeOf annotations. If performance degradation is NOT an issue at
> the configured limit, raise the limit value using the CacheManager or Cache
>  elements maxDepth attribute. For more information, see the
> Ehcache configuration documentation.
>
> 2. How offset (query with page) worked?
> The user want to query kylin with page like mysql. I’ve found the
> writing like 'limit 10 offset 20 row fetch next 10 row only’ worked,
> meaning fetch next 10 rows start from the 21st row, and limit up to 10.
> In fact, the limit is not necessary for user, the reason to add this is to
> avoid auto adding ‘limit 5’, so is it possible to disable the auto
> adding?
> The second question is the processing of query with offset. One
> possible way is scan the first 30 (or 10) rows, and filter out only the
> 21~30 rows by Calcite engine. Another way is jump to the 21st row directly,
> and scan out only the next 10 rows. I guess we process it in the first way
> for now, because there’s no way to directly jump specific rows in HBase. If
> my guessing is correct, how about the next query with 'limit 10 offset
> 30 row fetch next 10 row only’? Do we need to scan out all results again?
> If yes, a cache in here should accelerate the next offset queries.
>
> 3. How does 'order by ... desc' work?
> The 'order by' clause is also used, but some results seem wrong with
> 'order by ... desc'. Considering a query with 10000 result rows, adding
> 'limit 5000 order by desc', the result should be rows 10000~5001, but is
> 5000~1 in fact. I guess the reason is that the ordering is processed in the
> Calcite engine after scanning, and the HBase scan is always ascending;
> that's why we got rows 1~5000. To resolve this, is it possible to push the
> order-by down into the HBase scan, and turn it into a reverse scan?
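The suspected behaviour can be illustrated with a small sketch. This is a hedged illustration of the general problem, not Kylin's actual query path; all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// If the storage layer scans ascending, applies the limit, and only then
// sorts descending in the query engine, the query returns the SMALLEST
// rows in descending order instead of the true top rows. Pushing the sort
// down as a reverse scan fixes this.
public class DescLimit {
    // Wrong: limit applied to the ascending scan before sorting.
    static List<Integer> limitThenSortDesc(List<Integer> ascScan, int limit) {
        List<Integer> first = new ArrayList<>(ascScan.subList(0, Math.min(limit, ascScan.size())));
        first.sort(Collections.reverseOrder());
        return first;
    }

    // Right: iterate in descending order (a reversed scan), then limit.
    static List<Integer> reverseScanLimit(List<Integer> ascScan, int limit) {
        List<Integer> desc = new ArrayList<>(ascScan);
        desc.sort(Collections.reverseOrder()); // stands in for a reversed HBase scan
        return new ArrayList<>(desc.subList(0, Math.min(limit, desc.size())));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) rows.add(i);
        System.out.println(limitThenSortDesc(rows, 5)); // [5, 4, 3, 2, 1]  -- wrong top-5
        System.out.println(reverseScanLimit(rows, 5));  // [10, 9, 8, 7, 6] -- expected
    }
}
```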
>
> I'm not very familiar with the query service and engine for now, but it
> feels like there's still some room for performance improvement.
> Looking forward to any comments.
>
>
> Best Regards,
> Yerui Sun
> sunye...@gmail.com
>
>
>
>


Re: build dictionary for timestamp column with DateStrDictionary

2015-12-15 Thread yu feng
Yes, we always truncate timestamps to five minutes and the cardinality is
acceptable; maybe I will just change DateStrDictionary to TrieDictionary when
building the dictionary for the timestamp column.

2015-12-15 16:09 GMT+08:00 hongbin ma :

> however it still requires caution if you're using a timestamp column.
> A timestamp column may have very high cardinality if you don't apply any
> normalization on it. Usually it's suggested to truncate to the second or
> minute to reduce cardinality.
>
> On Tue, Dec 15, 2015 at 4:07 PM, hongbin ma  wrote:
>
> > in 2.x versions, timestamp is being supported
> >
> > On Tue, Dec 15, 2015 at 4:00 PM, yu feng  wrote:
> >
> >> Hi All :
> >> I build a cube, fact table like this :
> >> hive> describe testtimestamp;
> >> OK
> >> ts   timestamp
> >> fname   string
> >> lname   string
> >> type int
> >> cost int
> >>
> >> I built a cube with dimensions 'ts', 'fname', 'lname' and 'type'. However,
> >> after building the cube, I ran a query like 'select distinct ts from
> >> testtimestamp', and it returns:
> >> +-+
> >> | TS  |
> >> +-+
> >> | 2015-12-14 16:00:00 |
> >> | 2015-12-12 16:00:00 |
> >> | 2015-12-11 16:00:00 |
> >> | 2015-12-09 16:00:00 |
> >> | 2015-12-10 16:00:00 |
> >> | 2015-12-15 16:00:00 |
> >> | 2015-12-13 16:00:00 |
> >> +-+
> >>
> >> This is an incorrect result; querying it in Hive returns:
> >> 2015-12-10 00:00:00
> >> 2015-12-11 01:02:03
> >> 2015-12-12 05:02:10
> >> 2015-12-12 06:08:10
> >> 2015-12-12 16:02:18
> >> 2015-12-13 06:28:40
> >> 2015-12-14 03:20:15
> >> 2015-12-14 11:04:18
> >> 2015-12-15 10:13:21
> >> 2015-12-16 12:04:12
> >>
> >> I know the reason: Kylin uses DateStrDictionary to build the dictionary
> >> for column types like "date", "time", "datetime", "timestamp"; it then
> >> parses column values with SimpleDateFormat("yyyy-MM-dd"), so after
> >> building the dictionary, timestamp values within the same day are
> >> transformed into the same date value.
> >>
> >> Is it a bug, or is there some other consideration?
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>


build dictionary for timestamp column with DateStrDictionary

2015-12-15 Thread yu feng
Hi All :
I build a cube, fact table like this :
hive> describe testtimestamp;
OK
ts   timestamp
fname   string
lname   string
type int
cost int

I built a cube with dimensions 'ts', 'fname', 'lname' and 'type'. However,
after building the cube, I ran a query like 'select distinct ts from
testtimestamp', and it returns:
+-+
| TS  |
+-+
| 2015-12-14 16:00:00 |
| 2015-12-12 16:00:00 |
| 2015-12-11 16:00:00 |
| 2015-12-09 16:00:00 |
| 2015-12-10 16:00:00 |
| 2015-12-15 16:00:00 |
| 2015-12-13 16:00:00 |
+-+

This is an incorrect result; querying it in Hive returns:
2015-12-10 00:00:00
2015-12-11 01:02:03
2015-12-12 05:02:10
2015-12-12 06:08:10
2015-12-12 16:02:18
2015-12-13 06:28:40
2015-12-14 03:20:15
2015-12-14 11:04:18
2015-12-15 10:13:21
2015-12-16 12:04:12

I know the reason: Kylin uses DateStrDictionary to build the dictionary for
column types like "date", "time", "datetime", "timestamp"; it then parses
column values with SimpleDateFormat("yyyy-MM-dd"), so after building the
dictionary, timestamp values within the same day are transformed into the
same date value.

Is it a bug, or is there some other consideration?
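The collapse of per-day values can be reproduced in a few lines. This is a minimal demonstration of the `SimpleDateFormat` behaviour described, not Kylin's dictionary code:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Parsing a timestamp string with the date-only pattern "yyyy-MM-dd"
// silently ignores the trailing time-of-day, so every value within one
// day collapses to the same dictionary entry.
public class DateTruncation {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat dateOnly = new SimpleDateFormat("yyyy-MM-dd");
        SimpleDateFormat full = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        Date a = dateOnly.parse("2015-12-12 05:02:10"); // trailing " 05:02:10" is ignored
        Date b = dateOnly.parse("2015-12-12 16:02:18");

        System.out.println(full.format(a)); // 2015-12-12 00:00:00
        System.out.println(a.equals(b));    // true -- distinct timestamps collapse
    }
}
```

(The 16:00:00 values in the query result additionally suggest a time-zone shift between the build and query sides, but the day-level collapse alone explains the loss of distinct values.)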


Re: about assigning different mr job queue to different user groups inside one kylin instances

2015-12-14 Thread yu feng
Maybe a quick solution is to create a config file for every project, named
"kylin_job_conf_${projectname}.xml"; it would only require modifying the code
that selects the config file for an MR job, and you could control every MR
config property at the project level.
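The suggested lookup could be sketched like this. The class, method, and file names here are hypothetical illustrations, not Kylin's actual API:

```java
import java.io.File;

// Resolve a per-project MR job configuration file, falling back to the
// global kylin_job_conf.xml when no project-specific file exists.
public class JobConfResolver {
    static String resolveJobConf(String confDir, String projectName) {
        File projectConf = new File(confDir, "kylin_job_conf_" + projectName + ".xml");
        if (projectConf.exists()) {
            return projectConf.getPath(); // project-level override
        }
        return new File(confDir, "kylin_job_conf.xml").getPath(); // global default
    }

    public static void main(String[] args) {
        // Falls back to the default unless a project-specific file is present.
        System.out.println(resolveJobConf("/etc/kylin/conf", "sales"));
    }
}
```

The same fallback pattern is what Kylin already uses for the size-specific `kylin_job_conf_small/medium/large.xml` files discussed elsewhere in this thread archive.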

2015-12-11 22:00 GMT+08:00 Shi, Shaofeng :

> This is a valid scenario; so far Kylin doesn't have project- or cube-level
> job configurations. Once that is implemented, your problem will be solved.
>
> On 12/11/15, 7:09 PM, "dong wang"  wrote:
>
> >Currently, we have different business groups, and we want to assign
> >different MR job queues to different business user groups inside ONLY ONE
> >Kylin instance. Do we have this feature? As I searched, I found the
> >following piece:
> >
> >public static final String KYLIN_MAP_JOB_QUEUE = "mapred.job.queue.name";
> >
> >
> >If we update the code to pass a parameter so that different users use
> >different MR job queues when building their own jobs, are there any
> >potential problems with doing so?
>
>


Re: The Apache Software Foundation Announces Apache™ Kylin™ as a Top-Level Project

2015-12-11 Thread yu feng
Congratulations!

2015-12-11 10:10 GMT+08:00 hongbin ma :

> ​cheers!​
>
> On Thu, Dec 10, 2015 at 5:21 PM, Hao Chen  wrote:
>
> > Congrats, Apache™ Kylin™
> >
> > --
> >
> > Hao
> >
> >
> > On Thu, Dec 10, 2015 at 4:09 PM, Li Yang  wrote:
> >
> > > Cheers!
> > >
> > > On Wed, Dec 9, 2015 at 12:13 PM, Xiaoyu Wang 
> wrote:
> > >
> > >> Congratulations!
> > >>
> > >>
> > >> 在 2015年12月09日 11:22, Abhilash L L 写道:
> > >>
> > >>> Congrats!
> > >>>
> > >>> Regards,
> > >>> Abhilash
> > >>>
> > >>> On Wed, Dec 9, 2015 at 6:51 AM, 王猛  wrote:
> > >>>
> > >>> Great !
> > 
> >  2015-12-08 22:25 GMT+08:00 Luke Han :
> > 
> >  Hi community,
> > >  I'm so excited to let you know that the official Kylin TLP
> > > announcement from the ASF is live now.
> > >  Please check the links below for more detail.
> > >
> > >  Great thanks to all PMCs, committers, contributors, users and
> > > everyone,
> > > especially thanks to our mentors and ASF.
> > >
> > >   Thanks.
> > > Luke
> > >
> > >
> > > - NASDAQ GlobeNewswire
> > >
> > >
> > 
> >
> http://globenewswire.com/news-release/2015/12/08/793713/0/en/The-Apache-Software-Foundation-Announces-Apache-Kylin-as-a-Top-Level-Project.html
> > 
> > > - ASF "Foundation" blog http://s.apache.org/GZx
> > >
> > > - @TheASF Twitter feed
> > > https://twitter.com/TheASF/status/674181852018683904
> > >
> > >
> > >
> > >
> > >>
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>


Re: How to choose appropriate Cube size

2015-12-10 Thread yu feng
Cube size affects how regions are split when creating the HTable; a smaller
cube size will lead to more regions for the same source data.

2015-12-10 16:51 GMT+08:00 Li Yang :

> Em.. we do need more documentation.
>
> The "Cube Size" under the advanced setting affects the MR job
> configuration. Under KYLIN_HOME/conf, you can have an optional job_conf xml
> for each particular cube size.
>
> -rw-r--r--  1 b_kylin hdmi-technology 2564 Dec  7 00:45 kylin_hive_conf.xml
> -rw-r--r--  1 b_kylin hdmi-technology 3660 Dec  7 00:45
> kylin_job_conf_large.xml   <---
> -rw-r--r--  1 b_kylin hdmi-technology 3661 Dec  7 00:45
> kylin_job_conf_medium.xml  <---
> -rw-r--r--  1 b_kylin hdmi-technology 3648 Dec  7 00:45
> kylin_job_conf_small.xml   <---
> -rw-r--r--  1 b_kylin hdmi-technology 2650 Dec  7 00:45 kylin_job_conf.xml
> -rw-r--r--  1 b_kylin hdmi-technology 4584 Dec  7 00:45 kylin.properties
>
> The "kylin_job_conf.xml" is always the default job conf if the size
> specific conf does not exist.
>
> On Tue, Dec 8, 2015 at 4:32 PM, 杨海乐  wrote:
>
> > Hello all,
> >
> > Can someone explain how the Cube size affects the result?
> >
> >
> >
> >
>


Function of Kylin Hive Column Cardinality Job

2015-12-09 Thread yu feng
Hi all:
Every time I load a Hive table, Kylin runs a job named "Kylin Hive
Column Cardinality Job" asynchronously, which calculates the cardinality of
every column with HyperLogLog. I doubt its actual usefulness.

Is there any way to disable it?


Re: Kylin ignores startTime in cube build process?

2015-12-07 Thread yu feng
Kylin ignores startTime while building a segment. As the start time of the
new segment it takes 1) the start time you set when building the cube, if
you are building the first segment, or 2) the end time of the last segment,
if you are appending a new segment.
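The rule above can be written down as a tiny sketch. This is a hypothetical helper to illustrate the described behaviour, not Kylin's actual code:

```java
// For the first segment the requested start is honoured; for appended
// segments the start is forced to the end of the last existing segment
// (no gaps allowed), which is why a requested startTime appears ignored.
public class SegmentStart {
    static long effectiveStart(long requestedStart, Long lastSegmentEnd) {
        if (lastSegmentEnd == null) {
            return requestedStart; // first segment: user-specified start
        }
        return lastSegmentEnd;     // append: continue from the last segment's end
    }

    public static void main(String[] args) {
        System.out.println(effectiveStart(1449360000000L, null));           // first build
        System.out.println(effectiveStart(1449360000000L, 1449100800000L)); // append: earlier end wins
    }
}
```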

2015-12-07 20:22 GMT+08:00, Marek Wiewiorka :
> Hi All,
> I came across a weird situation with Kylin cube incremental refresh.
> I will try to describe it:
> I've a cube with data up to 2015.12.03.
> When I tried to build a cube segment for 2015.12.06 by specifying a JSON
> like this:
> {
>  "startTime": "1449360000000",
>  "endTime": "1449446400000",
>  "buildType": "BUILD"
> }
> Kylin ignored startTime and started building a segment 2015-12-03 --
> 2015-12-07 instead of 2015-12-06 -- 2015-12-07, i.e. it started the
> segment from the most recent date already available in the cube, not the
> one I specified in the JSON file.
>
> So is it a bug or a feature that it's not possible to build a one-day
> segment in such a scenario - it seems like Kylin does not allow gaps
> between segments?
>
> Thanks in advance!
>


Re: Can not send email caused by Build Base Cuboid Data step failed

2015-12-01 Thread yu feng
Hi Li Yang, I want to know whether you have ever met the following error
while replacing the error log in the mail content:
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:808)
at java.util.regex.Matcher.replaceAll(Matcher.java:906)
at java.lang.String.replaceAll(String.java:2162)
at
org.apache.kylin.job.cube.CubingJob.formatNotifications(CubingJob.java:104)
at
org.apache.kylin.job.execution.AbstractExecutable.notifyUserStatusChange(AbstractExecutable.java:211)
at
org.apache.kylin.job.execution.DefaultChainedExecutable.onExecuteFinished(DefaultChainedExecutable.java:101)
at
org.apache.kylin.job.cube.CubingJob.onExecuteFinished(CubingJob.java:130)
at
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
at
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

This is because logMsg contains the character '$', which leads to an error
in the replaceAll function. Just a reminder in case you have not met this
situation.
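The failure is easy to reproduce: `String.replaceAll` treats `$` in the replacement string as a group reference. The template string below is a hypothetical stand-in for Kylin's mail template, but the `$` behaviour and the `Matcher.quoteReplacement` fix are standard `java.util.regex`:

```java
import java.util.regex.Matcher;

// A replacement string containing '$' (e.g. an inner-class name such as
// "DefaultScheduler$JobRunner" in a stack trace) makes replaceAll throw
// IllegalArgumentException: Illegal group reference.
public class ReplaceAllDollar {
    public static void main(String[] args) {
        String template = "Error log: ${ERROR_LOG}";                // hypothetical template
        String logMsg = "at DefaultScheduler$JobRunner.run";        // contains '$'

        try {
            template.replaceAll("\\$\\{ERROR_LOG\\}", logMsg);      // throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Illegal group reference
        }

        // Fix: quote the replacement so '$' and '\' are taken literally.
        String safe = template.replaceAll("\\$\\{ERROR_LOG\\}",
                Matcher.quoteReplacement(logMsg));
        System.out.println(safe); // Error log: at DefaultScheduler$JobRunner.run
    }
}
```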

2015-10-29 18:33 GMT+08:00 yu feng :

> Good for you~
>
> 2015-10-29 18:15 GMT+08:00 Li Yang :
>
>> Met this one before. It has been fixed on latest code.
>>
>> On Tue, Oct 27, 2015 at 2:29 PM, yu feng  wrote:
>>
>> > I built a cube but it failed at the "Build Base Cuboid Data" step; I
>> > could not get the job-failed email, and found this stacktrace in the log:
>> > java.lang.NullPointerException
>> > at java.util.regex.Matcher.appendReplacement(Matcher.java:758)
>> > at java.util.regex.Matcher.replaceAll(Matcher.java:906)
>> > at java.lang.String.replaceAll(String.java:2162)
>> > at
>> >
>> org.apache.kylin.job.cube.CubingJob.formatNotifications(CubingJob.java:98)
>> > at
>> >
>> >
>> org.apache.kylin.job.execution.AbstractExecutable.notifyUserStatusChange(AbstractExecutable.java:211)
>> > at
>> >
>> >
>> org.apache.kylin.job.execution.DefaultChainedExecutable.onExecuteFinished(DefaultChainedExecutable.java:101)
>> > at
>> >
>> org.apache.kylin.job.cube.CubingJob.onExecuteFinished(CubingJob.java:130)
>> > at
>> >
>> >
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>> > at
>> >
>> >
>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
>> > at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > at java.lang.Thread.run(Thread.java:745)
>> >
>> > I think it is a bug; the JIRA ticket I created is here:
>> > https://issues.apache.org/jira/browse/KYLIN-1106
>> >
>>
>
>