from:"hongbin ma"

Re: [VOTE] Release apache-kylin-4.0.4 (RC1)

2024-01-22 Thread hongbin ma

+1 binding

On Mon, Jan 22, 2024 at 1:45 PM chuxiao  wrote:

> +1 (binding)
>
> At 2024-01-21 00:12:42, "Li Yang"  wrote:
> >Hi all,
> >
> >I have created a build for Apache Kylin 4.0.4, release candidate 1. This
> is
> >a very small release aiming to upgrade the versions of dependent
> components
> >mainly.
> >
> >Changes highlights:
> >
> >   - Bump commons-fileupload from 1.3.3 to 1.5
> >   - Bump tomcat-catalina from 8.5.78 to 8.5.86
> >   - Bump spring-core from 5.2.22.RELEASE to 5.2.23.RELEASE
> >   - Bump scala minor version from 2.12.10 to 2.12.13
> >   - And a few other bug fixes
> >
> >Thanks to everyone who has contributed to this release.
> >
> >Apart from the above changes, there are no new features or improvements in
> >this proposed release.
> >
> >The commit to being voted upon:
> >
> https://github.com/apache/kylin/commit/37f63b8c22a557bb7f17df370aae9cf2ae640a18
> >
> >The artifacts to be voted on, including the source package and the binary
> >packages are located here:
> >https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-4.0.4-rc1/
> >
> >The hash of the artifacts are as the following
> >- apache-kylin-4.0.4-source-release.zip.sha256:
> >21b338aae14a71357650b35f473381cf325a9781adf1f6c9954cae9b4027cfe2
> >- apache-kylin-4.0.4-bin-spark3.tar.gz.sha256:
> >77abb6a1174dd7dd63c747c95cbb1da9838c48c0c9dc4c7c35e36933ebb2636e
> >
> >A staged Maven repository is available for review at:
> >https://repository.apache.org/content/repositories/orgapachekylin-1113
> >
> >Release artifacts are signed with my key:
> >- Fingerprint: CF48 8F24 2BBC 3A88 5DB7  C6DF 685F 5B5D D254 DE89
> >- Public source 1: https://people.apache.org/keys/committer/liyang.asc
> >- Public source 2:
> >
> https://keys.openpgp.org/vks/v1/by-fingerprint/CF488F242BBC3A885DB7C6DF685F5B5DD254DE89
> >
> >Please vote on releasing this package as Apache Kylin 4.0.4.
> >
> >The vote is open for the next 72 hours and passes if a majority of at
> least
> >three +1 PMC votes are cast.
> >
> >
> >[ ] +1 Release this package as Apache Kylin 4.0.4
> >[ ] 0 I don't feel strongly about it, but I'm okay with the release
> >[ ] -1 Do not release this package because...
> >
> >Here is my vote:
> >+1 (binding)
> >
> >Best regards,
> >Li Yang
>


-- 
Regards,
Hongbin Ma

[jira] [Created] (KYLIN-3379) timestampadd test coverage is not enough

2018-05-12 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-3379:
-

 Summary: timestampadd test coverage is not enough
 Key: KYLIN-3379
 URL: https://issues.apache.org/jira/browse/KYLIN-3379
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.3.1
Reporter: hongbin ma






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [Discuss] Apache Kylin Component Owner Plan

2018-02-01 Thread hongbin ma

+1

I suggest establishing more solid rules on test coverage. I've seen lots of
patches without proper tests.

On Fri, Feb 2, 2018 at 2:57 PM, Jianhua Peng  wrote:

> +1
>
> On 2018/02/02 02:39:38, ShaoFeng Shi  wrote:
> > Hello, Kylin community,
> >
> > In the past, we don't have a clear rule on Kylin each component's
> > ownership, which caused many external patches be pending there as no
> > reviewer to pick up.
> >
> > Now we plan to make the process and responsibility more clear. The main
> > idea is to identify the owners of each Apache Kylin component.
> >
> > - Component owners will be listed in the description field on this Apache
> > Kylin JIRA components page [1]. The owners are listed in the
> 'Description'
> > field rather than in the 'Component Lead' field because the latter only
> > allows us to list one individual whereas it is encouraged that components
> > have multiple owners.
> >
> > - Component owners are volunteers who are expert in their component
> domain
> > and may have an agenda on how they think their Apache Kylin component
> > should evolve. The owner needs to be an Apache Kylin committer at this
> > moment.
> >
> > - Owners will try and review patches that land within their component’s
> > scope.
> >
> > - Owners can rotate, based on his aspiration.
> >
> > - When nominate or vote a new committer, the nominator needs to state
> which
> > component the candidate can be the owner.
> >
> > - If you're already an Apache Kylin committer and would like to be a
> > volunteer as a component owner, just write to the dev list and we’ll sign
> > you up.
> >
> > - If you think the component list need be updated (add, remove, rename,
> > etc), write to the dev list and we’ll review that.
> >
> > Below is the component list with old component lead, which assumes to be
> > updated soon.
> >
> > [1]
> > https://issues.apache.org/jira/projects/KYLIN?
> selectedItem=com.atlassian.jira.jira-projects-plugin:components-page
> >
> > Please comment on this plan; if no objection, we will run it for some
> time
> > to see the effect. Thanks for your inputs!
> >
> > And, thanks to Apache HBase community, from where I learned this.
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected

2018-01-03 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-3149:
-

 Summary: Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not 
working as expected
 Key: KYLIN-3149
 URL: https://issues.apache.org/jira/browse/KYLIN-3149
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.2.0
Reporter: hongbin ma


for queries like:

{code:sql}
select TRANS_ID from kylin_sales group by cast (case
    when '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, ''), 1, 1)
    when '1030101' = '1030102' then substring(COALESCE(OPS_REGION, ''), 1, 1)
    when '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
    when '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
    end as varchar(256)), TRANS_ID;
{code}

The expected logical plan after Volcano planning is:

{code}
EXECUTION PLAN BEFORE REWRITE
OLAPToEnumerableConverter
  OLAPProjectRel(TRANS_ID=[$1], ctx=[])
    OLAPLimitRel(ctx=[], fetch=[5])
      OLAPAggregateRel(group=[{0, 1}], ctx=[])
        OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1)], TRANS_ID=[$0], ctx=[])
          OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
{code}

However, the actual plan is:

{code}
EXECUTION PLAN BEFORE REWRITE
OLAPToEnumerableConverter
  OLAPLimitRel(ctx=[], fetch=[5])
    OLAPProjectRel(TRANS_ID=[$1], ctx=[])
      OLAPAggregateRel(group=[{0, 1}], ctx=[])
        OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
          OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
{code}

It looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE is not working as 
expected. If we dump the internal state of the VolcanoPlanner 
(org.apache.calcite.plan.volcano.VolcanoPlanner#dump), lines 19-21 of the 
complete dump read:

{code}
rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561

rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101',
 '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), 
=('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 
1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, 
''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT 
NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET 
"UTF-16LE" COLLATE "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, 
cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}

rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS
 NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, 
cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
{code}

We see two rels with the same cost, #339 and #348. #339 is created by 
LogicalProject =(OLAPProjectRule)=> OLAPProject, while #348 is created by 
LogicalProject =(ReduceExpressionsRule)=> reduced LogicalProject 
=(OLAPProjectRule)=> reduced OLAPProject. Since ReduceExpressionsRule requires 
a LogicalProject rather than an OLAPProject, #339 is never reduced.

Worse, #339 and #348 have the same cost. With the current Volcano planner 
algorithm the first rel encountered is chosen, so the unreduced rel wins.

A simple fix is to refine the rel-choosing algorithm: when two rels are equal 
in cost, choose the "simpler" one. Since we don't have a perfect measure of 
"simple", we simply choose the rel whose toString() output is shorter.
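
The tie-breaking idea can be sketched roughly as below. This is only an illustration under stated assumptions, not Calcite's or Kylin's actual code: Candidate, TieBreakSketch and pickBest are hypothetical names, and a plain double stands in for the planner's cost object.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a rel node: its textual digest plus a scalar cost.
final class Candidate {
    final String digest;
    final double cost;

    Candidate(String digest, double cost) {
        this.digest = digest;
        this.cost = cost;
    }
}

final class TieBreakSketch {
    // Pick the cheapest candidate; on equal cost prefer the shorter digest,
    // used here as a rough proxy for the "simpler" expression.
    static Candidate pickBest(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null
                    || c.cost < best.cost
                    || (c.cost == best.cost && c.digest.length() < best.digest.length())) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Candidate unreduced = new Candidate("CAST(CASE(=('1030101', '1030101'), SUBSTRING($9, 1, 1), null))", 25.05);
        Candidate reduced = new Candidate("SUBSTRING($9, 1, 1)", 25.05);
        // Both cost the same, so the shorter (reduced) digest is selected.
        System.out.println(pickBest(Arrays.asList(unreduced, reduced)).digest);
    }
}
```

String length is a crude measure, so a sketch like this only changes the outcome when costs are exactly equal.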



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-3106) DefaultScheduler#shutdown should use shutdownNow instead of shutdown

2017-12-13 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-3106:
-

 Summary: DefaultScheduler#shutdown should use shutdownNow instead 
of shutdown
 Key: KYLIN-3106
 URL: https://issues.apache.org/jira/browse/KYLIN-3106
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma


java.util.concurrent.ExecutorService#shutdownNow attempts to stop running 
tasks, typically by interrupting the worker threads, while 
java.util.concurrent.ExecutorService#shutdown does not.

If the interrupt is sent, a worker thread can notice it and abort itself in 
time. 
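
The difference can be seen in a small standalone example. It is a sketch only: ShutdownNowSketch and workerSeesInterrupt are hypothetical names, and the sleeping task merely simulates a long-running job step rather than anything in DefaultScheduler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownNowSketch {
    // Returns true if the running worker observed an interrupt after shutdownNow().
    static boolean workerSeesInterrupt() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // simulates a long-running job step
            } catch (InterruptedException e) {
                interrupted.set(true); // the worker notices the interrupt and aborts
            }
        });

        try {
            started.await();       // make sure the task is actually running
            pool.shutdownNow();    // interrupts the running worker thread
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("worker interrupted: " + workerSeesInterrupt());
    }
}
```

With shutdown() in place of shutdownNow(), the same task would keep sleeping until it finished on its own.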



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-2985) Cache temp json file created by each Calcite Connection

2017-11-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2985:
-

 Summary: Cache temp json file created by each Calcite Connection
 Key: KYLIN-2985
 URL: https://issues.apache.org/jira/browse/KYLIN-2985
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Priority: Normal


In org.apache.kylin.query.schema.OLAPSchemaFactory, each Calcite connection 
holds a temp file in the JVM. The total number of temp files can grow very 
large. A simple cache could address the problem.
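
One possible shape for such a cache is sketched below. It assumes the temp file can be keyed by project name and reused while the schema is unchanged; SchemaFileCacheSketch and schemaFileFor are hypothetical names, not existing Kylin code, and cache invalidation is left out.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SchemaFileCacheSketch {
    // One temp file per project (assumed key), created lazily and then reused.
    private static final ConcurrentMap<String, File> CACHE = new ConcurrentHashMap<>();

    static File schemaFileFor(String project) {
        return CACHE.computeIfAbsent(project, p -> {
            try {
                File f = File.createTempFile("olap_model_", ".json");
                f.deleteOnExit(); // clean up when the JVM exits
                return f;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        File first = schemaFileFor("demo_project");
        File second = schemaFileFor("demo_project");
        // Both calls return the same file, so no extra temp file is created.
        System.out.println(first.equals(second));
    }
}
```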



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-2982) Avoid upgrade column in OLAPTable

2017-11-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2982:
-

 Summary: Avoid upgrade column in OLAPTable
 Key: KYLIN-2982
 URL: https://issues.apache.org/jira/browse/KYLIN-2982
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Priority: Normal


Before CALCITE-845, to keep sum(integer_typed_col) from overflowing, we worked 
around the problem by upgrading all integer columns that appear in a sum 
measure to bigint. The workaround changes the column's type without notifying 
users and easily leads to messy code. 

Now that CALCITE-845 is ready, we can use it to provide a cleaner 
implementation.
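
The overflow that motivated the workaround is easy to reproduce in plain Java. The example only illustrates the arithmetic; SumOverflowSketch is a hypothetical name and the code has nothing to do with how Calcite or Kylin actually types the sum.

```java
final class SumOverflowSketch {
    // Accumulating into an int wraps around silently once the total exceeds Integer.MAX_VALUE.
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Accumulating into a long (the bigint analogue) keeps the correct total.
    static long sumAsLong(int[] values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MAX_VALUE, 1};
        System.out.println(sumAsInt(values));  // wrapped, negative result
        System.out.println(sumAsLong(values)); // 2147483648
    }
}
```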



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-2823) Trim TupleFilter after dictionary-based filter optimization

2017-08-30 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2823:
-

 Summary: Trim TupleFilter after dictionary-based filter 
optimization
 Key: KYLIN-2823
 URL: https://issues.apache.org/jira/browse/KYLIN-2823
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


With a cube's dictionary, Kylin optimizes filters like:

( a = 'value_in_dict' OR a = 'value_not_in_dict')   =>  (a = 
'value_in_dict' OR ConstantTupleFilter.FALSE)

We need to further trim the filter to (a = 'value_in_dict') to avoid too many 
children after the flatten-filter step.
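
A minimal sketch of the trimming step follows. It uses a made-up Filter class rather than Kylin's TupleFilter API, and it only handles OR nodes; every name in it is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FilterTrimSketch {
    // Hypothetical filter tree: op is "OR", "FALSE", or the text of a leaf condition.
    static final class Filter {
        static final Filter FALSE = new Filter("FALSE", Collections.<Filter>emptyList());
        final String op;
        final List<Filter> children;

        Filter(String op, List<Filter> children) {
            this.op = op;
            this.children = children;
        }

        static Filter leaf(String expression) {
            return new Filter(expression, Collections.<Filter>emptyList());
        }

        static Filter or(Filter... children) {
            return new Filter("OR", Arrays.asList(children));
        }
    }

    // Drop constant-FALSE children of an OR; collapse the OR if 0 or 1 children remain.
    static Filter trim(Filter f) {
        if (!f.op.equals("OR")) {
            return f;
        }
        List<Filter> kept = new ArrayList<>();
        for (Filter child : f.children) {
            Filter trimmed = trim(child);
            if (!trimmed.op.equals("FALSE")) {
                kept.add(trimmed);
            }
        }
        if (kept.isEmpty()) {
            return Filter.FALSE;
        }
        return kept.size() == 1 ? kept.get(0) : new Filter("OR", kept);
    }

    public static void main(String[] args) {
        Filter optimized = Filter.or(Filter.leaf("a = 'value_in_dict'"), Filter.FALSE);
        System.out.println(trim(optimized).op); // a = 'value_in_dict'
    }
}
```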




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-2782) Replace DailyRollingFileAppender with RollingFileAppender to allow log retention

2017-08-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2782:
-

 Summary: Replace DailyRollingFileAppender with RollingFileAppender 
to allow log retention
 Key: KYLIN-2782
 URL: https://issues.apache.org/jira/browse/KYLIN-2782
 Project: Kylin
  Issue Type: Task
Reporter: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[DISCUSS] On removing Cube ACL

2017-08-06 Thread hongbin ma

The current ACL design is based on very early versions, when Kylin merely had
simple concepts like projects and cubes.
With the advent of Kylin v2.0 and several new concepts like Data Model (
http://kylin.apache.org/docs/gettingstarted/concepts.html) and Query
Pushdown (KYLIN-2515), the original cube-centric ACL design is becoming
outdated. The reasons are two-fold: 1. Cubes are no longer the only
entities we want to control within each project. 2. Cube-level ACL
cannot protect underlying tables from being queried by unwanted users.

In fact, a cube is merely a special kind of index on the original table.
It's not straightforward to apply ACL on indexes rather than on the original
tables. That said, we need table-level ACL instead of cube-level ACL. We
have elaborated the detailed plans in KYLIN-2760 and KYLIN-2761. Please
comment on those issues or reply to this email if you have any concerns.

-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Performance suddenly degrades when the IN list in a SQL statement exceeds a certain size; how can this be solved?

2017-08-04 Thread hongbin ma

take a look at kylin.storage.hbase.max-fuzzykey-scan
in org.apache.kylin.common.KylinConfigBase#getQueryScanFuzzyKeyMax

2017-08-03 20:25 GMT+08:00 ShaoFeng Shi :

> It might be related with storage layer cache. You can make more tests to
> see the differences. Besides, if you can provide more logs when executing
> these two queries, that would be good for analysis.
>
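> On August 3, 2017 at 4:26 PM, wangzy24 wrote:
>
> > The following two SQL statements differ only in that the hotel_code IN list of the second one has one fewer element (marked with the red box), but their performance differs greatly: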
> >  > AE%E4%BF%A1%E5%9B%BE%E7%89%87_20170803162007.png>
> >  > AE%E4%BF%A1%E5%9B%BE%E7%89%87_20170803162100.png>
> >
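> > I suspect that the IN list has too many elements, which translate into many rowkeys and cause Kylin to scan the whole table.
> > If I always want to query through rowkeys, no matter how many there are, what should I do?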
> >
> >
> > --
> > View this message in context: http://apache-kylin.74782.x6.
> > nabble.com/sql-in-tp8630.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: [jira] [Updated] (KYLIN-2669) Cube is ready but insight tables not result

2017-06-17 Thread hongbin ma

Can you try the latest Kylin 2.0?

On Wed, Jun 14, 2017 at 12:23 PM, YUNFEI CHEN (JIRA) 
wrote:

>
>  [ https://issues.apache.org/jira/browse/KYLIN-2669?page=
> com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> YUNFEI CHEN updated KYLIN-2669:
> ---
> Description:
> As reported by Cheney( cyf_2...@sina.com)
> Hi All,
> I use kylin v1.6.0. add Cube and bulid successful.Cube status is Ready.
> but in Insight tabs, tables display no result.It happened sometime,not all
> the time.
> has anyone encountered a similar situation, how to resovle it?
> Best Regards
> Cheney Chen
>
>   was:
> As reported by Cheney( cyf_2...@163.com)
> Hi All,
> I use kylin v1.6.0. add Cube and bulid successful.Cube status is Ready.
> but in Insight tabs, tables display no result.It happened sometime,not all
> the time.
> has anyone encountered a similar situation, how to resovle it?
> Best Regards
> Cheney Chen
>
>
> > Cube is ready but insight tables not result
> > ---
> >
> > Key: KYLIN-2669
> > URL: https://issues.apache.org/jira/browse/KYLIN-2669
> > Project: Kylin
> >  Issue Type: Bug
> >  Components: Query Engine
> >Affects Versions: v1.6.0
> >Reporter: YUNFEI CHEN
> >Assignee: liyang
> > Attachments: logs.tar.gz
> >
> >
> > As reported by Cheney( cyf_2...@sina.com)
> > Hi All,
> > I use kylin v1.6.0. add Cube and bulid successful.Cube status is Ready.
> but in Insight tabs, tables display no result.It happened sometime,not all
> the time.
> > has anyone encountered a similar situation, how to resovle it?
> > Best Regards
> > Cheney Chen
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2671) Speed up prepared query execution

2017-06-15 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2671:
-

 Summary: Speed up prepared query execution
 Key: KYLIN-2671
 URL: https://issues.apache.org/jira/browse/KYLIN-2671
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


BI tools use prepared queries for function probing. Kylin should not execute 
such queries in the standard way because it is too costly.

Note that the standard "prepare, bind parameters, execute" flow of 
PreparedStatement is still not supported. For now Kylin only supports prepared 
statements WITHOUT parameters.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KYLIN-2667) Ignore whitespace when caching query

2017-06-11 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2667:
-

 Summary: Ignore whitespace when caching query
 Key: KYLIN-2667
 URL: https://issues.apache.org/jira/browse/KYLIN-2667
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2659) Refactor KylinConfig so that all the default configurations are hidden in kylin-defaults.properties

2017-06-06 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2659:
-

 Summary: Refactor KylinConfig so that all the default 
configurations are hidden in kylin-defaults.properties
 Key: KYLIN-2659
 URL: https://issues.apache.org/jira/browse/KYLIN-2659
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Currently we ship a conf/kylin.properties file with a lot of configuration 
overrides. This is not a standard approach compared with other projects like 
Hadoop or Spark.

It's better to have a kylin-defaults.properties file that hides all the default 
configurations, so that users only have to override the necessary 
configurations in a blank kylin.properties.

After the refactor, a config can be overridden in the following order of 
precedence (highest first):

1. KV in kylin.properties.override, which is more of a "secret feature", never 
documented.
2. KV in kylin.properties; users are advised to override configs here.
3. KV in kylin-defaults.properties, read-only to users.
4. KV in KylinConfigBase, read-only to users.

The refactor will be backward compatible.
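
The precedence list above amounts to a first-hit-wins lookup across layers. The sketch below shows that lookup with plain java.util.Properties objects; ConfigPrecedenceSketch, resolve and the key kylin.example.setting are hypothetical, and the real KylinConfig loading code may well be organized differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class ConfigPrecedenceSketch {
    // Layers are ordered from highest to lowest precedence; the first layer that
    // defines the key wins, and the hard-coded default is the last resort.
    static String resolve(String key, List<Properties> layers, String builtInDefault) {
        for (Properties layer : layers) {
            String value = layer.getProperty(key);
            if (value != null) {
                return value;
            }
        }
        return builtInDefault;
    }

    public static void main(String[] args) {
        Properties override = new Properties();   // kylin.properties.override
        Properties user = new Properties();       // kylin.properties
        Properties defaults = new Properties();   // kylin-defaults.properties

        defaults.setProperty("kylin.example.setting", "from-defaults");
        user.setProperty("kylin.example.setting", "from-user");

        // The user file shadows the defaults file; nothing is set in the override file.
        System.out.println(resolve("kylin.example.setting",
                Arrays.asList(override, user, defaults), "from-code"));
    }
}
```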



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2646) Project level query authorization

2017-05-25 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2646:
-

 Summary: Project level query authorization
 Key: KYLIN-2646
 URL: https://issues.apache.org/jira/browse/KYLIN-2646
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Since we introduced ad-hoc queries in 
https://issues.apache.org/jira/browse/KYLIN-2515, we need to adjust query 
authorization as follows:

 Query authorization should preferably be set at the project level. If a user 
is assigned READ permission on a project, then he can query all tables in the 
project, whether through ad-hoc queries or cubes.

 If a user has READ permission on cubes but no READ permission on the project, 
he can only issue queries that can be satisfied by the cubes he has READ 
permission on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2636) optimize case when in group by

2017-05-22 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2636:
-

 Summary: optimize case when in group by 
 Key: KYLIN-2636
 URL: https://issues.apache.org/jira/browse/KYLIN-2636
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Similar to KYLIN-2635, for clauses like:

{code}
group by case when 1 = 1 then x when 1 = 2 then y else z end
{code}

Kylin only needs to pick up x as the group-by column.

Again, as with KYLIN-2635, we'll fix it in Kylin rather than in Calcite first.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2631) Seek to next model when no cube in current model satisfies query

2017-05-19 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2631:
-

 Summary: Seek to next model when no cube in current model 
satisfies query
 Key: KYLIN-2631
 URL: https://issues.apache.org/jira/browse/KYLIN-2631
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


ModelChooser was introduced in 2.0 to match the JoinTree in a query with the 
JoinTree in a model. 

Currently, we first use ModelChooser to decide the model, then choose a cube 
from the selected model. The cubes in other models are never considered. It can 
happen that the selected model has no capable cube while a non-selected model 
does, so it's still necessary to go through all models.
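
The proposed behaviour is essentially a fallback loop over models. The sketch below assumes the candidate models are already ordered by how well their JoinTree matches the query; ModelFallbackSketch, chooseCube and the string identifiers are hypothetical, and the capability check is reduced to a predicate.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

final class ModelFallbackSketch {
    // Walk the models in order; return the first cube able to answer the query.
    // If a model has no capable cube, move on to the next model instead of giving up.
    static Optional<String> chooseCube(Map<String, List<String>> cubesByModel,
                                       Predicate<String> canAnswer) {
        for (Map.Entry<String, List<String>> model : cubesByModel.entrySet()) {
            for (String cube : model.getValue()) {
                if (canAnswer.test(cube)) {
                    return Optional.of(cube);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, List<String>> cubesByModel = new LinkedHashMap<>();
        cubesByModel.put("model_a", Collections.singletonList("cube_a1"));
        cubesByModel.put("model_b", Arrays.asList("cube_b1", "cube_b2"));

        // Only cube_b2 is capable, so the search has to continue past model_a.
        System.out.println(chooseCube(cubesByModel, "cube_b2"::equals).orElse("none"));
    }
}
```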



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2630) NPE when a subquery joins another lookup tables

2017-05-18 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2630:
-

 Summary: NPE when a subquery joins another lookup tables
 Key: KYLIN-2630
 URL: https://issues.apache.org/jira/browse/KYLIN-2630
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


{code:sql}

SELECT t1.cal_dt, t1.sum_price,t1.lstg_site_id 
FROM (
  select cal_dt, lstg_site_id, sum(price) as sum_price
  from test_kylin_fact
  group by cal_dt, lstg_site_id
  
) t1

inner JOIN edw.test_cal_dt as test_cal_dt
on t1.cal_dt=test_cal_dt.cal_dt

inner JOIN edw.test_sites as test_sites
on t1.lstg_site_id = test_sites.site_id


{code}

throws NPE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2625) not null filter clause should be evaluable in storage

2017-05-16 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2625:
-

 Summary: not null filter clause should be evaluable in storage
 Key: KYLIN-2625
 URL: https://issues.apache.org/jira/browse/KYLIN-2625
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Currently, limit push down is not enabled for queries like:

{code:sql}
select * from (
select * from test_kylin_fact
  where lstg_format_name is not null
  ) limit 20
 
{code}

because "not null" is treated as non-evaluable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2610) Optimize BuiltInFunctionTransformer performance

2017-05-11 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2610:
-

 Summary: Optimize BuiltInFunctionTransformer performance
 Key: KYLIN-2610
 URL: https://issues.apache.org/jira/browse/KYLIN-2610
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


When a dictionary contains millions of entries, BuiltInFunctionTransformer may 
become slow. We need to optimize some critical paths, e.g. regex matching for 
the like clause.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2608) Bubble sort bug in JoinDesc

2017-05-11 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2608:
-

 Summary: Bubble sort bug in JoinDesc
 Key: KYLIN-2608
 URL: https://issues.apache.org/jira/browse/KYLIN-2608
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


{code}
 int n = foreignKey.length;
 for (int i = 0; i < n - 1 && cont; i++) {
 cont = false;
-for (int j = i; j < n - 1; j++) {
+for (int j = 0; j < n - 1 - i; j++) {
 int jj = j + 1;
 if (foreignKey[j].compareTo(foreignKey[jj]) > 0) {
 swap(foreignKey, j, jj);

{code}
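
For reference, a complete standalone version of the corrected loop might look like the following. The diff above elides the rest of the method, so the String[] element type, the swap helper and the place where cont is set back to true are assumptions made to get a runnable example.

```java
import java.util.Arrays;

final class BubbleSortSketch {
    static void swap(String[] a, int i, int j) {
        String tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Each pass bubbles the largest remaining element to the end, so the sorted
    // region grows from the tail. The inner loop must therefore start at 0 and
    // stop before the already-sorted tail, not start at i.
    static void sort(String[] foreignKey) {
        int n = foreignKey.length;
        boolean cont = true;
        for (int i = 0; i < n - 1 && cont; i++) {
            cont = false;
            for (int j = 0; j < n - 1 - i; j++) {
                int jj = j + 1;
                if (foreignKey[j].compareTo(foreignKey[jj]) > 0) {
                    swap(foreignKey, j, jj);
                    cont = true; // a swap happened, another pass may be needed
                }
            }
        }
    }

    public static void main(String[] args) {
        String[] keys = {"b", "c", "a"};
        sort(keys);
        System.out.println(Arrays.toString(keys));
    }
}
```

With the old inner loop (j starting at i), the input b, c, a ends up as b, a, c: the first pass moves c to the end, and the second pass starts at index 1 and never compares b with a.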



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2599) select * in subquery fail due to bug in hackSelectStar

2017-05-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2599:
-

 Summary: select * in subquery fail due to bug in hackSelectStar 
 Key: KYLIN-2599
 URL: https://issues.apache.org/jira/browse/KYLIN-2599
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma


{code:sql}

select fact.lstg_format_name from 
 
 (select * from test_kylin_fact where cal_dt > date'2010-01-01' ) as fact
 
 group by fact.lstg_format_name 
 
 order by CASE WHEN fact.lstg_format_name IS NULL THEN 'sdf' ELSE 
fact.lstg_format_name END 
 
{code}

This generates a logical plan like:

{code}
LogicalSort(sort0=[$1], dir0=[ASC])
  LogicalProject(LSTG_FORMAT_NAME=[$0], EXPR$1=[CASE(IS NULL($0), 'sdf', $0)])
    LogicalAggregate(group=[{0}])
      LogicalProject(LSTG_FORMAT_NAME=[$3])
        LogicalProject(TRANS_ID=[$0], ORDER_ID=[$1], CAL_DT=[$2], LSTG_FORMAT_NAME=[$3], LEAF_CATEG_ID=[$4], LSTG_SITE_ID=[$5], SLR_SEGMENT_CD=[$6], SELLER_ID=[$7], PRICE=[$8], ITEM_COUNT=[$9], TEST_COUNT_DISTINCT_BITMAP=[$10], DEAL_AMOUNT=[$11], DEAL_YEAR=[$12], _KY_COUNT__=[$13], _KY_MIN_TEST_KYLIN_FACT_PRICE_=[$14], _KY_MAX_TEST_KYLIN_FACT_PRICE_=[$15], _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_SELLER_ID_=[$16], _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_LSTG_FORMAT_NAME_TEST_KYLIN_FACT_SELLER_ID_=[$17], _KY_COUNT_DISTINCT_TEST_KYLIN_FACT_TEST_COUNT_DISTINCT_BITMAP_=[$18], _KY_PERCENTILE_TEST_KYLIN_FACT_PRICE_=[$19])
          LogicalFilter(condition=[>($2, 2010-01-01)])
            OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])

{code}

org.apache.calcite.sql2rel.SqlToRelConverter#hackSelectStar mistakenly treats 
this as a normal case, which leads to an exception being thrown.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2598) Should not translate filter to a in-clause filter with too many elements

2017-05-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2598:
-

 Summary: Should not translate filter to a in-clause filter with 
too many elements
 Key: KYLIN-2598
 URL: https://issues.apache.org/jira/browse/KYLIN-2598
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


In 
org.apache.kylin.dict.BuiltInFunctionTransformer#translateFunctionTupleFilter 
we translate built-in functions like upper, lower, and like into in-clause 
filters (KYLIN-993).

An in-clause filter quickly becomes inefficient when too many elements 
accumulate in the in-clause. I suggest setting a threshold so that when there 
are more elements than the threshold, the translation is aborted.
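
A rough sketch of such a threshold check is shown below. It is not the BuiltInFunctionTransformer code: InClauseThresholdSketch, translate and maxInListSize are hypothetical, the dictionary is a plain list of strings, and an empty Optional stands for "translation aborted, keep the original function filter".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class InClauseThresholdSketch {
    // Collect the dictionary values that satisfy the function predicate.
    // Give up as soon as the IN list would grow beyond maxInListSize.
    static Optional<List<String>> translate(List<String> dictValues,
                                            Predicate<String> matches,
                                            int maxInListSize) {
        List<String> hits = new ArrayList<>();
        for (String value : dictValues) {
            if (matches.test(value)) {
                if (hits.size() == maxInListSize) {
                    return Optional.empty(); // too many elements: abort the translation
                }
                hits.add(value);
            }
        }
        return Optional.of(hits);
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("apple", "avocado", "banana");
        Predicate<String> startsWithA = v -> v.startsWith("a"); // stands in for like 'a%'

        System.out.println(translate(dict, startsWithA, 2).isPresent()); // fits the threshold
        System.out.println(translate(dict, startsWithA, 1).isPresent()); // aborted
    }
}
```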



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2597) Deal with trivial expression in filters like x = 1 + 2

2017-05-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2597:
-

 Summary: Deal with trivial expression in filters like x = 1 + 2
 Key: KYLIN-2597
 URL: https://issues.apache.org/jira/browse/KYLIN-2597
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


BI tools generate trivial expressions in filters, e.g. "x = 1 + 2". Kylin 
treats such expressions as non-evaluable, which in turn blocks optimizations 
such as limit push down, or forces it to choose a cuboid with more dimensions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2586) use random port for CacheServiceTest as fixed port 7777 might have been occupied

2017-05-04 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2586:
-

 Summary: use random port for CacheServiceTest as fixed port 7777 
might have been occupied
 Key: KYLIN-2586
 URL: https://issues.apache.org/jira/browse/KYLIN-2586
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


https://builds.apache.org/job/Kylin-Master-JDK-1.7/442/

2017-05-04 02:24:45,913 WARN  [main AbstractLifeCycle:212]: FAILED 
ServerConnector@29065a9f{HTTP/1.1}{0.0.0.0:}: java.net.BindException: 
Address already in use
java.net.BindException: Address already in use
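
One common way to avoid a fixed port in tests is to let the operating system pick a free one by binding to port 0. The sketch below shows the idea with java.net.ServerSocket; FreePortSketch and findFreePort are hypothetical names and this is not the actual CacheServiceTest change.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePortSketch {
    // Binding to port 0 asks the OS for any currently free port.
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("free port: " + findFreePort());
    }
}
```

There is a small window between closing the probe socket and starting the real server in which another process could grab the port, so where the server API allows it, starting the server on port 0 directly and reading back the bound port is the safer variant.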



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2580) Improvement on subqueries: allow grouping by columns from subquery

2017-05-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2580:
-

 Summary: Improvement on subqueries: allow grouping by columns from 
subquery
 Key: KYLIN-2580
 URL: https://issues.apache.org/jira/browse/KYLIN-2580
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


{code:sql}

select test_kylin_fact.lstg_format_name, xxx.week_beg_dt , 
sum(test_kylin_fact.price) as GMV 
 , count(*) as TRANS_CNT 
 from  

 test_kylin_fact

 inner JOIN test_category_groupings
 ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
test_kylin_fact.lstg_site_id = test_category_groupings.site_id 


 inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where week_beg_dt 
>= DATE '2010-02-10'  ) xxx
 ON test_kylin_fact.cal_dt = xxx.cal_dt 


 where test_category_groupings.meta_categ_name  <> 'Baby'
 group by test_kylin_fact.lstg_format_name, xxx.week_beg_dt 
{code}

This will fail because of the group by on xxx.week_beg_dt: week_beg_dt does 
not necessarily appear in the cube.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2579) Improvement on subqueries: reorder subqueries joins with RelOptRule

2017-05-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2579:
-

 Summary: Improvement on subqueries: reorder subqueries joins with 
RelOptRule
 Key: KYLIN-2579
 URL: https://issues.apache.org/jira/browse/KYLIN-2579
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


The current support for subqueries has some limitations. For example, we 
require that JOINs on tables precede JOINs on subqueries. The following query:

{code:sql}

select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
 , count(*) as TRANS_CNT
 from  

 test_kylin_fact

 inner JOIN test_category_groupings
 ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
test_kylin_fact.lstg_site_id = test_category_groupings.site_id 


 inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where week_beg_dt 
>= DATE '2010-02-10'  ) xxx
 ON test_kylin_fact.cal_dt = xxx.cal_dt 
 
 
 where test_category_groupings.meta_categ_name  <> 'Baby'
 group by test_kylin_fact.lstg_format_name

{code}

works but 

{code:sql}

select test_kylin_fact.lstg_format_name,sum(test_kylin_fact.price) as GMV 
 , count(*) as TRANS_CNT
 from  

 test_kylin_fact

 inner JOIN (select cal_dt,week_beg_dt from edw.test_cal_dt  where week_beg_dt 
>= DATE '2010-02-10'  ) xxx
 ON test_kylin_fact.cal_dt = xxx.cal_dt 
 
 inner JOIN test_category_groupings
 ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND 
test_kylin_fact.lstg_site_id = test_category_groupings.site_id 
 
 
 where test_category_groupings.meta_categ_name  <> 'Baby'
 group by test_kylin_fact.lstg_format_name

{code}

won't work. In this JIRA we'll reorder subquery joins with a RelOptRule.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Re: [VOTE] Release apache-kylin-2.0.0 (RC3)

2017-04-28 Thread hongbin ma

+1

mvn test passed
mvn integration test passed

On Thu, Apr 27, 2017 at 8:38 AM, 《秦殇》！健  wrote:

> +1
> It's great!!!
> -- Original Message --
> From: "Li Yang";;
> Sent: Thursday, April 27, 2017, 7:22 AM
> To: "dev";
>
> Subject: [VOTE] Release apache-kylin-2.0.0 (RC3)
>
>
>
> Hi all,
>
> I have created a build for Apache Kylin 2.0.0, release candidate 3.
>
> Changes highlights:
>
> Support snowflake data model (KYLIN-1875)
> Support TPC-H queries (KYLIN-2467)
> Spark cubing engine (KYLIN-2331)
> Job engine HA (KYLIN-2006)
> Percentile measure (KYLIN-2396)
> Cloud tested (KYLIN-2351)
>
>
> Thanks to everyone who has contributed to this release. Here is release
> notes:
> http://kylin.apache.org/docs20/release_notes.html
>
> The commit to be voted upon (375fd807c281d8c5deff0620747c806be2019782):
> https://github.com/apache/kylin/tree/kylin-2.0.0
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.0.0-rc3/
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1041/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/liyang.asc
>
> Please vote on releasing this package as Apache Kylin 2.0.0.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PPMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.0.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
>
> Cheers
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2551) separate table desc by each project

2017-04-17 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2551:
-

 Summary: separate table desc by each project
 Key: KYLIN-2551
 URL: https://issues.apache.org/jira/browse/KYLIN-2551
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


For historical reasons, different projects share the same table desc. This 
forces project admins to worry about affecting cubes in other projects.

This JIRA aims to separate table desc by project while maintaining backward 
compatibility, so that users won't have to "upgrade" manually.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Re: [DISCUSS] fork Calcite and include the fork as a submodule

2017-04-17 Thread hongbin ma

Hi Owen

First of all, even with the fork, we're still "working with calcite" by
constantly syncing new calcite code (upon each calcite release) and
contributing back to calcite (PRs from our fork to calcite). *We're not
saying goodbye to calcite.*

Let's forget about the dirty hacks. The key issue here is the un-synced
release cycles between kylin and calcite. Very often we're blocked by some
minor calcite issues that we can fix with small calcite patches.
However, having to wait months for a new calcite release
may trouble kylin users.



On Mon, Apr 17, 2017 at 3:05 PM, Owen O'Malley <owen.omal...@gmail.com>
wrote:

> Please work with the calcite project rather than forking. They are good
> guys and should be able to help. In terms of the hacks, can you figure out
> hooks in calcite that would let you accomplish the same goal?
>
> .. Owen
>
> > On Apr 17, 2017, at 06:50, Li Yang <liy...@apache.org> wrote:
> >
> > We should contribute all back to calcite except for those dirty hacks. The
> > question that remains is how to sync the release cycles between kylin and
> > calcite, and how to handle those patches that are important to kylin but
> > not so urgent to calcite. Having a fork of calcite obviously is a solution.
> > But I too don't know whether it is common and appropriate in the open
> > source world.
> >
> > Yang
> >
> >> On Sun, Apr 16, 2017 at 10:39 PM, hongbin ma <mahong...@apache.org> wrote:
> >>
> >> Recently I've been testing kylin connectivity with multiple BI tools like
> >> Tableau, Cognos, etc. During the tests I found it necessary to fix several
> >> Calcite issues, like CALCITE-1754. I'm more than willing to contribute the
> >> fixes back to calcite; however, there are still two potential issues:
> >>
> >> 1. Calcite has its own release cycles, and sometimes we cannot afford to
> >> wait for calcite's next release
> >> 2. Some dirty hacks (yet still necessary) are not likely to be accepted by
> >> Calcite. Currently there's a weird sub-project called "AtopCalcite" in
> >> Kylin to host all the dirty hacks.
> >>
> >> With the above two issues, I'm wondering what the best way is to interact
> >> with Calcite releases. I'm suggesting that:
> >>
> >> 1. We fork Apache Calcite and call it something like calcite-for-kylin
> >> 2. Upon each calcite fix from our side, we double-commit to both Apache
> >> Calcite and calcite-for-kylin
> >> 3. For dirty hacks we only push code to calcite-for-kylin
> >> 4. calcite-for-kylin should be updated upon each Apache Calcite release
> >>
> >> Any comments are welcome!
> >> @Julian Looking forward to your comments as well
> >>
> >> --
> >> Regards,
> >>
> >> *Bin Mahone | 马洪宾*
> >>
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[DISCUSS] fork Calcite and include the fork as a submodule

2017-04-16 Thread hongbin ma

Recently I've been testing kylin connectivity with multiple BI tools like
Tableau, Cognos, etc. During the tests I found it necessary to fix several
Calcite issues, like CALCITE-1754. I'm more than willing to contribute the
fixes back to calcite; however, there are still two potential issues:

1. Calcite has its own release cycles, and sometimes we cannot afford to wait
for calcite's next release
2. Some dirty hacks (yet still necessary) are not likely to be accepted by
Calcite. Currently there's a weird sub-project called "AtopCalcite" in
Kylin to host all the dirty hacks.

With the above two issues, I'm wondering what the best way is to interact
with Calcite releases. I'm suggesting that:

1. We fork Apache Calcite and call it something like calcite-for-kylin
2. Upon each calcite fix from our side, we double-commit to both Apache
Calcite and calcite-for-kylin
3. For dirty hacks we only push code to calcite-for-kylin
4. calcite-for-kylin should be updated upon each Apache Calcite release

Any comments are welcome!
@Julian Looking forward to your comments as well

-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Kylin-over-Alluxio

2017-04-14 Thread hongbin ma

besides, in https://github.com/Alluxio/alluxio/pull/4426/files TeddyBear1314
says he tried and failed, can you please address his concern first? thank
you very much!

On Sat, Apr 15, 2017 at 11:01 AM, George Ni (Chunen Ni) <
chunen...@kyligence.io> wrote:

> Hi Huangzhi,
>
> Below are links to “how to doc” and “how to contribute”:
> http://kylin.apache.org/development/howto_docs.html
>
> http://kylin.apache.org/development/howto_contribute.html
>
> Looking forward to your sharing.
>
> Best regards,
>
> Chun’en Ni(倪春恩)
> Mail: chunen...@kyligence.io 
> Shanghai Kyligence Information Technology Co., Ltd
> Room 405, Building Y1, 112 Liangxiu Road, Pudong New Area, Shanghai
>
>
> On 2017/4/14 at 2:55 PM, “Huangzhi” wrote:
>
> Hi,
>
> I am a contributor to the Alluxio
> community [https://github.com/Alluxio/alluxio]. My name is Huangzhi.
>
> Alluxio is a memory-centric distributed filesystem and provides an
> HDFS-compatible API which can be used as a Hadoop FileSystem.
>
> In the past weeks, we have been testing running Kylin on
> Alluxio. We just use Alluxio's HDFS-compatible API to see if we can
> run Kylin over it.
>
> We have successfully run the example from the Kylin doc:
>
> http://kylin.apache.org/docs20/tutorial/kylin_sample.html
>
> Here is our doc:
>
> https://github.com/Alluxio/alluxio/pull/4426/files
>
> What we want to know is whether we can add this doc to Kylin's docs
> and, if we can, how to do it.
>
>
>
> Best regards,
>
> Huangzhi
>
>
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Question regarding topN measure on string column

2017-03-31 Thread hongbin ma

Hi,

I believe it's not supported. Besides, how do you define "order" on strings?
I don't think it's a reasonable requirement.

-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2528) refine job email notification to support starttls and customized port

2017-03-31 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2528:
-

 Summary: refine job email notification to support starttls and 
customized port
 Key: KYLIN-2528
 URL: https://issues.apache.org/jira/browse/KYLIN-2528
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2527) Speedup LookupStringTable, use HashMap instead of ConcurrentHashMap

2017-03-31 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2527:
-

 Summary:  Speedup LookupStringTable, use HashMap instead of 
ConcurrentHashMap
 Key: KYLIN-2527
 URL: https://issues.apache.org/jira/browse/KYLIN-2527
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


A concurrent hash map is overkill here; it should be faster to initialize a normal 
hash map. The next step might be to cache the lookupStringTable.
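
A minimal sketch of the idea, assuming the lookup table is filled by a single thread and only read afterwards. The class and method names are hypothetical and are not Kylin's actual LookupStringTable code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lookup table keyed by one string column.
final class LookupTableSketch {
    private final Map<String, String[]> rowsByKey;

    // Build once from already-loaded rows; the key is taken from keyIndex.
    LookupTableSketch(List<String[]> rows, int keyIndex) {
        // A plain HashMap skips the concurrency-control overhead of
        // ConcurrentHashMap, which buys nothing when one thread fills the map.
        Map<String, String[]> map = new HashMap<>(Math.max(16, rows.size() * 2));
        for (String[] row : rows) {
            map.put(row[keyIndex], row);
        }
        // Store an unmodifiable view in a final field so that readers
        // only ever see the fully built map.
        this.rowsByKey = Collections.unmodifiableMap(map);
    }

    // Returns the row for the key, or null when the key is absent.
    String[] getRow(String key) {
        return rowsByKey.get(key);
    }

    int size() {
        return rowsByKey.size();
    }
}
```

The trade-off is that the map must not be mutated after construction; if it were, a concurrent map (or external locking) would be needed again.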



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2521) upgrade to calcite 1.12.0

2017-03-27 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2521:
-

 Summary: upgrade to calcite 1.12.0 
 Key: KYLIN-2521
 URL: https://issues.apache.org/jira/browse/KYLIN-2521
 Project: Kylin
  Issue Type: Task
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2495) query exception when integer column encoded as date/time encoding

2017-03-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2495:
-

 Summary: query exception when integer column encoded as date/time 
encoding 
 Key: KYLIN-2495
 URL: https://issues.apache.org/jira/browse/KYLIN-2495
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


in KYLIN-, we claimed that integer column can use date/time encoding. 
however when I tried to query on such cube, an exception is thrown:

{code}
java.sql.SQLException: Error while executing SQL "select * from fact0309
LIMIT 5": For input string: "70225920"
{code}

the fact table desc is: 

{code}
hive> desc fact0309
> ;
OK
tdate   int 
country string  
price   decimal(10,0) 
{code}

and the sample data is:

{code}
19980302  US  100
19920403  CN  100
19920403  US  33
{code}
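
For reference, a minimal sketch of how an int such as 19980302 can be read as a yyyyMMdd date and converted back. This only illustrates the intended interpretation of the tdate column; it is an assumption for illustration and not Kylin's actual date encoding code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Kylin.
final class IntDateSketch {
    // Interpret an 8-digit int such as 19980302 as a yyyyMMdd calendar date.
    static LocalDate toDate(int yyyymmdd) {
        return LocalDate.parse(String.valueOf(yyyymmdd), DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Convert the date back to its integer yyyyMMdd form.
    static int toInt(LocalDate date) {
        return Integer.parseInt(date.format(DateTimeFormatter.BASIC_ISO_DATE));
    }
}
```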



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2483) SortedIteratorMergerWithLimit could be slower when number of total merge rows is small

2017-03-05 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2483:
-

 Summary: SortedIteratorMergerWithLimit could be slower when number 
of total merge rows is small
 Key: KYLIN-2483
 URL: https://issues.apache.org/jira/browse/KYLIN-2483
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


if the pushed down limit is small enough (say less than 100), 
SortedIteratorMergerWithLimit will bring RELATIVELY significant costs. I'm 
adding a new configuration entry called 
kylin.query.merge-sort-partition-results.min-limit (default 100) to fix this 
issue
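
A rough sketch of the intent, under stated assumptions: the threshold check and the k-way merge below are hypothetical illustrations, not the real SortedIteratorMergerWithLimit. It assumes that merge sort is skipped when the pushed-down limit is below the configured minimum.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Kylin code.
final class MergeLimitSketch {
    // Default of kylin.query.merge-sort-partition-results.min-limit as stated above.
    static final int DEFAULT_MIN_LIMIT = 100;

    // Assumption: merge sort only pays off once the limit reaches the threshold.
    static boolean useMergeSort(int limit, int minLimit) {
        return limit >= minLimit;
    }

    // One cursor per sorted partition: the current head value plus the remaining rows.
    private static final class Cursor {
        final int head;
        final Iterator<Integer> rest;

        Cursor(int head, Iterator<Integer> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // K-way merge of individually sorted partitions that stops after `limit` rows.
    static List<Integer> mergeWithLimit(List<List<Integer>> sortedPartitions, int limit) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> partition : sortedPartitions) {
            Iterator<Integer> it = partition.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.rest.hasNext()) {
                heap.add(new Cursor(c.rest.next(), c.rest));
            }
        }
        return out;
    }
}
```

The heap bookkeeping is the fixed cost that becomes relatively significant when only a handful of rows are requested.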



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (KYLIN-2441) protocol for REST API result format

2017-02-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2441:
-

 Summary: protocol for REST API result format
 Key: KYLIN-2441
 URL: https://issues.apache.org/jira/browse/KYLIN-2441
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Re: Problem while querying Cube

2017-02-07 Thread hongbin ma

hi Shailesh,

from your cube desc:

  "rowkey": {
    "rowkey_columns": [
      {
        "column": "SUPPKEY",
        "encoding": "fixed_length:8",
        "isShardBy": false
      },
      {
        "column": "ORDERKEY",
        "encoding": "fixed_length:8",
        "isShardBy": false
      },
      {
        "column": "PARTKEY",
        "encoding": "fixed_length:8",
        "isShardBy": false
      }
    ]
  },


If any of the three columns "SUPPKEY", "ORDERKEY", or "PARTKEY" is of
numeric type, the exception is expected. The reason is explained in my
previous reply.

Try to avoid applying "fixed_length" to numeric types. In fact, in our
upcoming release, "fixed_length" is only valid with string types.
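
One plausible way such a setting can upset a sortedness check, sketched under an assumption that is not confirmed in this thread: that the numeric value is stored as its string bytes padded to a fixed width. The encoder below is hypothetical and is not Kylin's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not Kylin code.
final class FixedLengthOrderSketch {
    // Assumed encoder: ASCII bytes of the value, right-padded with 0x00 to `width`.
    static byte[] encodeFixed(String value, int width) {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(raw, width); // truncates or pads with zero bytes
    }

    // Unsigned lexicographic comparison, the order a byte-oriented store scans in.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this encoding, "9" sorts after "10" in byte order even though 9 < 10 numerically, so a byte-sorted scan would not be sorted by numeric value.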

On Wed, Feb 8, 2017 at 2:34 PM, Shailesh Prajapati <shail...@infoworks.io>
wrote:

> Hi HongBin,
>
> Cube for which i provided logs has been deleted, So attaching description
> gist of another Cube which is giving the same error,
>
> https://gist.github.com/shaipraj/fc930e8e39be82a23d910d6388439776
>
> On Mon, Feb 6, 2017 at 7:33 PM, hongbin ma <mahong...@apache.org> wrote:
>
> > hi Shailesh,
> >
> >
> > can you please attach the cube description? It can be found by click on
> the
> > cube and find "CUBE(json)".
> >
> > I guess you're using fixed_length encoding for integer type
> > column ORDER_ID, such setting is found to be buggy(
> > https://issues.apache.org/jira/browse/KYLIN-2179)
> >
> > On Sun, Feb 5, 2017 at 9:42 PM, Shailesh Prajapati <
> shail...@infoworks.io>
> > wrote:
> >
> > > What should i supposed to do now to make query work with limits?
> Because
> > i
> > > am keep getting this exception. What will happen if we remove this
> check?
> > >
> > > Thanks.
> > >
> > > On Sat, Feb 4, 2017 at 8:12 PM, ShaoFeng Shi <shaofeng...@apache.org>
> > > wrote:
> > >
> > > > > I don't know why that check is necessary, as there is a todo which
> > > > > says it will be removed someday:
> > > > > https://github.com/apache/kylin/blob/master/core-storage/src/main/java/org/apache/kylin/storage/gtrecord/SortedIteratorMergerWithLimit.java#L127
> > > >
> > > > @Hongbin, any idea?
> > > >
> > > >
> > > > 2017-02-04 17:05 GMT+08:00 Shailesh Prajapati <shail...@infoworks.io
> >:
> > > >
> > > > > Hi ShaoFeng,
> > > > >
> > > > > Here is the gist link,
> > > > >
> > > > > https://gist.github.com/shaipraj/780a3dcc80aa2080911b7348c76f5b88
> > > > >
> > > > > Thanks.
> > > > >
> > > > > On Sat, Feb 4, 2017 at 2:24 PM, ShaoFeng Shi <
> shaofeng...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Shailesh, there is no attachement (attachement isn't supported
> > by
> > > > > > mailing list); can you paste the content directly or put it to
> > gist?
> > > > > >
> > > > > > 2017-02-04 15:49 GMT+08:00 Shailesh Prajapati <
> > shail...@infoworks.io
> > > >:
> > > > > >
> > > > > > > Hi ShaoFeng,
> > > > > > >
> > > > > > > I am attaching a portion of kylin's log.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > On Sat, Feb 4, 2017 at 12:59 PM, ShaoFeng Shi <
> > > > shaofeng...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Shailesh,
> > > > > > >>
> > > > > > >> Could you please provide the error trace? We need to know
> where
> > > the
> > > > > > error
> > > > > > >> got thrown. Thanks.
> > > > > > >>
> > > > > > >> 2017-02-03 18:06 GMT+08:00 Shailesh Prajapati <
> > > > shail...@infoworks.io
> > > > > >:
> > > > > > >>
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > >> > We are running Kylin 1.6 and successfully build Cube on it.
> > >

[jira] [Created] (KYLIN-2435) two EXTRACT on a column will fail if there exists NULL values for the column

2017-02-07 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2435:
-

 Summary: two EXTRACT on a column will fail if there exists NULL 
values for the column
 Key: KYLIN-2435
 URL: https://issues.apache.org/jira/browse/KYLIN-2435
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


2000-01-01 19:12:33,US,android,10.22
2001-01-01 9:12:33,US,windows,9.12
2002-05-02 20:12:03,CN,windows,3.33
\N,CN,windows,3.32

create table testtable (starttime TIMESTAMP,country STRING, client STRING, 
price DECIMAL(18,4)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

the following query will succeed:

{code}
select sum(price),extract (year from starttime) from testtable group by extract 
(year from starttime)
{code}

but the following will fail:

{code}
select sum(price) from testtable group by extract (year from starttime), 
extract (month from starttime)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Re: Problem while querying Cube

2017-02-06 Thread hongbin ma

hi Shailesh,


Can you please attach the cube description? It can be found by clicking on the
cube and finding "CUBE(json)".

I guess you're using fixed_length encoding for the integer type
column ORDER_ID; such a setting is known to be buggy
(https://issues.apache.org/jira/browse/KYLIN-2179).

On Sun, Feb 5, 2017 at 9:42 PM, Shailesh Prajapati 
wrote:

> What should I do now to make the query work with limits? Because I
> keep getting this exception. What will happen if we remove this check?
>
> Thanks.
>
> On Sat, Feb 4, 2017 at 8:12 PM, ShaoFeng Shi 
> wrote:
>
> > I don't know why that check is necessary, as there is a todo which says
> > it will be removed someday:
> > https://github.com/apache/kylin/blob/master/core-storage/src/main/java/org/apache/kylin/storage/gtrecord/SortedIteratorMergerWithLimit.java#L127
> >
> > @Hongbin, any idea?
> >
> >
> > 2017-02-04 17:05 GMT+08:00 Shailesh Prajapati :
> >
> > > Hi ShaoFeng,
> > >
> > > Here is the gist link,
> > >
> > > https://gist.github.com/shaipraj/780a3dcc80aa2080911b7348c76f5b88
> > >
> > > Thanks.
> > >
> > > On Sat, Feb 4, 2017 at 2:24 PM, ShaoFeng Shi 
> > > wrote:
> > >
> > > > > Hi Shailesh, there is no attachment (attachments aren't supported by
> > > > > the mailing list); can you paste the content directly or put it in a gist?
> > > >
> > > > 2017-02-04 15:49 GMT+08:00 Shailesh Prajapati  >:
> > > >
> > > > > Hi ShaoFeng,
> > > > >
> > > > > I am attaching a portion of kylin's log.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > On Sat, Feb 4, 2017 at 12:59 PM, ShaoFeng Shi <
> > shaofeng...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Hi Shailesh,
> > > > >>
> > > > >> Could you please provide the error trace? We need to know where
> the
> > > > error
> > > > >> got thrown. Thanks.
> > > > >>
> > > > >> 2017-02-03 18:06 GMT+08:00 Shailesh Prajapati <
> > shail...@infoworks.io
> > > >:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > >> > We are running Kylin 1.6 and successfully build Cube on it.
> > > Aggregate
> > > > >> > queries are running fine But, with non aggregate query we are
> > > getting
> > > > >> > following exception,
> > > > >> >
> > > > >> > org.apache.kylin.rest.exception.InternalErrorException: Not
> > sorted!
> > > > >> last:
> > > > >> > CUSTOMER_ID=0,ORDER_ID=10345,QUANTITY=null ... and other
> columns.
> > > > >> >
> > > > >> > Query used: select ORDERS.STATUS from ORDER_DETAILS as ORDERS
> > limit
> > > 5;
> > > > >> >
> > > > >> > One more observation, with limit less than 5 even non aggregate
> > > > queries
> > > > >> are
> > > > >> > also working.
> > > > >> > Please help us resolving this issue. let us know for any other
> > > > >> information.
> > > > >> >
> > > > >> > Thanks
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best regards,
> > > > >>
> > > > >> Shaofeng Shi 史少锋
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Shailesh
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > > Shaofeng Shi 史少锋
> > > >
> > >
> > >
> > >
> > > --
> > > Shailesh
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
>
>
>
> --
> Shailesh
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Problem while querying Cube

2017-02-06 Thread hongbin ma

are we mixing two separate issues in the same thread here?

On Mon, Feb 6, 2017 at 9:26 PM, ShaoFeng Shi  wrote:

> NoSuchObjectException(message:default.kylin_intermediate_
> kylin_sales_cube_desc_3a41df7d_93e1_4445_9
> usually this error indicates that hive-site.xml wasn't on classpath; a
> quick workaround is copying that file to $KYLIN_HOME/conf; the ultimate
> solution is checking why bin/find-hive-dependency.sh didn't find it
> automatically.
>
> On Mon, Feb 6, 2017 at 6:43 PM +0900, "磊 王" 
> wrote:
>
>
> I manually added core-site.xml under the Kylin conf directory and then started Kylin.
> When I build the sample, it now seems to get past the place that failed yesterday,
> but it fails at another place.
>
>
>
> cat core-site.xml
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://sandbox.hortonworks.com:8020</value>
>   </property>
> </configuration>
>
>
> 2017-02-06 09:34:53,357 INFO  [pool-8-thread-3]
> metastore.MetaStoreDirectSql:140 : Using direct SQL, underlying DB is
> DERBY
> 2017-02-06 09:34:53,359 INFO  [pool-8-thread-3] metastore.ObjectStore:273
> : Initialized ObjectStore
> 2017-02-06 09:34:53,590 INFO  [pool-8-thread-3]
> metastore.HiveMetaStore:664 : Added admin role in metastore
> 2017-02-06 09:34:53,592 INFO  [pool-8-thread-3]
> metastore.HiveMetaStore:673 : Added public role in metastore
> 2017-02-06 09:34:53,651 INFO  [pool-8-thread-3]
> metastore.HiveMetaStore:713 : No user is added in admin role, since config
> is empty
> 2017-02-06 09:34:53,770 INFO  [pool-8-thread-3]
> metastore.HiveMetaStore:747 : 0: get_databases:
> NonExistentDatabaseUsedForHealthCheck
> 2017-02-06 09:34:53,771 INFO  [pool-8-thread-3] HiveMetaStore.audit:372 :
> ugi=root  ip=unknown-ip-addr  cmd=get_databases:
> NonExistentDatabaseUsedForHealthCheck
> 2017-02-06 09:34:53,791 INFO  [pool-8-thread-3]
> metastore.HiveMetaStore:747 : 0: get_table : db=default
> tbl=kylin_intermediate_kylin_sales_cube_desc_3a41df7d_93e1_
> 4445_9ca0_882f5f6e9d10
> 2017-02-06 09:34:53,791 INFO  [pool-8-thread-3] HiveMetaStore.audit:372 :
> ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default
> tbl=kylin_intermediate_kylin_sales_cube_desc_3a41df7d_93e1_
> 4445_9ca0_882f5f6e9d10
> 2017-02-06 09:34:53,812 INFO  [pool-8-thread-3]
> common.AbstractHadoopJob:506 : tempMetaFileString is : null
> 2017-02-06 09:34:53,814 ERROR [pool-8-thread-3]
> common.MapReduceExecutable:127 : error execute MapReduceExecutable{id=
> baa9531d-fc11-4a60-aa5e-a069d4bee3c2-02, name=Extract Fact Table Distinct
> Columns, state=RUNNING}
> java.lang.RuntimeException: java.io.IOException:
> NoSuchObjectException(message:default.kylin_intermediate_
> kylin_sales_cube_desc_3a41df7d_93e1_4445_9ca0_882f5f6e9d10 table not
> found)
> at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.
> configureJob(HiveMRInput.java:110)
> at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.
> setupMapper(FactDistinctColumnsJob.java:119)
> at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.
> run(FactDistinctColumnsJob.java:103)
> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:92)
> at org.apache.kylin.engine.mr.common.MapReduceExecutable.
> doWork(MapReduceExecutable.java:120)
> at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(
> DefaultChainedExecutable.java:57)
> at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
> JobRunner.run(DefaultScheduler.java:136)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: NoSuchObjectException(message:
> default.kylin_intermediate_kylin_sales_cube_desc_3a41df7d_93e1_4445_9ca0_882f5f6e9d10
> table not found)
> at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.
> setInput(HCatInputFormat.java:97)
> at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.
> setInput(HCatInputFormat.java:51)
> at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.
> configureJob(HiveMRInput.java:105)
> ... 11 more
> Caused by: NoSuchObjectException(message:default.kylin_intermediate_
> kylin_sales_cube_desc_3a41df7d_93e1_4445_9ca0_882f5f6e9d10 table not
> found)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$
> HMSHandler.get_table_core(HiveMetaStore.java:1806)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$
> HMSHandler.get_table(HiveMetaStore.java:1776)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)

Re: Proposal for updating master branch to use HBase 1.x

2017-01-22 Thread hongbin ma

+1

With more and more prod environments upgrading to hbase 1.x, Kaige's proposal can
help kylin developers upgrade their development environments (sandbox, qa
cluster, etc.), so that the gap between dev and prod environments is minimal.

A key message here is that we're not dropping support for hbase 0.98; we're
merely changing the "default" hbase version.


On Sun, Jan 22, 2017 at 12:06 PM, Kaige Liu  wrote:

> Hi Folks,
>
> Currently our master branch is based on HBase 0.98. As per the discussion here
> <http://apache-kylin.74782.x6.nabble.com/DISCUSS-Call-the-next-release-v2-0-td6684.html#a6750>,
> I think it’s time to change the master branch to use HBase 1.x now.
> Below changes should be made:
>
> 1.   Delete master-hbase1.x branch and use master branch instead.
>
> 2.   Merge master-cdh5.7 to master branch (KYLIN-2413
> <https://issues.apache.org/jira/browse/KYLIN-2413>); master-cdh5.7 should be
> deleted.
>
> 3.   A new branch master-hbase0.98 will be created to continue
> supporting HBase 0.98.
>
> Users and developers will be impacted and should choose correct branch
> according to corresponding Hadoop distros.
>
> 1.   HBase 0.98 users should use master-hbase0.98 branch
>
> 2.   Others please use master branch
>
> Developers may need to upgrade your dev environments.
>
> Best regards,
>
> Kaige Liu(刘凯歌)
> Mail: kaige@kyligence.io
> Shanghai Kyligence Information Technology Co., Ltd
> Room 405, Building Y1, 112 Liangxiu Road, Pudong New Area, Shanghai
>
> "Do small things with great love."
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2302) push the value in statement.setMaxRows(10) to storage

2016-12-20 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2302:
-

 Summary: push the value in statement.setMaxRows(10) to storage
 Key: KYLIN-2302
 URL: https://issues.apache.org/jira/browse/KYLIN-2302
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


the solution in KYLIN-2236 was a quick workaround



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [DISCUSS] Call the next release v2.0?

2016-12-18 Thread hongbin ma

+1

On Mon, Dec 19, 2016 at 2:27 PM, roger shi 
wrote:

> +1
>
> On 19/12/2016, 2:23 PM, "Li Yang"  wrote:
>
> Guys,
>
> I'm thinking maybe it's time to call the next release v2.0. Like to
> hear
> your thoughts.
>
> Actually the current v1.6, for the streaming cubing capability, is
> already
> a good candidate of v2.0. However there were some other big changes
> ongoing
> and we decided to let v2.0 wait a bit.
>
> These big changes are:
>
> - KYLIN-1726: Streaming cubing we know.
> - KYLIN-1875: Snowflake support. Big metadata change which is not fully
> backward compatible.
> - KYLIN-2195: All Kylin properties renamed, to follow a convention.
> - KYLIN-2255: The old HBase storage (called v1 internally) is dropped.
> Cubes created by v1.3 and before are no longer supported.
>
> With all these changes on master, the next release deserves a plus on
> the
> major version.
>
> What do you think?
>
>
> Cheers
> Yang
>
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2292) workaround for CALCITE-1540

2016-12-16 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2292:
-

 Summary: workaround for CALCITE-1540
 Key: KYLIN-2292
 URL: https://issues.apache.org/jira/browse/KYLIN-2292
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Currently we're using calcite-1.8.0, and CALCITE-1540 is not even merged 
yet, so we have to work around it in the current KYLIN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2290) minor improvements on limit

2016-12-16 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2290:
-

 Summary: minor improvements on limit 
 Key: KYLIN-2290
 URL: https://issues.apache.org/jira/browse/KYLIN-2290
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


1. deprecate kylin.query.max-limit-pushdown because there's already storage 
scan threshold. Any limit is "good"
2. simply enable limit logic and other minor refactors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2272) limit push down should be disabled for "dimension distinct count"

2016-12-12 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2272:
-

 Summary: limit push down should be disabled for "dimension 
distinct count"
 Key: KYLIN-2272
 URL: https://issues.apache.org/jira/browse/KYLIN-2272
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Welcome new Apache Kylin committer: Billy Liu

2016-12-10 Thread hongbin ma

Welcome!

On Wed, Nov 30, 2016 at 11:56 AM, Li Yang  wrote:

> Welcome Billy~~~
>
> On Wed, Nov 30, 2016 at 11:13 AM, Yerui Sun  wrote:
>
> > Welcome aboard, Billy!
> >
> > > On Nov 30, 2016, at 10:52, 2376579413 <2376579...@qq.com> wrote:
> > >
> > > Congratulations, Billy!
> > >
> > > Chris Zhao
> > >
> > > -Original Message-
> > > From: Billy(Yiming) Liu [mailto:liuyiming@gmail.com]
> > > Sent: Wednesday, November 30, 2016 10:06 AM
> > > To: dev
> > > Subject: Re: Welcome new Apache Kylin committer: Billy Liu
> > >
> > > Thanks Luke for the invitation, and thanks to the PMC and the whole
> > > community for the trust.
> > >
> > > The first time I met Kylin was in 2015. At that time I tried to build a
> > > DMP system based on Kylin.
> > >
> > > In the year since then, I have participated in the community, tried to
> > > figure out how Kylin works, learned from real experts, and met many good
> > > friends. I've had so much fun here. I'm pleased, honored and humbled to
> > > accept the invitation and to be a part of the Kylin community.
> > >
> > > Thank you, Luke and our community.
> > >
> > >
> > > 2016-11-30 9:54 GMT+08:00 Jian Zhong :
> > >
> > >> Welcome Billy!
> > >>
> > >> On Wed, Nov 30, 2016 at 9:38 AM, Dong Li  wrote:
> > >>
> > >>> Congrats! Welcome Billy!
> > >>>
> > >>>
> > >>> Thanks,
> > >>> Dong Li
> > >>>
> > >>>
> > >>> Original Message
> > >>> Sender:ShaoFeng shishaofeng...@apache.org
> > >>> Recipient:dev...@kylin.apache.org
> > >>> Date:Wednesday, Nov 30, 2016 09:03
> > >>> Subject:Re: Welcome new Apache Kylin committer: Billy Liu
> > >>>
> > >>>
> > >>> Welcome Billy!
> > >>>
> > >>> 2016-11-30 8:59 GMT+08:00 Luke Han luke...@apache.org:
> > >>>
> > >>> I am very pleased to announce that the Project Management Committee
> > >>> (PMC) of Apache Kylin has asked Billy (Yiming) Liu to become an Apache
> > >>> Kylin committer, and he has already accepted.
> > >>>
> > >>> Billy has already made many contributions to the Kylin community:
> > >>> actively answering others' questions, submitting patches for bug fixes,
> > >>> and contributing to some features. We are so glad to have him as our
> > >>> new committer.
> > >>>
> > >>> Please join me in welcoming Billy.
> > >>>
> > >>> Luke Han
> > >>> On behalf of the Apache Kylin PPMC
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Shaofeng Shi 史少锋
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > With Warm regards
> > >
> > > Yiming Liu (刘一鸣)
> > >
> > >
> > >
> >
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Use Spark Cube Engine

2016-12-10 Thread hongbin ma

And the bad news is the spark cubing engine is out of maintenance recently

On Sat, Dec 10, 2016 at 5:12 PM, hongbin ma <mahong...@apache.org> wrote:

> I'm afraid being experimental also means we don't have documentation for
> it, you'll have to dig into the source code a little bit
>
> On Tue, Dec 6, 2016 at 6:09 PM, Luke_Selina <huangzhendon...@gmail.com>
> wrote:
>
>> Though Spark is a experimental component, I still want to have a try,
>> please
>> tell me how, thank you!
>>
>> --
>> View this message in context:
>> http://apache-kylin.74782.x6.nabble.com/Use-Spark-Cube-Engine-tp6514p6517.html
>> Sent from the Apache Kylin mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Use Spark Cube Engine

2016-12-10 Thread hongbin ma

I'm afraid being experimental also means we don't have documentation for
it, you'll have to dig into the source code a little bit

On Tue, Dec 6, 2016 at 6:09 PM, Luke_Selina 
wrote:

> Though Spark is a experimental component, I still want to have a try,
> please
> tell me how, thank you!
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/Use-Spark-Cube-Engine-tp6514p6517.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: select * clause still cause all regionserver crash

2016-12-10 Thread hongbin ma

Hi, are you "qstar" from
https://issues.apache.org/jira/browse/KYLIN-1936?focusedCommentId=15727422&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15727422
? Just want to make sure whether it's an individual case or a common one.

On Thu, Dec 8, 2016 at 10:00 AM, alaleiwang  wrote:

> about v1.5.4,"select * from table limit N" clause seems not to crash
> regionserver,but meantime no result return for the clause
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/select-clause-still-cause-all-regionserver-crash-tp6474p6535.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

-- 
Regards,

*Bin Mahone | 马洪宾*

Re: kylin 1.6.0 cardinality can't greater than 5000000 ?

2016-12-10 Thread hongbin ma

You should first figure out whether the dimension really suits dictionary
encoding. If a dimension has a cardinality in the millions, the size of its
dictionary can grow out of control. The bad news is that Kylin caches the
dictionary in the query server's heap as well as in the heap of some of the
MR mappers, which could cause performance issues.

Do you have sample data for this dimension? Maybe you should think about
fixed_length encoding or integer encoding, rather than dict encoding,
for this specific dimension.

On Thu, Dec 8, 2016 at 10:20 PM, Alberto Ramón 
wrote:

> Humm, you can try this:
>
> With Kylin 1705 you can use the Global Dictionary Builder, which supports
> 2 billion values (versus 5 million for the previous dictionary).
>
> In theory you can migrate from old dictionaries (Kylin 1775).
>
> 2016-12-08 7:57 GMT+01:00 wang...@snqu.com :
>
> > I upgraded from 1.5.4.1 to 1.6.0 and modified KYLIN_HOME,
> > and changed "kylin.dictionary.max.cardinality=500" to
> >  "kylin.dictionary.max.cardinality=3000" in file kylin.properties,
> > then started kylin 1.6-->create model-->create cube-->build cube
> > I got the following error message:
> >
> > java.lang.RuntimeException: Failed to create dictionary on
> > DEFAULT.TEST_500W_TBL.ROWKEY
> > at org.apache.kylin.dict.DictionaryManager.buildDictionary(
> > DictionaryManager.java:325)
> > at org.apache.kylin.cube.CubeManager.buildDictionary(
> CubeManager.java:222)
> > at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(
> > DictionaryGeneratorCLI.java:50)
> > at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(
> > DictionaryGeneratorCLI.java:41)
> > at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(
> > CreateDictionaryJob.java:54)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > at org.apache.kylin.engine.mr.common.HadoopShellExecutable.
> > doWork(HadoopShellExecutable.java:63)
> > at org.apache.kylin.job.execution.AbstractExecutable.
> > execute(AbstractExecutable.java:113)
> > at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(
> > DefaultChainedExecutable.java:57)
> > at org.apache.kylin.job.execution.AbstractExecutable.
> > execute(AbstractExecutable.java:113)
> > at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(
> > DefaultScheduler.java:136)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> > ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.IllegalArgumentException: Too high cardinality is
> > not suitable for dictionary -- cardinality: 5359970
> > at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> > DictionaryGenerator.java:96)
> >
> >
> >
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Update default config for sandbox environment

2016-12-08 Thread hongbin ma

Hi Billy,

That's a really good idea. How do you plan to approach this?

On Thu, Dec 8, 2016 at 6:30 PM, Billy Liu  wrote:

> Hi dev community,
>
> Most users deploy Kylin on their own sandbox for the first trial. Most
> sandboxes have at most 8 GB of memory. The most used sandboxes are the HDP
> sandbox and the CDH sandbox. We'd better make the default kylin
> configuration convenient for these sandbox environments. The suggestion
> includes reducing region-cut-gb, hfile-size-gb, max-region-count,
> reduce-input-mb, max-reducer-number, mapreduce.map.memory.mb,
> mapreduce.map.java.opts.
>
> What do you think?
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2246) redesign the way to decide layer cubing reducer count

2016-12-02 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2246:
-

 Summary: redesign the way to decide layer cubing reducer count
 Key: KYLIN-2246
 URL: https://issues.apache.org/jira/browse/KYLIN-2246
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


currently the sizing algorithm does not leverage CubeStatsReader



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2240) Add a toggle to ignore all cube signature inconsistency temporally

2016-11-30 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2240:
-

 Summary: Add a toggle to ignore all cube signature inconsistency 
temporally
 Key: KYLIN-2240
 URL: https://issues.apache.org/jira/browse/KYLIN-2240
 Project: Kylin
  Issue Type: New Feature
Reporter: hongbin ma
Assignee: hongbin ma


cube signature helps to prevent ready cubes from being changed to broken state. 
However it could be annoying in some rare cases, for example POC sites.

The toggle should NEVER be used for serious PROD deployment!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2236) statement.setMaxRows(10) is not working

2016-11-30 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2236:
-

 Summary: statement.setMaxRows(10) is not working
 Key: KYLIN-2236
 URL: https://issues.apache.org/jira/browse/KYLIN-2236
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Some BI tools use statement.setMaxRows(10) in place of a LIMIT clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [VOTE] Release apache-kylin-1.6.0 (RC2)

2016-11-24 Thread hongbin ma

+1 (binding)


mvn test passed




-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2227) rename kylin-log4j.properties to kylin-tools-log4j.properties and move it to global conf folder

2016-11-23 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2227:
-

 Summary: rename kylin-log4j.properties to 
kylin-tools-log4j.properties and move it to global conf folder
 Key: KYLIN-2227
 URL: https://issues.apache.org/jira/browse/KYLIN-2227
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2222) web GUI should use the column types to valid encodings mapping provided by backend

2016-11-22 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2222:
-

 Summary: web GUI should use the column types to valid encodings 
mapping provided by backend
 Key: KYLIN-2222
 URL: https://issues.apache.org/jira/browse/KYLIN-2222
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2221) rethink on KYLIN-1684

2016-11-21 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2221:
-

 Summary: rethink on KYLIN-1684
 Key: KYLIN-2221
 URL: https://issues.apache.org/jira/browse/KYLIN-2221
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2193) parameterise org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD

2016-11-15 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2193:
-

 Summary: parameterise 
org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD
 Key: KYLIN-2193
 URL: https://issues.apache.org/jira/browse/KYLIN-2193
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


currently 
org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD is hard 
coded to 5, which is too small in many scenarios. I'm proposing to increase the 
default value to 20, and add a new config entry to allow tuning this parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2181) remove integer as fixed_length in test_kylin_cube_with_slr_empty desc

2016-11-13 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2181:
-

 Summary: remove integer as fixed_length in 
test_kylin_cube_with_slr_empty desc
 Key: KYLIN-2181
 URL: https://issues.apache.org/jira/browse/KYLIN-2181
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2179) should disable limit push down if there exists fixed_lenth encoding for integers in the rowkey

2016-11-13 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2179:
-

 Summary: should disable limit push down if there exists 
fixed_lenth encoding for integers in the rowkey
 Key: KYLIN-2179
 URL: https://issues.apache.org/jira/browse/KYLIN-2179
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Because the sort order of fixed_length-encoded values differs from the natural order of integers.
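
The ordering mismatch can be sketched in Python. This is only an illustration: it assumes (not taken from the Kylin source) that a fixed-length encoding stores the integer's decimal text padded with zero bytes up to a fixed width, and that rowkeys are compared byte by byte. The function name `fixed_length_encode` is hypothetical.

```python
def fixed_length_encode(value: int, width: int = 4) -> bytes:
    """Hypothetical encoder: decimal text, right-padded with NUL bytes."""
    text = str(value).encode("ascii")
    if len(text) > width:
        raise ValueError("value does not fit in the fixed width")
    return text.ljust(width, b"\x00")


values = [9, 10, 100, 25]

# Natural integer order.
by_integer = sorted(values)

# Order obtained by comparing the encoded bytes, as a rowkey scan would.
by_rowkey = sorted(values, key=fixed_length_encode)

print(by_integer)
print(by_rowkey)
```

Under these assumptions the two lists differ, so a scan that stops after the first N rowkeys does not necessarily return the N smallest integers.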



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2175) cubestatsreader support reading unfinished segments

2016-11-10 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2175:
-

 Summary: cubestatsreader support reading unfinished segments
 Key: KYLIN-2175
 URL: https://issues.apache.org/jira/browse/KYLIN-2175
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Currently CubeStatsReader only deals with READY segments; in fact we have 
enough stats after the cubing job's first two or three steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2168) changes on cube-level-config will cause cube inconsistency

2016-11-07 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2168:
-

 Summary: changes on cube-level-config will cause cube inconsistency
 Key: KYLIN-2168
 URL: https://issues.apache.org/jira/browse/KYLIN-2168
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Some configs will cause cube inconsistency, for example 
"kylin.cube.aggrgroup.max.combination" and "kylin.cube.aggrgroup.max.size". 
Changes to such configurations should be reflected in the signature of 
CubeDesc. The issue becomes more serious now that we allow configuration 
changes for READY cubes (KYLIN-2090).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2156) support filters like a != 0

2016-11-03 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2156:
-

 Summary: support filters like a != 0
 Key: KYLIN-2156
 URL: https://issues.apache.org/jira/browse/KYLIN-2156
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2152) TopN group by column does not distinguish between NULL and ""

2016-11-02 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2152:
-

 Summary: TopN group by column does not distinguish between NULL 
and ""
 Key: KYLIN-2152
 URL: https://issues.apache.org/jira/browse/KYLIN-2152
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: Shaofeng SHI






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2145) StorageCleanupJob will fail when beeline enabled

2016-11-01 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2145:
-

 Summary: StorageCleanupJob will fail when beeline enabled
 Key: KYLIN-2145
 URL: https://issues.apache.org/jira/browse/KYLIN-2145
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


due to beeline output format



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2134) Kylin will treat empty string as NULL by mistake

2016-10-27 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2134:
-

 Summary: Kylin will treat empty string as NULL by mistake
 Key: KYLIN-2134
 URL: https://issues.apache.org/jira/browse/KYLIN-2134
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2125) Support using beeline to load hive table metadata

2016-10-24 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2125:
-

 Summary: Support using beeline to load hive table metadata
 Key: KYLIN-2125
 URL: https://issues.apache.org/jira/browse/KYLIN-2125
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


in some cases hive CLI is not allowed to extract hive table metadata from hive 
metadata store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2071) automatically renew kerbose tgt

2016-10-08 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2071:
-

 Summary: automatically renew kerbose tgt 
 Key: KYLIN-2071
 URL: https://issues.apache.org/jira/browse/KYLIN-2071
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


Currently we rely on crontab to periodically renew the Kerberos TGT. This 
approach is not operations-friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: kylin-1.5.4 error when syncing Hive metadata

2016-09-20 Thread hongbin ma

I cannot reproduce it either.

When you run kylin.sh, it prints the extracted hive/hbase dependency paths
to stdout. Can you attach that output for further analysis?

On Tue, Sep 20, 2016 at 10:51 AM, Li Yang  wrote:

> We may not be able to reproduce the problem (at least I cannot). Both 1.5.4
> and 1.5.3 work for me.
>
> In my experience, the root cause is often that HBASE_CLASSPATH was swallowed.
> As a test, try the commands below in a command shell.
>
> [root@sandbox]# export HBASE_CLASSPATH=*ABCDE*
> [root@sandbox]# hbase classpath
> /usr/hdp/2.2.4.2-2/hbase/conf:/usr/lib/jvm/java-1.7.0-
> openjdk.x86_64/lib/tools.jar:/usr/hdp/2.2.4.2-2/hbase:/usr/
> hdp/2.2.4.2-2/hbase/lib/activation-1.1.jar:/usr/hdp/2.
> 2.4.2-2/hbase/lib/aopalliance-1.0.jar.:/usr/hdp/2.2.4.2-
> 2/zookeeper/*:/usr/hdp/2.2.4.2-2/zookeeper/lib/*:
> *ABCDE*
>
> If you don't get ABCDE from 'hbase classpath', that confirms
> HBASE_CLASSPATH was lost inside hbase shell.
>
> Cheers
> Yang
>
> On Tue, Sep 20, 2016 at 9:07 AM, ShaoFeng Shi 
> wrote:
>
> > Hi Tongxin,
> >
> > 1.5.4 has no special requirement on hive version; From 1.5.3 to 1.5.4,
> the
> > kylin.sh has some change, please check whether it was the shell script
> > which wasn't able to detect the dependency jars correctly. Please share
> > with us about your finding, or if you can fix that and contribute a
> patch,
> > that would be great.
> >
> >
> >
> > 2016-09-19 15:30 GMT+08:00 仇同心 :
> >
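> > >> Hi all,
> > >> Today, while using Kylin 1.5.4, I got an error when syncing Hive metadata:
> > >> the "Load Hive Table Metadata From Tree" page keeps showing: Loading Databases.
> > >>
> > >>
> > >> The error message is printed in the kylin.out file: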
> >>
> >> SEVERE: Servlet.service() for servlet [kylin] in context with path
> >> [/kylin] threw exception [Handler processing failed;
> >> nested exception is java.lang.NoClassDefFoundError:
> >> org/apache/hadoop/hive/ql/session/SessionState] with root cause
> >> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.sess
> >> ion.SessionState
> >> at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(W
> >> ebappClassLoaderBase.java:1858)
> >> at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(W
> >> ebappClassLoaderBase.java:1701)
> >> at org.apache.kylin.rest.controller.TableController.showHiveDat
> >> abases(TableController.java:315)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
> >> ssorImpl.java:57)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
> >> thodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:606)
> >> at org.springframework.web.method.support.InvocableHandlerMetho
> >> d.doInvoke(InvocableHandlerMethod.java:221)
> >> at org.springframework.web.method.support.InvocableHandlerMetho
> >> d.invokeForRequest(InvocableHandlerMethod.java:13
> >> 6)
> >>
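> > >> However, Hive itself works fine, and there is no problem if I switch to
> > >> Kylin 1.5.3: the "Load Hive Table Metadata From Tree" page shows the
> > >> databases in Hive.
> > >> The Hive version I am using is 1.2.1. Does Kylin 1.5.4 have any
> > >> requirement on the Hive version?
> > >>
> > >>
> > >> Thanks!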
> >>
> >>
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2031) some more DimensionEncoding

2016-09-19 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2031:
-

 Summary: some more DimensionEncoding
 Key: KYLIN-2031
 URL: https://issues.apache.org/jira/browse/KYLIN-2031
 Project: Kylin
  Issue Type: New Feature
Reporter: hongbin ma
Assignee: hongbin ma


1. In some use cases a string value representing a hash code is used. The string 
consists only of hex characters [0-9A-F], so two characters can be squashed 
into one byte.
2. The current IntegerDimEnc does not support negative values; we need another 
IntegerDimEnc that does.
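
Both ideas can be sketched in a few lines of Python. This is an illustration of the techniques only, not Kylin's implementation; the helper names `pack_hex`, `unpack_hex`, `encode_signed`, and `decode_signed` are hypothetical, and the 4-byte big-endian two's-complement layout is an assumption.

```python
def pack_hex(s: str) -> bytes:
    """Pack an even-length hex string so that two characters share one byte."""
    return bytes.fromhex(s)


def unpack_hex(b: bytes) -> str:
    """Restore the hex string (normalized to upper case)."""
    return b.hex().upper()


def encode_signed(value: int, width: int = 4) -> bytes:
    """Assumed layout: big-endian two's complement, so negatives fit."""
    return value.to_bytes(width, "big", signed=True)


def decode_signed(b: bytes) -> int:
    return int.from_bytes(b, "big", signed=True)


code = "CF488F24"
packed = pack_hex(code)
print(len(code), "chars ->", len(packed), "bytes")
print(unpack_hex(packed))
print(decode_signed(encode_signed(-5)))
```

Note that plain two's-complement bytes do not sort in numeric order across the sign boundary, so a real rowkey encoding would likely need an extra step (for example flipping the sign bit) if byte order matters.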



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-2021) Cognos Issues

2016-09-17 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2021:
-

 Summary: Cognos Issues
 Key: KYLIN-2021
 URL: https://issues.apache.org/jira/browse/KYLIN-2021
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


cognos will generate some queries that kylin does not support yet



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [Announce] Apache Kylin 1.5.4 released

2016-09-17 Thread hongbin ma

thanks shaofeng!

On Fri, Sep 16, 2016 at 10:34 PM, Billy(Yiming) Liu  wrote:

> Thanks Shaofeng and our community.
>
> 2016-09-16 21:25 GMT+08:00 ShaoFeng Shi :
>
>> The Apache Kylin team is pleased to announce the immediate availability of
>> the 1.5.4 release.
>>
>> This is a bug fix release based on 1.5.3; All of the changes in this
>> release can be found in:
>> https://kylin.apache.org/docs15/release_notes.html
>>
>> You can download the source release and binary packages from
>> https://www.apache.org/dyn/closer.cgi?path=/kylin/apache-kylin-1.5.4/
>>
>> More information about the binary packages is on Kylin's download page
>> https://kylin.apache.org/download/
>>
>> Apache Kylin is an open source Distributed Analytics Engine designed to
>> provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop,
>> supporting extremely large datasets.
>>
>> Apache Kylin lets you query massive data set at sub-second latency in 3
>> steps:
>> 1. Identify a Star Schema data on Hadoop.
>> 2. Build Cube on Hadoop.
>> 3. Query data with ANSI-SQL and get results in sub-second, via ODBC, JDBC
>> or RESTful API.
>>
>> Thanks everyone who have contributed to the 1.5.4 release.
>>
>> We welcome your help and feedback. For more information on how to
>> report problems, and to get involved, visit the project website at
>> https://kylin.apache.org/
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: RE: [VOTE] Release apache-kylin-1.5.4 (release candidate 1)

2016-09-13 Thread hongbin ma

+1 binding

signature verified




-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-2005) Move all storage side behavior hints to GTScanRequest

2016-09-09 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-2005:
-

 Summary: Move all storage side behavior hints to GTScanRequest
 Key: KYLIN-2005
 URL: https://issues.apache.org/jira/browse/KYLIN-2005
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-1999) Use some compression at UT/IT

2016-09-07 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-1999:
-

 Summary: Use some compression at UT/IT
 Key: KYLIN-1999
 URL: https://issues.apache.org/jira/browse/KYLIN-1999
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


KYLIN-1984 disabled compression in packaging configurations to maximize ease of 
use. Still, we need to make sure everything will work if we want compression 
enabled. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Do not order columns for Auto Generate Dimensions in Cube Design

2016-09-03 Thread hongbin ma

The original table order is already lost when we define dimensions in the
model, so the problem becomes: how should we order the dimensions when
creating the model?

To illustrate the problem, let's assume we have a fact table and a lookup
table. Assume the columns on the fact table are "dim1, FK, dim2, metrics1,
metrics2", and the lookup table looks like "PK, dim3, dim4". What kind of
ordering is reasonable: 1234 or 1342?
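
The two candidate orderings can be spelled out with a small Python sketch. The function names are hypothetical and only restate the example above; they say nothing about how Kylin's model designer is implemented.

```python
FACT = ["dim1", "FK", "dim2", "metrics1", "metrics2"]
LOOKUP = ["PK", "dim3", "dim4"]
METRICS = {"metrics1", "metrics2"}


def append_lookup_dims(fact, lookup, metrics, fk="FK"):
    """Fact dimensions first, lookup dimensions last (ordering 1234)."""
    fact_dims = [c for c in fact if c not in metrics and c != fk]
    return fact_dims + lookup[1:]  # lookup[0] is the PK, not a dimension here


def inline_lookup_dims(fact, lookup, metrics, fk="FK"):
    """Lookup dimensions placed where the FK sits (ordering 1342)."""
    ordered = []
    for column in fact:
        if column in metrics:
            continue
        if column == fk:
            ordered.extend(lookup[1:])
        else:
            ordered.append(column)
    return ordered


order_1234 = append_lookup_dims(FACT, LOOKUP, METRICS)
order_1342 = inline_lookup_dims(FACT, LOOKUP, METRICS)
print(order_1234)
print(order_1342)
```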

On Sat, Sep 3, 2016 at 6:01 PM, Yiming Liu <liuyiming@gmail.com> wrote:

> I would like to suggest still keeping the original order. Just like when we
> execute "desc table", no reordering happens, and the original order normally
> reflects some kind of relationship.
>
> 2016-09-03 16:13 GMT+08:00 hongbin ma <mahong...@apache.org>:
>
> > +1
> >
> > alphabetical order obviously makes no sense. can we sort them by
> > cardinality?
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> >
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>

-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Turn off some hive configuration commands when they are not allowed

2016-09-03 Thread hongbin ma

Do we have JIRAs to track the issue? it's easy to forget

On Sat, Sep 3, 2016 at 5:57 PM, Yiming Liu <liuyiming@gmail.com> wrote:

> No progress, pending.
>
> 2016-09-03 17:36 GMT+08:00 hongbin ma <mahong...@apache.org>:
>
> > do we have any progress on such documents?
> >
> > On Tue, Aug 2, 2016 at 8:50 AM, Yiming Liu <liuyiming@gmail.com>
> > wrote:
> >
> > > Thanks, Shaofeng. It makes sense to grant enough privileges to Kylin
> for
> > > Cube building. Just in some extreme cases, the privilege issue will be
> a
> > > show stop.
> > >
> > > The privilege document is great. It's very helpful for Hadoop system
> > > administrator.
> > >
> > > 2016-08-01 9:38 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
> > >
> > > > Hi Yiming,
> > > >
> > > > The "mapreduce.job.reduces"  need by set at runtime, whose number is
> > > > calculated based on user tables' size, it couldn't be pre-configured.
> > > >
> > > > The "hive.merge.mapredfiles=false" can be externalized to the conf
> > file;
> > > > The hive merge is not needed since 1.5.3, I set in code to ensure it
> > will
> > > > be not be enabled (config files before 1.5.3 has this param set to
> > true).
> > > >
> > > > For other parameters, I think they're optional, but it is better to
> > keep
> > > as
> > > > they're good for performance, like dfs.replication=2, compress.codec
> > etc.
> > > >
> > > > Usually in a hadoop cluster, Apache Kylin should be treated as a
> > > > priviledged user (instead of a normal user like analyst), which can
> > > execute
> > > > necessary hadoop/hdfs/hbase/hive actions (like mkdir, create htable,
> > > etc);
> > > > To achieve this, the administartor need do some configurations and
> > > > authorizations; What we need do is to compose a document to list
> > > > these privileges, what's your opinion?
> > > >
> > > > Thanks for the comment!
> > > >
> > > >
> > > > 2016-07-30 14:03 GMT+08:00 Yiming Liu <liuyiming@gmail.com>:
> > > >
> > > > > Hi Kylin dev,
> > > > >
> > > > > The first step is building cube is to CreateFlatHiveTable, it will
> > > call a
> > > > > few hive configuration commands, such as
> > > > > CreateFlatHiveTableStep line 78 and 79.
> > > > > set mapreduce.job.reduces=numReduces
> > > > > set hive.merge.mapredfiles=false
> > > > >
> > > > > Are these commands necessary for the cube building? Could we
> > configure
> > > > them
> > > > > in files? I met some cases, where the hiveserver would say
> > > "Configuration
> > > > > is not allowed to modify at runtime". It will break the build.
> > > > >
> > > > > Maybe there are some other hard code hadoop commands still. It will
> > be
> > > > more
> > > > > friendly if they could turn off on demand.
> > > > >
> > > > > --
> > > > > With Warm regards
> > > > >
> > > > > Yiming Liu (刘一鸣)
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > > Shaofeng Shi
> > > >
> > >
> > >
> > >
> > > --
> > > With Warm regards
> > >
> > > Yiming Liu (刘一鸣)
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> >
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Turn off some hive configuration commands when they are not allowed

2016-09-03 Thread hongbin ma

do we have any progress on such documents?

On Tue, Aug 2, 2016 at 8:50 AM, Yiming Liu  wrote:

> Thanks, Shaofeng. It makes sense to grant enough privileges to Kylin for
> Cube building. Just in some extreme cases, the privilege issue will be a
> showstopper.
>
> The privilege document is great. It's very helpful for Hadoop system
> administrators.
>
> 2016-08-01 9:38 GMT+08:00 ShaoFeng Shi :
>
> > Hi Yiming,
> >
> > The "mapreduce.job.reduces" needs to be set at runtime; its value is
> > calculated based on the size of the user's tables, so it can't be
> > pre-configured.
> >
> > The "hive.merge.mapredfiles=false" can be externalized to the conf file;
> > the hive merge is not needed since 1.5.3, so I set it in code to ensure it
> > will not be enabled (config files before 1.5.3 have this param set to true).
> >
> > For other parameters, I think they're optional, but it is better to keep
> > them as they're good for performance, like dfs.replication=2,
> > compress.codec etc.
> >
> > Usually in a hadoop cluster, Apache Kylin should be treated as a
> > privileged user (instead of a normal user like an analyst), which can
> > execute necessary hadoop/hdfs/hbase/hive actions (like mkdir, create
> > htable, etc.). To achieve this, the administrator needs to do some
> > configuration and authorization. What we need to do is compose a document
> > listing these privileges. What's your opinion?
> >
> > Thanks for the comment!
> >
> >
> > 2016-07-30 14:03 GMT+08:00 Yiming Liu :
> >
> > > Hi Kylin dev,
> > >
> > > The first step in building a cube is CreateFlatHiveTable, which calls a
> > > few hive configuration commands, such as those at
> > > CreateFlatHiveTableStep lines 78 and 79:
> > > set mapreduce.job.reduces=numReduces
> > > set hive.merge.mapredfiles=false
> > >
> > > Are these commands necessary for cube building? Could we configure them
> > > in files? I met some cases where the hiveserver would say "Configuration
> > > is not allowed to modify at runtime", which breaks the build.
> > >
> > > Maybe there are still some other hard-coded hadoop commands. It would be
> > > more friendly if they could be turned off on demand.
> > >
> > > --
> > > With Warm regards
> > >
> > > Yiming Liu (刘一鸣)
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi
> >
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Do not order columns for Auto Generate Dimensions in Cube Design

2016-09-03 Thread hongbin ma

+1

alphabetical order obviously makes no sense. can we sort them by
cardinality?



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: [jira] [Commented] (KYLIN-1994) Build Cube failing

2016-09-03 Thread hongbin ma

Hi guys,

Please use JIRA comments instead of email replies, because email replies
do not appear on JIRA.

https://issues.apache.org/jira/browse/INFRA-12561

On Sat, Sep 3, 2016 at 11:08 AM, siva kumar Rachaputi (JIRA) <
j...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/KYLIN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15460236#comment-15460236 ]
>
> siva kumar Rachaputi commented on KYLIN-1994:
> -
>
> [~Shaofengshi] - Thanks for your suggestion. I am able to build the cube
> now.
>
> just for others information, I am posting the suggestion here
>
> The "killed by admin" error in the "Build Cube" step is usually caused by
> YARN being unable to allocate enough resources (memory). I see this error in
> sandbox envs or very small hadoop clusters, as Kylin requests 3 GB of memory
> in this step. To bypass it you can reduce the numbers
> (mapreduce.map.memory.mb and mapreduce.map.java.opts) in
> conf/kylin_job_inmem.xml. For a real hadoop env, please check the YARN
> configuration.
>
>
>
> > Build Cube failing
> > --
> >
> > Key: KYLIN-1994
> > URL: https://issues.apache.org/jira/browse/KYLIN-1994
> > Project: Kylin
> >  Issue Type: Bug
> >  Components: Web
> >Reporter: siva kumar Rachaputi
> >Assignee: Zhong,Jason
> >
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-1981) PK/FK derived will break for left join in some cases

2016-08-29 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-1981:
-

 Summary: PK/FK derived will break for left join in some cases
 Key: KYLIN-1981
 URL: https://issues.apache.org/jira/browse/KYLIN-1981
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


For left join cubes, suppose A is the FK in the fact table and B is the PK in 
the lookup table. A query like the one below will end up with "more" results:

select B, count(*) from fact left join lookup group by B
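
One possible reading of how the extra rows arise can be sketched in Python. It assumes (this is an interpretation, not confirmed by the report) that the cube groups by the FK column A and only afterwards derives B from it, so fact rows whose FK has no match in the lookup table are not merged into a single NULL group. The sample data is made up for illustration.

```python
from collections import Counter

fact_fk = [1, 2, 3, 3]  # values of column A on the fact table
lookup_pk = {1}         # values of column B present on the lookup table


def left_join_b(a):
    """B as seen through a left join: the matching PK, or None (NULL)."""
    return a if a in lookup_pk else None


# What SQL semantics call for: join first, then group by B.
expected = Counter(left_join_b(a) for a in fact_fk)

# Assumed shortcut: group by A first, then translate each group's key to B.
grouped_by_a = Counter(fact_fk)
derived = [(left_join_b(a), n) for a, n in sorted(grouped_by_a.items())]

print(len(expected), "groups expected:", dict(expected))
print(len(derived), "groups via the shortcut:", derived)
```

In this sketch the unmatched FK values 2 and 3 each produce their own NULL row, so the shortcut returns three rows where two are expected.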



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (KYLIN-1979) Move hackNoGroupByAggregation to cube-based storage implementations

2016-08-28 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-1979:
-

 Summary: Move hackNoGroupByAggregation to cube-based storage 
implementations
 Key: KYLIN-1979
 URL: https://issues.apache.org/jira/browse/KYLIN-1979
 Project: Kylin
  Issue Type: Improvement
Reporter: hongbin ma
Assignee: hongbin ma


as it only makes sense for cube-based realizations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Kylin Query Performance

2016-08-27 Thread hongbin ma

Does your test query contain any aggregations, like min, max, or sum? If
there is no aggregation, the query will scan the base cuboid, which in some
bad cases is nearly as large as the raw data.

On Fri, Aug 19, 2016 at 11:10 AM, Mars J  wrote:

> just 'select A,B from Fact f left join dima a on f.no=a.no...', when I
> query this, it needs 10+s, then the same sql will return result in 0.0s,
> but when I change the limit N or any column behind 'select' and 'group by'
> and join table ,it costs more than 10+s again.
> the source records in fact and dima is 160w , and count no in htable is
> 1.9billion.
>
> 2016-08-18 17:50 GMT+08:00 Li Yang :
>
> > Performance troubleshoot is complicated and requires much information. If
> > you can share a diagnosis pack, people maybe able to help.
> >
> > On Sat, Aug 13, 2016 at 10:32 PM, Yiming Liu 
> > wrote:
> >
> > > What's kind of queries? How many records supposed to be returned? Could
> > you
> > > send out the query log together?
> > >
> > > 2016-08-12 10:28 GMT+08:00 Mars J :
> > >
> > > > Hi,
> > > > I have run Kylin 1.5.2.1 and build a cube successfully,the cube
> > size
> > > is
> > > > 3.2G for fact table and dimensional table have 1.5 million records
> > > > seperately.HTable count is about 200 billion.
> > > >My Cube Design includes 3 Derived Dims and 5 Normal Dims which
> > formed
> > > a
> > > > hierachy dim in agg. Rowkeys are generated automatically,the first
> and
> > > > second rowkey is the highest cardinality column in fact table and one
> > dim
> > > > table seperately.
> > > >
> > > > When I query it from kylin insight, it costs 11s,and the second
> > same
> > > > query is also 10+s, How can I optimize this ?
> > > >
> > >
> > >
> > >
> > > --
> > > With Warm regards
> > >
> > > Yiming Liu (刘一鸣)
> > >
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: [jira] [Commented] (KYLIN-1908) Collect Metrics to JMX

2016-08-27 Thread hongbin ma

The documentation itself looks good.

Where should we put the document: the blog (current approach) or "Docs"? I
prefer the latter because JMX is a basic feature and should be better
categorized.

On Sun, Aug 28, 2016 at 10:23 AM, Billy(Yiming) Liu (JIRA) 
wrote:

>
> [ https://issues.apache.org/jira/browse/KYLIN-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442591#comment-15442591 ]
>
> Billy(Yiming) Liu commented on KYLIN-1908:
> --
>
> Thanks [~kangkaisen]. It looks good.
>
> Hi [~liyang.g...@gmail.com], do you think it's OK to publish or need more
> polish, still?
>
> > Collect Metrics to JMX
> > --
> >
> > Key: KYLIN-1908
> > URL: https://issues.apache.org/jira/browse/KYLIN-1908
> > Project: Kylin
> >  Issue Type: New Feature
> >  Components: Tools, Build and Test
> >Affects Versions: v1.5.2
> >Reporter: kangkaisen
> >Assignee: kangkaisen
> > Fix For: v1.5.4
> >
> > Attachments: KYLIN-1908.patch, QueryMetrics.java
> >
> >
> > As we all know, performance metrics are important for enterprise
> > applications, so we should support collecting metrics to JMX in Kylin.
> > My approach is as follows:
> > 1. Use `org.apache.hadoop.metrics2` as the metrics collection framework.
> > 2. Define an MBean class for the metrics that we need to collect.
> > 3. Update the metrics in the right places.
> > My questions:
> > 1. Can I depend on `org.apache.hadoop.metrics2` directly?
> > 2. What do you think of my approach?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-1964) Add a companion tool of CubeMetaExtractor for cube importing

2016-08-18 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-1964:
-

 Summary: Add a companion tool of CubeMetaExtractor for cube 
importing
 Key: KYLIN-1964
 URL: https://issues.apache.org/jira/browse/KYLIN-1964
 Project: Kylin
  Issue Type: Wish
Reporter: hongbin ma
Assignee: hongbin ma


Now that we have CubeMetaExtractor for cube exporting, we also need an
importer to import the exported cube.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Derived measures in Kylin

2016-08-10 Thread hongbin ma

I still don't quite understand your problem. What exception are you
seeing when you run queries containing (SUM(BaseSalary) +
SUM(Hra))/SUM(DA)?
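
As I read the request, the ratio is an expression over already-aggregated sums, so it has to be re-evaluated for whatever grouping is selected. A minimal Python sketch of that arithmetic follows; the sample rows and the helper `ratio_by` are made up for illustration and are not Kylin code.

```python
from collections import defaultdict

# Hypothetical rows: (FirstName, LastName, Department, BaseSalary, Hra, DA)
ROWS = [
    ("Ann", "Lee", "Sales", 1000, 200, 100),
    ("Ann", "Lee", "Ops", 500, 100, 200),
    ("Bob", "Ray", "Sales", 800, 400, 300),
]


def ratio_by(rows, key_indexes):
    """Compute (SUM(BaseSalary) + SUM(Hra)) / SUM(DA) per group."""
    sums = defaultdict(lambda: [0, 0, 0])  # [BaseSalary, Hra, DA] per group
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        for j, value in enumerate(row[3:6]):
            sums[key][j] += value
    # The ratio is taken over the group sums, not summed row by row.
    return {key: (base + hra) / da for key, (base, hra, da) in sums.items()}


by_name = ratio_by(ROWS, (0, 1))          # grouped by FirstName, LastName
by_name_dept = ratio_by(ROWS, (0, 1, 2))  # Department added to the grouping
```

Adding Department to the grouping changes each sum and therefore the ratio, which is why it cannot be stored as a plain pre-summed measure.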

On Wed, Aug 10, 2016 at 1:10 PM, Reshma  wrote:

> Hi,
> To clarify, let's say we want a feature like this. I have the following
> columns in my Person table: EmpNo, FirstName, LastName, Department, Phone,
> BaseSalary, Hra, DA. Please see attached Image1.
> When I developed the cube, I created Dimensions (FirstName, LastName,
> Department) and Measures (SUM(BaseSalary), SUM(Hra), SUM(DA),
> Max(BaseSalary)).
> I can easily show it in a Tableau UI in a matrix grouped by dimensions
> (FirstName, LastName), with Measures --> SUM(BaseSalary), SUM(Hra), SUM(DA).
> Now I want another calculated measure, Ratio, like
> (SUM(BaseSalary) + SUM(Hra))/SUM(DA). Please see attached Image2.
> This has to be calculated at run time by the cube, based on the grouping
> selected (in this case FirstName, LastName).
> Also, if we add another grouping dimension, Department, the ratio will
> change again, as SUM(BaseSalary), SUM(Hra) and SUM(DA) will change at
> group level. Please see attached Image3.
> Let me know how I can configure Kylin to have this.
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Derived-measures-in-Kylin-tp5513p5534.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Re: Build Cube killed by admin

2016-08-10 Thread hongbin ma

As the step output indicates, the job failed because it was "killed by admin".
In these cases you usually need to check Hadoop's logs.




-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Reply: Question abount BuildInFunctionTransformer

2016-08-10 Thread hongbin ma

JIRA: https://issues.apache.org/jira/browse/KYLIN-1954

thanks for reporting!

On Thu, Aug 11, 2016 at 10:54 AM, hongbin ma <mahong...@apache.org> wrote:

> I think you're right on this.
>
> The code modifies the filter in the first CubeSegmentScanner, and the filter
> is not translated again in the subsequent CubeSegmentScanners. Because the
> dictionaries differ between segments, the result might be wrong.
>
> I'm opening a JIRA for this and it will be fixed in 1.5.4
>
> On Wed, Aug 10, 2016 at 10:53 AM, yubo-...@yolo24.com <yubo-...@yolo24.com
> > wrote:
>
>> CubeStorageQuery.search/ CubeSegmentScanner
>>
>> When the filter is translated for the first segment, it is changed to a
>> CompareTupleFilter (IN clause).
>> The translation is not triggered again for the following segments.
>> This is not right, because the dictionary is not the same for every segment.
>>
>> Assume data like this:
>>
>> merchant_name  cube segment
>> 深海新创专营  20160725
>> 深海新创手机  20160726
>>
>> When searching with like '%深海新创%',
>> CubeSegmentScanner scans segment '20160725' and the filter is changed to an IN
>> clause (IN '深海新创专营').
>> The result is right for this segment, but not for the following segments, because
>> the filter has already been changed.
>>
>>
>>
>> --
>> View this message in context: http://apache-kylin.74782.x6.n
>> abble.com/Question-abount-BuildInFunctionTransformer-tp5499p5533.html
>> Sent from the Apache Kylin mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>



-- 
Regards,

*Bin Mahone | 马洪宾*

[jira] [Created] (KYLIN-1954) BuildInFunctionTransformer should be executed per CubeSegmentScanner

2016-08-10 Thread hongbin ma (JIRA)

hongbin ma created KYLIN-1954:
-

 Summary: BuildInFunctionTransformer should be executed per 
CubeSegmentScanner
 Key: KYLIN-1954
 URL: https://issues.apache.org/jira/browse/KYLIN-1954
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.5.3
Reporter: hongbin ma
Assignee: hongbin ma


Reported on the dev mailing list in the thread "Question abount BuildInFunctionTransformer":

Sorry for the wrong description and thanks for the explanation.

I have another question on this.

Case1
select merchant_name,dt_day,count(*)
from session_view_shop_0
where merchant_name like '%深海新创手机%'
and dt_year='2016'
and dt_month='07'
and dt_day >='25'
and dt_day <='28'
group by merchant_name,dt_day

2016-08-05 09:25:06,263 INFO  [http-bio-7070-exec-10]
dict.BuildInFunctionTransformer:66 : Translated
{LIKE(KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME,%深海新创手机%)} to IN clause:
{KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME IN []}

Result1
深海新创手机专营店80002972 28 6360
深海新创手机专营店80002972 27 5501
深海新创手机专营店80002972 26 4830

Case 2
select merchant_name,dt_day,count(*)
from session_view_shop_0
where merchant_name like '%深海新创%'
and dt_year='2016'
and dt_month='07'
and dt_day >='25'
and dt_day <='28'
group by merchant_name,dt_day

2016-08-05 09:37:55,469 INFO  [http-bio-7070-exec-15]
dict.BuildInFunctionTransformer:66 : Translated
{LIKE(KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME,%深海新创%)} to IN clause:
{KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME IN [深海新创专营店80002972]}

Result2
深海新创专营店80002972 25 5283


'深海新创手机专营店80002972' is expected in Result2 as well, since Case 1 shows that it exists.



CubeStorageQuery.search/ CubeSegmentScanner

When the filter is translated for the first segment, it is changed to a
CompareTupleFilter (IN clause).
The translation is not triggered again for the following segments.
This is not right, because the dictionary is not the same for every segment.

Assume data like this:

merchant_name  cube segment
深海新创专营  20160725
深海新创手机  20160726

When searching with like '%深海新创%',
CubeSegmentScanner scans segment '20160725' and the filter is changed to an IN
clause (IN '深海新创专营').
The result is right for this segment, but not for the following segments, because
the filter has already been changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Reply: Question abount BuildInFunctionTransformer

2016-08-10 Thread hongbin ma

I think you're right on this.

The code modifies the filter in the first CubeSegmentScanner, and the filter
is not translated again in the subsequent CubeSegmentScanners. Because the
dictionaries differ between segments, the result might be wrong.

I'm opening a JIRA for this and it will be fixed in 1.5.4
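
To make the problem concrete, here is a minimal Python sketch using the two-segment data from the report. It is only a model of the behavior under discussion, not the Kylin code: `like_to_in`, `translate_once` and `translate_per_segment` are hypothetical names, and each segment's dictionary is modeled as a plain list.

```python
import re


def like_to_in(pattern, dictionary):
    """Return the dictionary values matching a SQL LIKE pattern (% and _)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return [value for value in dictionary if re.fullmatch(regex, value)]


# Per-segment dictionaries, taken from the example in the report.
SEGMENTS = {
    "20160725": ["深海新创专营"],
    "20160726": ["深海新创手机"],
}


def translate_once(pattern, segments):
    """Reported behavior: translate against the first segment, then reuse."""
    first_dictionary = next(iter(segments.values()))
    in_values = like_to_in(pattern, first_dictionary)
    return {segment: in_values for segment in segments}


def translate_per_segment(pattern, segments):
    """Intended behavior: translate against each segment's own dictionary."""
    return {
        segment: like_to_in(pattern, dictionary)
        for segment, dictionary in segments.items()
    }


once = translate_once("%深海新创%", SEGMENTS)
per_segment = translate_per_segment("%深海新创%", SEGMENTS)
```

In the first variant, segment 20160726 is filtered with a value that only exists in the 20160725 dictionary, so its matching row is lost. Translating per segment keeps it.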

On Wed, Aug 10, 2016 at 10:53 AM, yubo-...@yolo24.com 
wrote:

> CubeStorageQuery.search/ CubeSegmentScanner
>
> When the filter is translated for the first segment, it is changed to a
> CompareTupleFilter (IN clause).
> The translation is not triggered again for the following segments.
> This is not right, because the dictionary is not the same for every segment.
>
> Assume data like this:
>
> merchant_name  cube segment
> 深海新创专营  20160725
> 深海新创手机  20160726
>
> When searching with like '%深海新创%',
> CubeSegmentScanner scans segment '20160725' and the filter is changed to an IN
> clause (IN '深海新创专营').
> The result is right for this segment, but not for the following segments, because
> the filter has already been changed.
>
>
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Question-abount-BuildInFunctionTransformer-tp5499p5533.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>



-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Kylin Cube Performance

2016-08-04 Thread hongbin ma

If you have a limit of 100, Kylin is SUPPOSED to be far more efficient.
However, there is currently an issue that might cause the limit clause to be
overlooked (https://issues.apache.org/jira/browse/KYLIN-1936).

I'm working on KYLIN-1936; it will be fixed in 1.5.4.
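
To illustrate why a limit should help, here is a generic Python sketch. It is not Kylin's storage code, and `scan` and `query` are made-up names. The assumption is that the rows coming out of the scan need no further aggregation, so the reader can stop as soon as it has collected enough of them.

```python
from itertools import islice


def scan(rows, counter):
    """Yield rows one by one and count how many were actually read."""
    for row in rows:
        counter["scanned"] += 1
        yield row


def query(rows, limit=None):
    """Return (result rows, number of rows scanned)."""
    counter = {"scanned": 0}
    iterator = scan(rows, counter)
    # With a limit, islice stops pulling from the scan after `limit` rows.
    result = list(iterator if limit is None else islice(iterator, limit))
    return result, counter["scanned"]


rows = range(1_000_000)
_, scanned_without_limit = query(rows)
_, scanned_with_limit = query(rows, limit=100)
```

If the limit is overlooked, the query behaves like the first call and reads every record even though only 100 are returned.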

On Thu, Aug 4, 2016 at 11:36 PM, Jason Hale <ja...@koddi.com> wrote:

> True, but even if there's a limit of 100, it still has to scan all records?
> Perhaps I'm just used to how Postgres handles that as it only scans the
> necessary records, not the entire set if it's limited. I can rethink the
> way I approach it if that's the case.
>
> On Thu, Aug 4, 2016 at 10:31 AM, hongbin ma <mahong...@apache.org> wrote:
>
> > Hi Jason
> >
> > As Shaofeng explained, it's not reasonable to expect sub-second latency if
> > you're returning tens of millions of records. Your data model is quite
> > simple and you don't have costly measures like distinct count, so Kylin
> > should be performant on normal OLAP queries.
> >
> > Another piece of advice: if the cardinality of the mandatory dimensions (CHILD_ID
> > and SITE_ID) is very high, you might isolate such dimensions into a
> > separate "aggregation group", so that 1. queries not touching these
> > dimensions can be performant, and 2. fewer cuboids are calculated. Please refer to
> > http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
> >
> > On Thu, Aug 4, 2016 at 11:11 PM, ShaoFeng Shi <shaofeng...@apache.org>
> > wrote:
> >
> > > The log is pretty clear: the cuboid is an exact match, but the scan count is
> > > massive:
> > >
> > > Visiting hbase table KYLIN_RIK9O18H07: cuboid exact match, from 992 to 992
> > > Total scan count: 12306477
> > >
> > > Please add a WHERE condition to narrow down the result set as much as
> > > possible. It doesn't make sense for an OLAP query to return millions of
> > > records.
> > >
> > > 2016-08-04 13:05 GMT+08:00 Jason Hale <ja...@koddi.com>:
> > >
> > > > Sure, see kylin.log below:
> > > >
> > > > 2016-08-04 00:47:35,839 INFO  [http-bio-7070-exec-7]
> > > > controller.QueryController:175 : The original query:  SELECT
> > SUM(clicks)
> > > > FROM hpa_reporting2 GROUP BY site_id, child_id, search_type,
> hotel_id,
> > > > report_date
> > > > 2016-08-04 00:47:35,839 INFO  [http-bio-7070-exec-7]
> > > > service.QueryService:266 : The corrected query: SELECT SUM(clicks)
> FROM
> > > > hpa_reporting2 GROUP BY site_id, child_id, search_type, hotel_id,
> > > > report_date
> > > > LIMIT 5
> > > > 2016-08-04 00:47:35,908 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:48
> > > > : The project manager's reference is
> > > > org.apache.kylin.metadata.project.ProjectManager@3a3735a5
> > > > 2016-08-04 00:47:35,909 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:60
> > > > : Find candidates by table DEFAULT.HPA_REPORTING2 and
> > project=KODDI_DEV :
> > > > org.apache.kylin.query.routing.Candidate@51ed1b3b
> > > > 2016-08-04 00:47:35,909 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:49
> > > > : Applying rule: class
> > > > org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> > > > realizations before: [hpa_reporting2_cube_clone(CUBE)], realizations
> > > > after:
> > > > [hpa_reporting2_cube_clone(CUBE)]
> > > > 2016-08-04 00:47:35,910 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:49
> > > > : Applying rule: class
> > > > org.apache.kylin.query.routing.rules.RealizationSortRule,
> realizations
> > > > before: [hpa_reporting2_cube_clone(CUBE)], realizations after:
> > > > [hpa_reporting2_cube_clone(CUBE)]
> > > > 2016-08-04 00:47:35,910 INFO  [http-bio-7070-exec-7]
> > > routing.QueryRouter:72
> > > > : The realizations remaining: [hpa_reporting2_cube_clone(CUBE)] And
> the
> > > > final chosen one is the first one
> > > > 2016-08-04 00:47:35,975 DEBUG [http-bio-7070-exec-7]
> > > > enumerator.OLAPEnumerator:107 : query storage...
> > > > 2016-08-04 00:47:35,976 INFO  [http-bio-7070-exec-7]
> > > > v2.CubeStorageQuery:239 : exactAggregation is true
> > > > 2016-08-04 00:47:35,976 INFO  [http-bio-7070-exec-7]
> > > > v2.CubeStorageQuery:357 : Enable limit 5
> > > > 2016-08-04 00:47:35,977 DEBUG [http-bio-7070-exec-7]
> > > > v2.C

Re: Kylin Cube Performance

2016-08-04 Thread hongbin ma

Hi Jason

As Shaofeng explained, it's not reasonable to expect sub-second latency if
you're returning tens of millions of records. Your data model is quite
simple and you don't have costly measures like distinct count, so Kylin
should be performant on normal OLAP queries.

Another piece of advice: if the cardinality of the mandatory dimensions (CHILD_ID
and SITE_ID) is very high, you might isolate such dimensions into a
separate "aggregation group", so that 1. queries not touching these
dimensions can be performant, and 2. fewer cuboids are calculated. Please refer to
http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
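
As a rough illustration of the second point, the Python sketch below counts dimension combinations for one aggregation group versus two. It is a simplification under stated assumptions, not Kylin's cuboid scheduler: it ignores mandatory, hierarchy and joint rules, takes the five dimension names from the query in your log, and assumes the base cuboid (all dimensions) is always built. `cuboids` and `count_cuboids` are hypothetical helpers.

```python
from itertools import combinations


def cuboids(group):
    """All non-empty dimension combinations of one aggregation group."""
    dims = sorted(group)
    return {
        frozenset(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    }


def count_cuboids(groups):
    """Count distinct cuboids across aggregation groups."""
    all_dims = frozenset(dim for group in groups for dim in group)
    result = {all_dims}  # base cuboid, assumed to be always built
    for group in groups:
        result |= cuboids(group)
    return len(result)


one_group = [["site_id", "child_id", "search_type", "hotel_id", "report_date"]]
two_groups = [
    ["site_id", "child_id"],
    ["search_type", "hotel_id", "report_date"],
]

single = count_cuboids(one_group)
split = count_cuboids(two_groups)
```

With these assumptions the single group gives 31 combinations and the split gives 11. The trade-off is that a query mixing dimensions from both groups no longer has an exact-match cuboid other than the base one.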

On Thu, Aug 4, 2016 at 11:11 PM, ShaoFeng Shi 
wrote:

> The log is pretty clear: the cuboid is an exact match, but the scan count is
> massive:
>
> Visiting hbase table KYLIN_RIK9O18H07: cuboid exact match, from 992 to 992
> Total scan count: 12306477
>
> Please add a WHERE condition to narrow down the result set as much as
> possible. It doesn't make sense for an OLAP query to return millions of
> records.
>
> 2016-08-04 13:05 GMT+08:00 Jason Hale :
>
> > Sure, see kylin.log below:
> >
> > 2016-08-04 00:47:35,839 INFO  [http-bio-7070-exec-7]
> > controller.QueryController:175 : The original query:  SELECT SUM(clicks)
> > FROM hpa_reporting2 GROUP BY site_id, child_id, search_type, hotel_id,
> > report_date
> > 2016-08-04 00:47:35,839 INFO  [http-bio-7070-exec-7]
> > service.QueryService:266 : The corrected query: SELECT SUM(clicks) FROM
> > hpa_reporting2 GROUP BY site_id, child_id, search_type, hotel_id,
> > report_date
> > LIMIT 5
> > 2016-08-04 00:47:35,908 INFO  [http-bio-7070-exec-7]
> routing.QueryRouter:48
> > : The project manager's reference is
> > org.apache.kylin.metadata.project.ProjectManager@3a3735a5
> > 2016-08-04 00:47:35,909 INFO  [http-bio-7070-exec-7]
> routing.QueryRouter:60
> > : Find candidates by table DEFAULT.HPA_REPORTING2 and project=KODDI_DEV :
> > org.apache.kylin.query.routing.Candidate@51ed1b3b
> > 2016-08-04 00:47:35,909 INFO  [http-bio-7070-exec-7]
> routing.QueryRouter:49
> > : Applying rule: class
> > org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
> > realizations before: [hpa_reporting2_cube_clone(CUBE)], realizations
> > after:
> > [hpa_reporting2_cube_clone(CUBE)]
> > 2016-08-04 00:47:35,910 INFO  [http-bio-7070-exec-7]
> routing.QueryRouter:49
> > : Applying rule: class
> > org.apache.kylin.query.routing.rules.RealizationSortRule, realizations
> > before: [hpa_reporting2_cube_clone(CUBE)], realizations after:
> > [hpa_reporting2_cube_clone(CUBE)]
> > 2016-08-04 00:47:35,910 INFO  [http-bio-7070-exec-7]
> routing.QueryRouter:72
> > : The realizations remaining: [hpa_reporting2_cube_clone(CUBE)] And the
> > final chosen one is the first one
> > 2016-08-04 00:47:35,975 DEBUG [http-bio-7070-exec-7]
> > enumerator.OLAPEnumerator:107 : query storage...
> > 2016-08-04 00:47:35,976 INFO  [http-bio-7070-exec-7]
> > v2.CubeStorageQuery:239 : exactAggregation is true
> > 2016-08-04 00:47:35,976 INFO  [http-bio-7070-exec-7]
> > v2.CubeStorageQuery:357 : Enable limit 5
> > 2016-08-04 00:47:35,977 DEBUG [http-bio-7070-exec-7]
> > v2.CubeHBaseEndpointRPC:257 : New scanner for current segment
> > hpa_reporting2_cube_clone[1970010100_2016082800] will use
> > SCAN_FILTER_AGGR_CHECKMEM as endpoint's behavior
> > 2016-08-04 00:47:35,979 DEBUG [http-bio-7070-exec-7]
> > v2.CubeHBaseEndpointRPC:313 : Serialized scanRequestBytes 836 bytes,
> > rawScanBytesString 56 bytes
> > 2016-08-04 00:47:35,979 INFO  [http-bio-7070-exec-7]
> > v2.CubeHBaseEndpointRPC:315 : The scan 31b2dd4c for segment
> > hpa_reporting2_cube_clone[1970010100_2016082800] is as below with
> > 1
> > separate raw scans, shard part of start/end key is set to 0
> > 2016-08-04 00:47:35,980 INFO  [http-bio-7070-exec-7] v2.CubeHBaseRPC:271
> :
> > Visiting hbase table KYLIN_RIK9O18H07: cuboid exact match, from 992 to
> 992
> > Start:
> > \x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\x00\x00\x00\x00\
> > x00\x00\x00\x00\x00
> > (\x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\x00\x00\x00\x00\
> > x00\x00\x00\x00\x00)
> > Stop:
> >  \x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\xFF\xFF\xFF\xFF\
> > xFF\xFF\xFF\xFF\xFF\x00
> > (\x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\xFF\xFF\xFF\xFF\
> > xFF\xFF\xFF\xFF\xFF\x00),
> > No Fuzzy Key
> > 2016-08-04 00:47:35,981 DEBUG [http-bio-7070-exec-7]
> > v2.CubeHBaseEndpointRPC:320 : Submitting rpc to 1 shards starting from
> > shard 2, scan range count 1
> > 2016-08-04 00:47:35,981 INFO  [http-bio-7070-exec-7]
> > v2.CubeHBaseEndpointRPC:103 : Timeout for ExpectedSizeIterator is: 99000
> > 2016-08-04 00:47:35,981 DEBUG [http-bio-7070-exec-7]
> > enumerator.OLAPEnumerator:127 : return TupleIterator...
> > 2016-08-04 00:47:52,773 INFO  [pool-6-thread-1]
> v2.CubeHBaseEndpointRPC:351
> > :  Endpoint RPC returned from
> HTable
> > KYLIN_RIK9O18H07 Shard
> > \x4B\x59\x4C\x49\x4E\x5F\x52\x49\x4B\x39\x4F\x31\x38\x48\
> >

Re: Build Cube killed by admin

2016-08-04 Thread hongbin ma

No zip files found.

Have you checked the Hadoop logs?

On Thu, Aug 4, 2016 at 4:12 PM, liuhua...@neusoft.com  wrote:

>
> onf /opt/apache-kylin-1.5.3-bin/conf/kylin_job_conf_inmem.xml -cubename 
> test0804 -segmentname FULL_BUILD -output 
> /kylin/kylin_metadata/kylin-7571010b-0a15-4858-99ed-3076d6902c7a/test0804/cuboid/
>  -jobname Kylin_Cube_Builder_test0804 -cubingJobId 
> 7571010b-0a15-4858-99ed-3076d6902c7a.
> the log:killed by admin.
> I have do my best to fin
>




-- 
Regards,

*Bin Mahone | 马洪宾*

Re: Question abount BuildInFunctionTransformer

2016-08-04 Thread hongbin ma

To me, the translation is the expected behavior.

On Thu, Aug 4, 2016 at 10:47 PM, Yiming Liu  wrote:

> Why do you think that's not expected? What translation did you expect?
> Kylin encodes the original values of a dimension column into an internal
> representation. Based on this encoding dictionary, Kylin rewrites
> some SQL for better performance. In your case, the LIKE statement was
> translated into IN, so Kylin can find the query result by key
> directly. It does not need to filter all the raw data.
>
> 2016-08-04 18:20 GMT+08:00 yubo-...@yolo24.com :
>
> >
> >
> >
> > Hi all,
> >
> > When we search with a SQL "like clause", we find in the log that a
> > BuildInFunctionTransformer transforms the "like clause" into an "in
> > clause".
> > But the values do not seem exactly right. Below are my use cases.
> >
> > Case 1: translates '%深海%' to [深海新创专营店80002972, 义深海官方旗舰店80011438]
> > Case 2: translates '%深海新创%' to [深海新创专营店80002972]
> >
> > This is not expected.
> >
> > Is this a bug, or can anyone help to explain? Thanks.
> >
> > 1.
> > sql:
> >
> > select merchant_name,*
> > from session_view_shop_0
> > where merchant_name like '%深海%'
> > and dt_year='2016'
> > and dt_month='07'
> > and dt_day >='25'
> > and dt_day <='28'
> >
> > 2016-08-04 17:38:46,095 INFO  [http-bio-7070-exec-31]
> > dict.BuildInFunctionTransformer:66 : Translated
> > {LIKE(KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME,%深海%)} to IN clause:
> > {KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME IN [深海新创专营店80002972,
> > 义深海官方旗舰店80011438]}
> >
> > 2.
> > Sql:
> >
> > select merchant_name,*
> > from session_view_shop_0
> > where merchant_name like '%深海新创%'
> > and dt_year='2016'
> > and dt_month='07'
> > and dt_day >='25'
> > and dt_day <='28'
> >
> > 2016-08-04 17:41:52,321 INFO  [http-bio-7070-exec-31]
> > dict.BuildInFunctionTransformer:66 : Translated
> > {LIKE(KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME,%深海新创%)} to IN
> > clause: {KYLIN_REPORT_DB.SESSION_VIEW_SHOP_0.MERCHANT_NAME IN
> > [深海新创专营店80002972]}
> >
> >
> > Kylin version:apache-kylin-1.5.2.1-HBase1.x-bin.tar.gz
> >
> >
> >
> > --
> > View this message in context: http://apache-kylin.74782.x6.
> > nabble.com/Question-abount-BuildInFunctionTransformer-tp5499.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>



-- 
Regards,

*Bin Mahone | 马洪宾*
