[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2019-01-06 Thread Shaofeng SHI (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735394#comment-16735394
 ] 

Shaofeng SHI edited comment on KYLIN-3722 at 1/7/19 2:35 AM:
-

# Add a switch "{{kylin.storage.limit-push-down-enabled=true"}} which allows 
turn off limit push down (default is enabled);
 # Turn off limit push down automatically when multi OLAP context detected in 
one query (such as sub query);
 # Add a test sql in integration test;


was (Author: shaofengshi):
# add a switch {{kylin.storage.limit-push-down-enable=true}} which could turn 
off limit push down
 # turn off limit push down when multi olap context related
 # add a new sql in IT

> Error Limit Push Down in Join Related Query
> ---
>
> Key: KYLIN-3722
> URL: https://issues.apache.org/jira/browse/KYLIN-3722
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: all
>Reporter: XiaoXiang Yu
>Assignee: XiaoXiang Yu
>Priority: Major
>  Labels: LimitPushDown
> Fix For: v2.6.0
>
> Attachments: 
> 0001-storageLimitLevel-set-to-NO_LIMIT-when-the-query-has.patch, 
> image-2018-12-16-17-06-16-341.png, image-2018-12-16-17-24-21-017.png, 
> image-2018-12-16-17-38-13-454.png, image-2018-12-16-22-40-36-857.png, 
> image-2018-12-16-22-44-32-875.png, image-2018-12-16-22-45-29-841.png, 
> image-2018-12-16-22-46-43-168.png, image-2018-12-16-22-48-25-622.png, 
> image-2018-12-16-22-54-25-718.png, image-2018-12-16-22-55-23-827.png, 
> image-2018-12-16-22-56-27-258.png, image-2018-12-16-22-56-50-607.png, 
> image-2018-12-16-22-58-04-486.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>      select distinct item_id
>      from iteminfo
>      where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
>  > select uid, sum(active_minutes) as am
>  > from useraction
>  > where item_id in (
>  > select distinct item_id
>  > from iteminfo
>  > where item_type in ('Video')
>  > ) and act_type != 'share'
>  > group by uid
>  > limit 10;
>  Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
>  Total jobs = 1
>  Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1539833412107_0414)
> 
>  VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> 
>  Map 1 .. SUCCEEDED 1 1 0 0 0 0
>  Map 3 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 4 .. SUCCEEDED 1 1 0 0 0 0
>  
> 
>  VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s
>  
> 
>  OK
>  1 14565.4708
>  2 64744.893
>  3 64939.0199984
>  5 36563.767
>  6 36641.649
>  Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set 
> limit to 5(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
>  
>  
> We can find following things in log:
> {quote} 
> KYLIN [ DEBUG ] 12-16 17:04:28.299 
> org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
> clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 
> org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>  
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 
> org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) 
> from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > 
>  ==[QUERY]===
>  Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
>  SQL: select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>  select distinct item_id
>  from iteminfo
>  where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  User: ADMIN
>  Success: true
>  Duration: 0.202
>  Project: PearVideo
>  Realization Names: [CUBE[name=PearVideoCube1], 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722431#comment-16722431
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/17/18 4:43 AM:
---

h2. Test Evidence

In OLAPLimitRel, we add condition to diaable limit-pushdown in JoinRelated 
query.

!image-2018-12-16-17-24-21-017.png!

After this simple modification, we got a correct result.

 

!image-2018-12-16-17-38-13-454.png!

 

 
{quote}KYLIN [ DEBUG ] 12-16 17:17:38.949 
org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
 from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48
 > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
 > clustered at head, groupsD: \{0} with cuboid columns: \{1}

KYLIN [ WARN ] 12-16 17:17:38.949 
org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:161)
 from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48
 > Not enabling limit push down because current limit is invalid: 2147483647
{quote}
 


was (Author: hit_lacus):
In OLAPLimitRel, we add condition to diaable limit-pushdown in JoinRelated 
query.

!image-2018-12-16-17-24-21-017.png!

After this simple modification, we got a correct result.

 

!image-2018-12-16-17-38-13-454.png!

 

 
{quote}KYLIN [ DEBUG ] 12-16 17:17:38.949 
org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
 from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48
 > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
 > clustered at head, groupsD: \{0} with cuboid columns: \{1}

KYLIN [ WARN ] 12-16 17:17:38.949 
org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:161)
 from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48
 > Not enabling limit push down because current limit is invalid: 2147483647
{quote}
 

> Error Limit Push Down in Join Related Query
> ---
>
> Key: KYLIN-3722
> URL: https://issues.apache.org/jira/browse/KYLIN-3722
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: all
>Reporter: XiaoXiang Yu
>Assignee: XiaoXiang Yu
>Priority: Major
>  Labels: LimitPushDown
> Fix For: v2.6.0
>
> Attachments: image-2018-12-16-17-06-16-341.png, 
> image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, 
> image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, 
> image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, 
> image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png, 
> image-2018-12-16-22-55-23-827.png, image-2018-12-16-22-56-27-258.png, 
> image-2018-12-16-22-56-50-607.png, image-2018-12-16-22-58-04-486.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>      select distinct item_id
>      from iteminfo
>      where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
>  > select uid, sum(active_minutes) as am
>  > from useraction
>  > where item_id in (
>  > select distinct item_id
>  > from iteminfo
>  > where item_type in ('Video')
>  > ) and act_type != 'share'
>  > group by uid
>  > limit 10;
>  Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
>  Total jobs = 1
>  Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1539833412107_0414)
> 
>  VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> 
>  Map 1 .. SUCCEEDED 1 1 0 0 0 0
>  Map 3 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 4 .. SUCCEEDED 1 1 0 0 0 0
>  
> 
>  VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s
>  
> 
>  OK
>  1 14565.4708
>  2 64744.893
>  3 64939.0199984
>  5 36563.767
>  6 36641.649
>  Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set 
> limit to 5(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
>  
>  
> We can find following things in log:
> {quote} 
> KYLIN [ DEBUG ] 12-16 17:04:28.299 
> 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:17 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns 

Not only +*uid*+ is in groupByColumns, but also *+item_id+* in groupByColumns, 
just because it's in subQueryJoin.

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

*+uid and item_id area used to compared ITuple+*

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

This method use a counter which limit the result size according to 
comparator(above), this comparator *should only use* uid in compare method 
because uid is the only group by column. But comparator *really use* two 
columns(item_id and uid).

!image-2018-12-16-22-58-04-486.png!

 
{quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 
org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > Using SortMergedPartitionResultIterator to merge 1 partition results out of 
 > 1 partitions

KYLIN [ INFO ] 12-16 23:00:15.089 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> null

KYLIN [ INFO ] 12-16 23:00:15.090 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.091 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:06 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

!image-2018-12-16-22-58-04-486.png!

 
{quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 
org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > Using SortMergedPartitionResultIterator to merge 1 partition results out of 
 > 1 partitions

KYLIN [ INFO ] 12-16 23:00:15.089 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> null

KYLIN [ INFO ] 12-16 23:00:15.090 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.091 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=3,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=467,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6332.9699,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:10 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns 

Not only +*uid*+ is in groupByColumns, but also *+item_id+* in groupByColumns, 
just because it's in subQueryJoin.

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

*+uid and item_id area used to compared ITuple+*

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

!image-2018-12-16-22-58-04-486.png!

 
{quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 
org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > Using SortMergedPartitionResultIterator to merge 1 partition results out of 
 > 1 partitions

KYLIN [ INFO ] 12-16 23:00:15.089 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> null

KYLIN [ INFO ] 12-16 23:00:15.090 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.091 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:04 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

!image-2018-12-16-22-58-04-486.png!

 
{quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 
org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > Using SortMergedPartitionResultIterator to merge 1 partition results out of 
 > 1 partitions

KYLIN [ INFO ] 12-16 23:00:15.089 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> null

KYLIN [ INFO ] 12-16 23:00:15.090 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.091 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.092 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null,

KYLIN [ INFO ] 12-16 23:00:15.093 
org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112)
 from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51
 > UID=3,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=467,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6332.9699,_KY_SUM_USERACTION_PLAY_TIMES_=null,
 >  ==> 
 > 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 2:58 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

!image-2018-12-16-22-58-04-486.png!

That is to say, only 10 distinct row are 

 

 


was (Author: hit_lacus):
h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

 

 

 

!image-2018-12-16-22-48-25-622.png!

!image-2018-12-16-22-54-25-718.png!

 

That is to say, only 10 distinct row are 

 

 

> Error Limit Push Down in Join Related Query
> ---
>
> Key: KYLIN-3722
> URL: https://issues.apache.org/jira/browse/KYLIN-3722
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: all
>Reporter: XiaoXiang Yu
>Assignee: XiaoXiang Yu
>Priority: Major
>  Labels: LimitPushDown
> Fix For: v2.6.0
>
> Attachments: image-2018-12-16-17-06-16-341.png, 
> image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, 
> image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, 
> image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, 
> image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png, 
> image-2018-12-16-22-55-23-827.png, image-2018-12-16-22-56-27-258.png, 
> image-2018-12-16-22-56-50-607.png, image-2018-12-16-22-58-04-486.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>      select distinct item_id
>      from iteminfo
>      where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
>  > select uid, sum(active_minutes) as am
>  > from useraction
>  > where item_id in (
>  > select distinct item_id
>  > from iteminfo
>  > where item_type in ('Video')
>  > ) and act_type != 'share'
>  > group by uid
>  > limit 10;
>  Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
>  Total jobs = 1
>  Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1539833412107_0414)
> 
>  VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> 
>  Map 1 .. SUCCEEDED 1 1 0 0 0 0
>  Map 3 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 4 .. SUCCEEDED 1 1 0 0 0 0
>  
> 
>  VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s
>  
> 
>  OK
>  1 14565.4708
>  2 64744.893
>  3 64939.0199984
>  5 36563.767
>  6 36641.649
>  Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set 
> limit to 5(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
> This error is reported by Meituan's Dev.
>  
> We can find following things in log:
> {quote} 
> KYLIN [ DEBUG ] 12-16 17:04:28.299 
> org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
> clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 
> org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>  
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 
> org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) 
> from Query 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 2:54 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

 

 

 

!image-2018-12-16-22-48-25-622.png!

!image-2018-12-16-22-54-25-718.png!

 

That is to say, only 10 distinct row are 

 

 


was (Author: hit_lacus):
h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

 

 

 

!image-2018-12-16-22-48-25-622.png!

That is to say, only 10 distinct 

 

> Error Limit Push Down in Join Related Query
> ---
>
> Key: KYLIN-3722
> URL: https://issues.apache.org/jira/browse/KYLIN-3722
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: all
>Reporter: XiaoXiang Yu
>Assignee: XiaoXiang Yu
>Priority: Major
>  Labels: LimitPushDown
> Fix For: v2.6.0
>
> Attachments: image-2018-12-16-17-06-16-341.png, 
> image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, 
> image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, 
> image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, 
> image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>      select distinct item_id
>      from iteminfo
>      where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
>  > select uid, sum(active_minutes) as am
>  > from useraction
>  > where item_id in (
>  > select distinct item_id
>  > from iteminfo
>  > where item_type in ('Video')
>  > ) and act_type != 'share'
>  > group by uid
>  > limit 10;
>  Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
>  Total jobs = 1
>  Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1539833412107_0414)
> 
>  VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> 
>  Map 1 .. SUCCEEDED 1 1 0 0 0 0
>  Map 3 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 4 .. SUCCEEDED 1 1 0 0 0 0
>  
> 
>  VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s
>  
> 
>  OK
>  1 14565.4708
>  2 64744.893
>  3 64939.0199984
>  5 36563.767
>  6 36641.649
>  Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set 
> limit to 5(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
> This error is reported by Meituan's Dev.
>  
> We can find following things in log:
> {quote} 
> KYLIN [ DEBUG ] 12-16 17:04:28.299 
> org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
> clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 
> org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>  
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 
> org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) 
> from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > 
> ==[QUERY]===
> Query Id: 

[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query

2018-12-16 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492
 ] 

XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 2:49 PM:
---

h2. Analysis of cause
 *  In SQLDigest,  the column in join subquery will be add to 
SQLDigest.groupbyColumns

!image-2018-12-16-22-40-36-857.png!
 * In SequentialCubeTupleIterator.getTupleDimensionComparator

!image-2018-12-16-22-44-32-875.png!

Debugger show: 

!image-2018-12-16-22-45-29-841.png!  

  !image-2018-12-16-22-46-43-168.png!

 
 * In MergedIteratorWithLimit.hasNext

 

 

 

!image-2018-12-16-22-48-25-622.png!

That is to say, only 10 distinct 

 


was (Author: hit_lacus):
h2. Analysis of cause

 

todo

> Error Limit Push Down in Join Related Query
> ---
>
> Key: KYLIN-3722
> URL: https://issues.apache.org/jira/browse/KYLIN-3722
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: all
>Reporter: XiaoXiang Yu
>Assignee: XiaoXiang Yu
>Priority: Major
>  Labels: LimitPushDown
> Fix For: v2.6.0
>
> Attachments: image-2018-12-16-17-06-16-341.png, 
> image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, 
> image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, 
> image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, 
> image-2018-12-16-22-48-25-622.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>      select distinct item_id
>      from iteminfo
>      where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
>  > select uid, sum(active_minutes) as am
>  > from useraction
>  > where item_id in (
>  > select distinct item_id
>  > from iteminfo
>  > where item_type in ('Video')
>  > ) and act_type != 'share'
>  > group by uid
>  > limit 10;
>  Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
>  Total jobs = 1
>  Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1539833412107_0414)
> 
>  VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> 
>  Map 1 .. SUCCEEDED 1 1 0 0 0 0
>  Map 3 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 .. SUCCEEDED 1 1 0 0 0 0
>  Reducer 4 .. SUCCEEDED 1 1 0 0 0 0
>  
> 
>  VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s
>  
> 
>  OK
>  1 14565.4708
>  2 64744.893
>  3 64939.0199984
>  5 36563.767
>  6 36641.649
>  Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set 
> limit to 5(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
> This error is reported by Meituan's Dev.
>  
> We can find following things in log:
> {quote} 
> KYLIN [ DEBUG ] 12-16 17:04:28.299 
> org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
> clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 
> org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>  
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 
> org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) 
> from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > 
> ==[QUERY]===
> Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
> SQL: select uid, sum(active_minutes) as am
> from useraction
> where item_id in (
>  select distinct item_id
>  from iteminfo
>  where item_type in ('Video')
> ) and act_type != 'share'
> group by uid
> User: ADMIN
> Success: true
> Duration: 0.202
> Project: PearVideo
> Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]]
> Cuboid Ids: [14]
> Total scan count: 120
> Total scan bytes: 6442
> Result row count: 3
> Accept Partial: true
> Is Partial Result: false
> Hit Exception