[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735394#comment-16735394 ] Shaofeng SHI edited comment on KYLIN-3722 at 1/7/19 2:35 AM: - # Add a switch "{{kylin.storage.limit-push-down-enabled=true"}} which allows turn off limit push down (default is enabled); # Turn off limit push down automatically when multi OLAP context detected in one query (such as sub query); # Add a test sql in integration test; was (Author: shaofengshi): # add a switch {{kylin.storage.limit-push-down-enable=true}} which could turn off limit push down # turn off limit push down when multi olap context related # add a new sql in IT > Error Limit Push Down in Join Related Query > --- > > Key: KYLIN-3722 > URL: https://issues.apache.org/jira/browse/KYLIN-3722 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: all >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Labels: LimitPushDown > Fix For: v2.6.0 > > Attachments: > 0001-storageLimitLevel-set-to-NO_LIMIT-when-the-query-has.patch, > image-2018-12-16-17-06-16-341.png, image-2018-12-16-17-24-21-017.png, > image-2018-12-16-17-38-13-454.png, image-2018-12-16-22-40-36-857.png, > image-2018-12-16-22-44-32-875.png, image-2018-12-16-22-45-29-841.png, > image-2018-12-16-22-46-43-168.png, image-2018-12-16-22-48-25-622.png, > image-2018-12-16-22-54-25-718.png, image-2018-12-16-22-55-23-827.png, > image-2018-12-16-22-56-27-258.png, image-2018-12-16-22-56-50-607.png, > image-2018-12-16-22-58-04-486.png > > Original Estimate: 24h > Remaining Estimate: 24h > > +*Kylin limit-pushdown is sometimes cause data reduction.*+ > For example: > {quote}select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > limit 10 > {quote} > +*In hive, we got correct result(Five row).*+ > {quote}hive> > > select uid, sum(active_minutes) as am > > from useraction > > where item_id in ( > > select distinct item_id > > from iteminfo > > where item_type in ('Video') > > ) and act_type != 'share' > > group by uid > > limit 10; > Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1539833412107_0414) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > > Map 1 .. SUCCEEDED 1 1 0 0 0 0 > Map 3 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 2 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 4 .. SUCCEEDED 1 1 0 0 0 0 > > > VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s > > > OK > 1 14565.4708 > 2 64744.893 > 3 64939.0199984 > 5 36563.767 > 6 36641.649 > Time taken: 11.02 seconds, Fetched: 5 row(s) > {quote} > +*In Kylin, same query got error result(only THREE row). But when you set > limit to 5(original value). It is OK.*+ > !image-2018-12-16-17-06-16-341.png! > > > We can find following things in log: > {quote} > KYLIN [ DEBUG ] 12-16 17:04:28.299 > org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not > clustered at head, groupsD: \{0} with cuboid columns: \{1} > KYLIN [ INFO ] 12-16 17:04:28.299 > org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE > {quote} > > {quote}KYLIN [ INFO ] 12-16 17:04:28.405 > org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > > ==[QUERY]=== > Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708 > SQL: select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > User: ADMIN > Success: true > Duration: 0.202 > Project: PearVideo > Realization Names: [CUBE[name=PearVideoCube1],
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722431#comment-16722431 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/17/18 4:43 AM: --- h2. Test Evidence In OLAPLimitRel, we add condition to diaable limit-pushdown in JoinRelated query. !image-2018-12-16-17-24-21-017.png! After this simple modification, we got a correct result. !image-2018-12-16-17-38-13-454.png! {quote}KYLIN [ DEBUG ] 12-16 17:17:38.949 org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48 > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not > clustered at head, groupsD: \{0} with cuboid columns: \{1} KYLIN [ WARN ] 12-16 17:17:38.949 org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:161) from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48 > Not enabling limit push down because current limit is invalid: 2147483647 {quote} was (Author: hit_lacus): In OLAPLimitRel, we add condition to diaable limit-pushdown in JoinRelated query. !image-2018-12-16-17-24-21-017.png! After this simple modification, we got a correct result. !image-2018-12-16-17-38-13-454.png! {quote}KYLIN [ DEBUG ] 12-16 17:17:38.949 org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48 > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not > clustered at head, groupsD: \{0} with cuboid columns: \{1} KYLIN [ WARN ] 12-16 17:17:38.949 org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:161) from Query f2ed15c4-5d99-cfc6-5344-5b954dcdfa72-48 > Not enabling limit push down because current limit is invalid: 2147483647 {quote} > Error Limit Push Down in Join Related Query > --- > > Key: KYLIN-3722 > URL: https://issues.apache.org/jira/browse/KYLIN-3722 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: all >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Labels: LimitPushDown > Fix For: v2.6.0 > > Attachments: image-2018-12-16-17-06-16-341.png, > image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, > image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, > image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, > image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png, > image-2018-12-16-22-55-23-827.png, image-2018-12-16-22-56-27-258.png, > image-2018-12-16-22-56-50-607.png, image-2018-12-16-22-58-04-486.png > > Original Estimate: 24h > Remaining Estimate: 24h > > +*Kylin limit-pushdown is sometimes cause data reduction.*+ > For example: > {quote}select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > limit 10 > {quote} > +*In hive, we got correct result(Five row).*+ > {quote}hive> > > select uid, sum(active_minutes) as am > > from useraction > > where item_id in ( > > select distinct item_id > > from iteminfo > > where item_type in ('Video') > > ) and act_type != 'share' > > group by uid > > limit 10; > Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1539833412107_0414) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > > Map 1 .. SUCCEEDED 1 1 0 0 0 0 > Map 3 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 2 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 4 .. SUCCEEDED 1 1 0 0 0 0 > > > VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s > > > OK > 1 14565.4708 > 2 64744.893 > 3 64939.0199984 > 5 36563.767 > 6 36641.649 > Time taken: 11.02 seconds, Fetched: 5 row(s) > {quote} > +*In Kylin, same query got error result(only THREE row). But when you set > limit to 5(original value). It is OK.*+ > !image-2018-12-16-17-06-16-341.png! > > > We can find following things in log: > {quote} > KYLIN [ DEBUG ] 12-16 17:04:28.299 >
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:17 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns Not only +*uid*+ is in groupByColumns, but also *+item_id+* in groupByColumns, just because it's in subQueryJoin. !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator *+uid and item_id area used to compared ITuple+* !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext This method use a counter which limit the result size according to comparator(above), this comparator *should only use* uid in compare method because uid is the only group by column. But comparator *really use* two columns(item_id and uid). !image-2018-12-16-22-58-04-486.png! {quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > Using SortMergedPartitionResultIterator to merge 1 partition results out of > 1 partitions KYLIN [ INFO ] 12-16 23:00:15.089 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> null KYLIN [ INFO ] 12-16 23:00:15.090 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.091 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:06 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-58-04-486.png! {quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > Using SortMergedPartitionResultIterator to merge 1 partition results out of > 1 partitions KYLIN [ INFO ] 12-16 23:00:15.089 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> null KYLIN [ INFO ] 12-16 23:00:15.090 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.091 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=3,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=467,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6332.9699,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> >
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:10 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns Not only +*uid*+ is in groupByColumns, but also *+item_id+* in groupByColumns, just because it's in subQueryJoin. !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator *+uid and item_id area used to compared ITuple+* !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-58-04-486.png! {quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > Using SortMergedPartitionResultIterator to merge 1 partition results out of > 1 partitions KYLIN [ INFO ] 12-16 23:00:15.089 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> null KYLIN [ INFO ] 12-16 23:00:15.090 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.091 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 >
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 3:04 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-58-04-486.png! {quote}KYLIN [ DEBUG ] 12-16 23:00:15.088 org.apache.kylin.storage.gtrecord.SortMergedPartitionResultIterator.hasNext(SortMergedPartitionResultIterator.java:76) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > Using SortMergedPartitionResultIterator to merge 1 partition results out of > 1 partitions KYLIN [ INFO ] 12-16 23:00:15.089 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> null KYLIN [ INFO ] 12-16 23:00:15.090 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=934,_KY_SUM_USERACTION_ACTIVE_MINUTES_=12852.3101,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.091 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=517,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6910.5401,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=1,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=537,_KY_SUM_USERACTION_ACTIVE_MINUTES_=7654.9299,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.092 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=533,_KY_SUM_USERACTION_ACTIVE_MINUTES_=8576.9499,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=2,ACT_TYPE=click,ITEM_ID=202002,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=1786,_KY_SUM_USERACTION_ACTIVE_MINUTES_=25595.3098,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> > UID=2,ACT_TYPE=click,ITEM_ID=22,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=2712,_KY_SUM_USERACTION_ACTIVE_MINUTES_=39149.58,_KY_SUM_USERACTION_PLAY_TIMES_=null, KYLIN [ INFO ] 12-16 23:00:15.093 org.apache.kylin.storage.gtrecord.SortedIteratorMergerWithLimit$MergedIteratorWithLimit.hasNext(SortedIteratorMergerWithLimit.java:112) from Query 9a6c7539-2745-ff2e-946b-1dd178dc5880-51 > UID=3,ACT_TYPE=click,ITEM_ID=10,ACT_DATE=null,ACTIVE_MINUTES=null,PLAY_TIMES=null,_KY_COUNT__=467,_KY_SUM_USERACTION_ACTIVE_MINUTES_=6332.9699,_KY_SUM_USERACTION_PLAY_TIMES_=null, > ==> >
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 2:58 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-58-04-486.png! That is to say, only 10 distinct row are was (Author: hit_lacus): h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-48-25-622.png! !image-2018-12-16-22-54-25-718.png! That is to say, only 10 distinct row are > Error Limit Push Down in Join Related Query > --- > > Key: KYLIN-3722 > URL: https://issues.apache.org/jira/browse/KYLIN-3722 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: all >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Labels: LimitPushDown > Fix For: v2.6.0 > > Attachments: image-2018-12-16-17-06-16-341.png, > image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, > image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, > image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, > image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png, > image-2018-12-16-22-55-23-827.png, image-2018-12-16-22-56-27-258.png, > image-2018-12-16-22-56-50-607.png, image-2018-12-16-22-58-04-486.png > > Original Estimate: 24h > Remaining Estimate: 24h > > +*Kylin limit-pushdown is sometimes cause data reduction.*+ > For example: > {quote}select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > limit 10 > {quote} > +*In hive, we got correct result(Five row).*+ > {quote}hive> > > select uid, sum(active_minutes) as am > > from useraction > > where item_id in ( > > select distinct item_id > > from iteminfo > > where item_type in ('Video') > > ) and act_type != 'share' > > group by uid > > limit 10; > Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1539833412107_0414) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > > Map 1 .. SUCCEEDED 1 1 0 0 0 0 > Map 3 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 2 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 4 .. SUCCEEDED 1 1 0 0 0 0 > > > VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s > > > OK > 1 14565.4708 > 2 64744.893 > 3 64939.0199984 > 5 36563.767 > 6 36641.649 > Time taken: 11.02 seconds, Fetched: 5 row(s) > {quote} > +*In Kylin, same query got error result(only THREE row). But when you set > limit to 5(original value). It is OK.*+ > !image-2018-12-16-17-06-16-341.png! > This error is reported by Meituan's Dev. > > We can find following things in log: > {quote} > KYLIN [ DEBUG ] 12-16 17:04:28.299 > org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not > clustered at head, groupsD: \{0} with cuboid columns: \{1} > KYLIN [ INFO ] 12-16 17:04:28.299 > org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE > {quote} > > {quote}KYLIN [ INFO ] 12-16 17:04:28.405 > org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) > from Query
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 2:54 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-48-25-622.png! !image-2018-12-16-22-54-25-718.png! That is to say, only 10 distinct row are was (Author: hit_lacus): h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-48-25-622.png! That is to say, only 10 distinct > Error Limit Push Down in Join Related Query > --- > > Key: KYLIN-3722 > URL: https://issues.apache.org/jira/browse/KYLIN-3722 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: all >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Labels: LimitPushDown > Fix For: v2.6.0 > > Attachments: image-2018-12-16-17-06-16-341.png, > image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, > image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, > image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, > image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png > > Original Estimate: 24h > Remaining Estimate: 24h > > +*Kylin limit-pushdown is sometimes cause data reduction.*+ > For example: > {quote}select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > limit 10 > {quote} > +*In hive, we got correct result(Five row).*+ > {quote}hive> > > select uid, sum(active_minutes) as am > > from useraction > > where item_id in ( > > select distinct item_id > > from iteminfo > > where item_type in ('Video') > > ) and act_type != 'share' > > group by uid > > limit 10; > Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1539833412107_0414) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > > Map 1 .. SUCCEEDED 1 1 0 0 0 0 > Map 3 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 2 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 4 .. SUCCEEDED 1 1 0 0 0 0 > > > VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s > > > OK > 1 14565.4708 > 2 64744.893 > 3 64939.0199984 > 5 36563.767 > 6 36641.649 > Time taken: 11.02 seconds, Fetched: 5 row(s) > {quote} > +*In Kylin, same query got error result(only THREE row). But when you set > limit to 5(original value). It is OK.*+ > !image-2018-12-16-17-06-16-341.png! > This error is reported by Meituan's Dev. > > We can find following things in log: > {quote} > KYLIN [ DEBUG ] 12-16 17:04:28.299 > org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not > clustered at head, groupsD: \{0} with cuboid columns: \{1} > KYLIN [ INFO ] 12-16 17:04:28.299 > org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE > {quote} > > {quote}KYLIN [ INFO ] 12-16 17:04:28.405 > org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > > ==[QUERY]=== > Query Id:
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722492#comment-16722492 ] XiaoXiang Yu edited comment on KYLIN-3722 at 12/16/18 2:49 PM: --- h2. Analysis of cause * In SQLDigest, the column in join subquery will be add to SQLDigest.groupbyColumns !image-2018-12-16-22-40-36-857.png! * In SequentialCubeTupleIterator.getTupleDimensionComparator !image-2018-12-16-22-44-32-875.png! Debugger show: !image-2018-12-16-22-45-29-841.png! !image-2018-12-16-22-46-43-168.png! * In MergedIteratorWithLimit.hasNext !image-2018-12-16-22-48-25-622.png! That is to say, only 10 distinct was (Author: hit_lacus): h2. Analysis of cause todo > Error Limit Push Down in Join Related Query > --- > > Key: KYLIN-3722 > URL: https://issues.apache.org/jira/browse/KYLIN-3722 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: all >Reporter: XiaoXiang Yu >Assignee: XiaoXiang Yu >Priority: Major > Labels: LimitPushDown > Fix For: v2.6.0 > > Attachments: image-2018-12-16-17-06-16-341.png, > image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, > image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, > image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, > image-2018-12-16-22-48-25-622.png > > Original Estimate: 24h > Remaining Estimate: 24h > > +*Kylin limit-pushdown is sometimes cause data reduction.*+ > For example: > {quote}select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > limit 10 > {quote} > +*In hive, we got correct result(Five row).*+ > {quote}hive> > > select uid, sum(active_minutes) as am > > from useraction > > where item_id in ( > > select distinct item_id > > from iteminfo > > where item_type in ('Video') > > ) and act_type != 'share' > > group by uid > > limit 10; > Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1539833412107_0414) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > > Map 1 .. SUCCEEDED 1 1 0 0 0 0 > Map 3 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 2 .. SUCCEEDED 1 1 0 0 0 0 > Reducer 4 .. SUCCEEDED 1 1 0 0 0 0 > > > VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 7.67 s > > > OK > 1 14565.4708 > 2 64744.893 > 3 64939.0199984 > 5 36563.767 > 6 36641.649 > Time taken: 11.02 seconds, Fetched: 5 row(s) > {quote} > +*In Kylin, same query got error result(only THREE row). But when you set > limit to 5(original value). It is OK.*+ > !image-2018-12-16-17-06-16-341.png! > This error is reported by Meituan's Dev. > > We can find following things in log: > {quote} > KYLIN [ DEBUG ] 12-16 17:04:28.299 > org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not > clustered at head, groupsD: \{0} with cuboid columns: \{1} > KYLIN [ INFO ] 12-16 17:04:28.299 > org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE > {quote} > > {quote}KYLIN [ INFO ] 12-16 17:04:28.405 > org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) > from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81 > > > ==[QUERY]=== > Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708 > SQL: select uid, sum(active_minutes) as am > from useraction > where item_id in ( > select distinct item_id > from iteminfo > where item_type in ('Video') > ) and act_type != 'share' > group by uid > User: ADMIN > Success: true > Duration: 0.202 > Project: PearVideo > Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]] > Cuboid Ids: [14] > Total scan count: 120 > Total scan bytes: 6442 > Result row count: 3 > Accept Partial: true > Is Partial Result: false > Hit Exception