"two of then cardinality are bigger than 100milion" ==> You must do optimization for your cube when having UHC dimensions; otherwise its expansion rate may be very high.
2016-08-06 12:20 GMT+08:00 张天生 <[email protected]>:

I also need to adjust the threshold to a bigger value. We set the JVM -Xmx to 32G, but it was far from fully used.

赵天烁 <[email protected]> wrote on Thu, Aug 4, 2016 at 11:54 AM:

I dug into the code and found that this threshold is determined in CubeStorageQuery's setThreshold method. There is already a property called kylin.query.mem.budget, which defines how much memory the current JVM may use for a query (default 3G); dividing that budget by the estimated size of a single row yields the threshold. So I think setting kylin.query.mem.budget to a higher value will do.

By the way, my cube is growing like hell... The biggest one is almost 1.7 TB (keeping only one month of data). The source is at the 200-million-record level, with 9 dimensions, two of them with cardinality bigger than 100 million. We already have performance issues; maybe you Kylin experts could provide some optimization suggestions? Waiting...

赵天烁 / Kevin Zhao <[email protected]>
MEIZU Technology Co., Ltd., Zhuhai, 519085, Guangdong, China (meizu.com)
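(A minimal sketch of the computation Kevin describes, assuming only what the thread states: the threshold is the memory budget from kylin.query.mem.budget divided by an estimated per-row size. Apart from those two names, everything below is illustrative, not the actual Kylin source.)

    public class ScanThresholdSketch {
        // threshold = memory budget / estimated size of one cuboid row
        static long scanRowThreshold(long memBudgetBytes, long estimatedRowBytes) {
            return memBudgetBytes / estimatedRowBytes;
        }

        public static void main(String[] args) {
            long budget = 3L << 30;   // kylin.query.mem.budget default: 3 GB
            // With a small row estimate the threshold is in the millions:
            System.out.println(scanRowThreshold(budget, 256));        // 12582912
            // A distinct-count measure inflates the estimated row size (each row
            // carries a counter structure), so the threshold collapses. 64 KB per
            // row is an assumed figure that lands near the 49121 seen below:
            System.out.println(scanRowThreshold(budget, 64 * 1024));  // 49152
        }
    }

This also explains hongbin's point in the next mail: the bigger each row is estimated to be, the fewer rows the same memory budget allows.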
From: hongbin ma <[email protected]>
Date: 2016-08-03 12:50
To: user.kylin <[email protected]>
Subject: Re: Re: Scan row count exceeded threshold

Hi Tiansheng and Tianshuo,

Kylin is basically limiting the memory footprint of each storage visit, so the number of rows allowed to be read is inversely proportional to the size of each row. The cuboid row size grows when there is a distinct-count measure, which is why you are observing a threshold of 49121; this is normal.

The abnormal part is Kylin's behavior when there is a limit clause, especially for cases like Tianshuo's, where the query is:

    select FCRASHTIME, count(1) from UXIP.EDL_FDT_OUC_UPLOAD_FILES
    group by FCRASH_ANALYSIS_ID, FCRASHTIME limit N

The query does not have any filters, so we should be able to read just the first N rows from cuboid (FCRASH_ANALYSIS_ID, FCRASHTIME) and return the result to the user. Yang Li tried to fix this in https://issues.apache.org/jira/browse/KYLIN-1787, but the approach is still a little too conservative for my taste: the patch disables the storage read limit whenever the cube has a partition time column (and the query does not group by that column), because rows from different segments then need further aggregation. This is why 1.5.3 does not behave as Tianshuo expects.

However, there is room for improvement even when aggregation across multiple segments is required. For Tianshuo's case we can ask for N cuboid rows from each segment and merge them on the query server side; since the cuboid rows within each segment are sorted, the merged result is guaranteed to be correct (see the sketch after this mail).

It is a different story when the query contains filters, as in Tiansheng's case: a filter on a dimension may prevent the limit clause from being pushed down, especially when that dimension is not the first one in the row key. Below is Tiansheng's case:

Error while executing SQL "select "DATE",ADGROUPID,CAMPAIGNID,COMPANYID,APPID,SUM(IMPS) as imps,SUM(CLKS) as clks,SUM(CONS) as cons,(SUM(IMP_CLOSINGPRICE)+SUM(CLK_CLOSINGPRICE)) as cost,COUNT(DISTINCT CLK_DEVICEID) as clk_uv from EXT_MID_EVENT_JOIN where COMPANYID='296' and "DATE">='2016-01-01' and "DATE"<'2016-01-05' group by "DATE",ADGROUPID,CAMPAIGNID,COMPANYID,APPID order by imps desc limit 10 offset 0": Scan row count exceeded threshold: 49121, please add filter condition to narrow down backend scan range, like where clause.

I have opened a JIRA to fix this.

--
Regards,

Bin Mahone | 马洪宾
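(The per-segment merge hongbin proposes can be sketched as a standard k-way merge; the classes and names below are illustrative, not Kylin's. A real implementation would also have to aggregate measure values when the same row key appears in more than one segment; this sketch only handles the ordering and the limit.)

    import java.util.*;

    public class SegmentLimitMerge {

        /** Merge per-segment sorted row lists and keep the first n rows overall. */
        static <T extends Comparable<T>> List<T> mergeTopN(List<List<T>> segments, int n) {
            // Min-heap keyed by the head row of each segment's iterator.
            PriorityQueue<PeekingIt<T>> heap =
                    new PriorityQueue<>(Comparator.comparing((PeekingIt<T> p) -> p.head));
            for (List<T> seg : segments) {
                Iterator<T> it = seg.iterator();
                if (it.hasNext()) heap.add(new PeekingIt<>(it));
            }
            List<T> out = new ArrayList<>(n);
            while (!heap.isEmpty() && out.size() < n) {
                PeekingIt<T> top = heap.poll();
                out.add(top.head);
                if (top.advance()) heap.add(top);   // re-insert with its next head row
            }
            return out;
        }

        /** Iterator wrapper exposing its current head element. */
        static final class PeekingIt<T> {
            final Iterator<T> it;
            T head;
            PeekingIt(Iterator<T> it) { this.it = it; this.head = it.next(); }
            boolean advance() {
                if (!it.hasNext()) return false;
                head = it.next();
                return true;
            }
        }

        public static void main(String[] args) {
            // Three "segments", each already sorted by cuboid row key; N = 4.
            // Because each segment is sorted, the global first N rows can only
            // come from the first N rows of each segment -- the key insight.
            List<List<Integer>> segments = Arrays.asList(
                    Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 8), Arrays.asList(5, 6, 7));
            System.out.println(mergeTopN(segments, 4));   // [1, 2, 3, 4]
        }
    }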
On Wed, Aug 3, 2016 at 10:25 AM, 张天生 <[email protected]> wrote:

Hi ShaoFeng:

My SQL is "select "DATE",ADGROUPID,CAMPAIGNID,COMPANYID,APPID,SUM(IMPS) as imps,SUM(CLKS) as clks,SUM(CONS) as cons,(SUM(IMP_CLOSINGPRICE)+SUM(CLK_CLOSINGPRICE)) as cost,COUNT(DISTINCT CLK_DEVICEID) as clk_uv from EXT_MID_EVENT_JOIN where COMPANYID='296' and "DATE">='2016-01-01' and "DATE"<'2016-01-05' group by "DATE",ADGROUPID,CAMPAIGNID,COMPANYID,APPID order by imps desc limit 10 offset 0". I analyzed it, and the critical factor is "COUNT(DISTINCT CLK_DEVICEID) as clk_uv": when I remove this measure, the query runs fine even over a long time span.

赵天烁 <[email protected]> wrote on Wed, Aug 3, 2016 at 10:16 AM:

OK, I'll try to make it configurable in kylin.properties.

From: ShaoFeng Shi <[email protected]>
Date: 2016-08-02 22:37
To: user <[email protected]>
Subject: Re: Re: Scan row count exceeded threshold

This is a new change introduced in 1.5.3; the JIRA is https://issues.apache.org/jira/browse/KYLIN-1787.

In short, it tries to avoid scanning too many rows (10 million by default) in one query, but the threshold wasn't externalized as a parameter. @Tianshuo, would you mind contributing a patch?

Besides, the query has "limit 1", but Kylin wasn't smart enough to take that into account.

@Tiansheng, what does the SQL query look like in your case?

2016-08-02 21:14 GMT+08:00 张天生 <[email protected]>:

I hit the same problem. In 1.5.2.1 it worked fine, but after switching to 1.5.3 it often reports this error.

赵天烁 <[email protected]> wrote on Tue, Aug 2, 2016 at 11:57 AM:

Yes, it is 1.5.3. After limiting the date range to one week it still takes about 7 seconds to respond. Is there any way to get the query execution details, like the diagnosis log?

From: ShaoFeng Shi <[email protected]>
Date: 2016-08-01 20:20
To: user <[email protected]>
Subject: Re: Scan row count exceeded threshold

Are you on 1.5.3? If not, I suggest upgrading and then retrying. Besides, you can add some "where" conditions to reduce the records to scan, as the error message suggests.

2016-08-01 15:37 GMT+08:00 赵天烁 <[email protected]>:

Recently I got the following error while executing a query on a cube that is not that big (about 400 MB, 20 million records):

==================
Error while executing SQL "select FCRASHTIME,count(1) from UXIP.EDL_FDT_OUC_UPLOAD_FILES group by FCRASH_ANALYSIS_ID,FCRASHTIME limit 1": Scan row count exceeded threshold: 10000000, please add filter condition to narrow down backend scan range, like where clause.
==================

I guess what it scanned was an intermediate result, but the query has no order by, and the result count is limited to just 1, so it should only need to scan until it finds one record with those two dimensions, and voila. Waiting for a Kylin expert to give me some details on that. By the way, how do I configure that threshold? Is there a property for it in kylin.properties?
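(As a footnote to Kevin's original question: the early exit he expects for an unfiltered "limit 1" can be sketched as below. Cuboid rows are already grouped and aggregated in storage, so with no filter and no order by the scan could stop as soon as `limit` rows have been read. Illustrative only; this is not Kylin's actual scanner, and as hongbin explains above, multi-segment cubes need an extra merge step first.)

    import java.util.*;

    public class LimitEarlyExit {
        static <R> List<R> scanWithLimit(Iterator<R> cuboidRows, int limit) {
            List<R> out = new ArrayList<>(limit);
            while (cuboidRows.hasNext() && out.size() < limit) {
                out.add(cuboidRows.next());   // rows arrive pre-aggregated per group
            }
            return out;                       // remaining rows are never touched
        }

        public static void main(String[] args) {
            Iterator<String> rows = Arrays.asList("r1", "r2", "r3").iterator();
            System.out.println(scanWithLimit(rows, 1));   // [r1] -- one row scanned
        }
    }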
