Hi ShaoFeng, my SQL is:

select "DATE", ADGROUPID, CAMPAIGNID, COMPANYID, APPID,
       SUM(IMPS) as imps, SUM(CLKS) as clks, SUM(CONS) as cons,
       (SUM(IMP_CLOSINGPRICE) + SUM(CLK_CLOSINGPRICE)) as cost,
       COUNT(DISTINCT CLK_DEVICEID) as clk_uv
from EXT_MID_EVENT_JOIN
where COMPANYID = '296' and "DATE" >= '2016-01-01' and "DATE" < '2016-01-05'
group by "DATE", ADGROUPID, CAMPAIGNID, COMPANYID, APPID
order by imps desc limit 10 offset 0

I analyzed it, and the critical factor is the "COUNT(DISTINCT CLK_DEVICEID) as clk_uv" measure: when I remove that measure, the query runs fine even over a long time span.
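For reference, a sketch of the working variant (the same statement with only the count-distinct measure removed; nothing else changed):

-- same FROM/WHERE/GROUP BY/ORDER BY/LIMIT as above, only the measure list differs;
-- per the observation above, this variant responds fine even for a longer date range
select "DATE", ADGROUPID, CAMPAIGNID, COMPANYID, APPID,
       SUM(IMPS) as imps, SUM(CLKS) as clks, SUM(CONS) as cons,
       (SUM(IMP_CLOSINGPRICE) + SUM(CLK_CLOSINGPRICE)) as cost
from EXT_MID_EVENT_JOIN
where COMPANYID = '296' and "DATE" >= '2016-01-01' and "DATE" < '2016-01-05'
group by "DATE", ADGROUPID, CAMPAIGNID, COMPANYID, APPID
order by imps desc limit 10 offset 0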
赵天烁 <[email protected]> wrote on Wed, Aug 3, 2016 at 10:16 AM:

> OK, I'll try to make it configurable in kylin.properties.
>
> 赵天烁 (Kevin Zhao)
> [email protected]
> MEIZU Technology Co., Ltd., MEIZU Tech Bldg., Technology & Innovation Coast,
> Zhuhai, 519085, Guangdong, China
> meizu.com
>
> From: ShaoFeng Shi <[email protected]>
> Date: 2016-08-02 22:37
> To: user <[email protected]>
> Subject: Re: Re: Scan row count exceeded threshold
>
> This is a new change introduced in 1.5.3; the JIRA is:
> https://issues.apache.org/jira/browse/KYLIN-1787
>
> In short, it tries to avoid scanning too many rows (10 million) in one
> query. But the threshold wasn't externalized as a parameter. @Tianshuo,
> would you mind contributing a patch?
>
> Besides, the query has "limit 1", but Kylin wasn't smart enough to take
> that into account.
>
> @Tian sheng, what does the SQL query look like in your case?
>
> 2016-08-02 21:14 GMT+08:00 张天生 <[email protected]>:
>
>> I met the same problem. In version 1.5.2.1 it worked fine, but after
>> switching to 1.5.3 it often reports this error.
>>
>> 赵天烁 <[email protected]> wrote on Tue, Aug 2, 2016 at 11:57 AM:
>>
>>> Yes, it is 1.5.3. After limiting the date range to a week, it still
>>> takes about 7 seconds to respond. Is there any way to get the query
>>> execution details, just like the diagnosis log?
>>>
>>> From: ShaoFeng Shi <[email protected]>
>>> Date: 2016-08-01 20:20
>>> To: user <[email protected]>
>>> Subject: Re: Scan row count exceeded threshold
>>>
>>> Are you on 1.5.3? If not, I suggest upgrading and then retrying.
>>> Besides, you can add some "where" conditions to reduce the records to
>>> scan, as the error message suggests.
>>>
>>> 2016-08-01 15:37 GMT+08:00 赵天烁 <[email protected]>:
>>>
>>>> Recently I got the following error while executing a query on a cube
>>>> which is not that big (about 400 MB, 20 million records):
>>>> ==================
>>>> Error while executing SQL "select FCRASHTIME,count(1) from
>>>> UXIP.EDL_FDT_OUC_UPLOAD_FILES group by FCRASH_ANALYSIS_ID,FCRASHTIME limit
>>>> 1": Scan row count exceeded threshold: 10000000, please add filter
>>>> condition to narrow down backend scan range, like where clause.
>>>>
>>>> I guess what it scanned was the intermediate result, but the query
>>>> doesn't have any order by, and the result count is limited to just 1,
>>>> so it could stop scanning as soon as it finds any record with those
>>>> two dimensions, and voila.
>>>> Waiting for a Kylin expert to give me some details on that.
>>>> BTW, how can that threshold be configured? Is there a property for it
>>>> in kylin.properties?
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi
>
> --
> Best regards,
>
> Shaofeng Shi
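For reference, a minimal sketch of what the externalized setting in kylin.properties could look like once the change discussed above lands. The property name kylin.query.scan.threshold is an assumption here (it is the name used by later Kylin releases); as the thread notes, at the time of writing the 10,000,000-row threshold was still hard-coded, so check the release notes of your version before relying on it:

# Hypothetical kylin.properties entry -- name is an assumption, not confirmed
# by this thread. Controls the per-query scanned-row limit that triggers the
# "Scan row count exceeded threshold" error.
kylin.query.scan.threshold=10000000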
