This may be a kind of threshold query. Searching for that term should turn up more information.
On 2020/5/18 at 3:51, [sender name lost to encoding] wrote:
Shaofeng Shi, [greeting lost to encoding]
[The Chinese text of this message was lost to encoding. Only the fragments below survive; "[...]" marks unrecoverable text.]
[...] father [...] key [...] city [...] amt [...] a [...] b [...] c [...] 1 [...] child [...]
  1. [...] 10%: select count(1) from child / select count(1) from father [...] 10% [...]
  2. [...] 5%: select sum(province(city)) from child group by province(city) / select sum(province(city)) from father group by province(city) [...] 5% [...] province [...] province(city) [...] city [...] province [...] udf [...] 1%: select sum(city) from child group by city / select sum(city) from father group by city [...] city [...]
  3. [...] 100 [...] 10% [...] 90~110 [...] 90 [...] select sum(amt) from child [...] 110 [...]
  4. [...] 20%: select a / (a+b+c) from child [...] 20%, select b / (a+b+c) from child [...] 20%, select c / (a+b+c) from child [...] 20% [...] a [...] b [...] c [...]
[...] Apache Kylin [...]
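For readers trying to follow the surviving fragments of rules 1 and 3 above, here is a minimal sketch of how such checks might look. It uses Python's built-in sqlite3 module rather than Kylin or Hive. The table names father and child and the column amt come from the fragments; the sample data, the function name check_rules, and the direction of the comparisons (at most 10% of the rows, total amt within 90 to 110) are assumptions, since the original wording is lost.

```python
import sqlite3

# Made-up data: 20 source rows, 2 of them selected into the subset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE father (city TEXT, amt REAL)")
conn.execute("CREATE TABLE child (city TEXT, amt REAL)")
conn.executemany("INSERT INTO father VALUES (?, ?)", [("A", 50.0)] * 20)
conn.executemany("INSERT INTO child VALUES (?, ?)", [("A", 50.0)] * 2)


def check_rules(conn, max_row_share=0.10, amt_low=90.0, amt_high=110.0):
    """Evaluate two subset rules; the thresholds and their direction are assumed."""
    child_rows, child_amt = conn.execute(
        "SELECT COUNT(1), COALESCE(SUM(amt), 0) FROM child"
    ).fetchone()
    (father_rows,) = conn.execute("SELECT COUNT(1) FROM father").fetchone()
    if father_rows == 0:
        raise ValueError("father is empty, so the row share is undefined")
    return {
        # Rule 1 (assumed): the subset holds at most max_row_share of the source rows.
        "row_share_ok": child_rows / father_rows <= max_row_share,
        # Rule 3 (assumed): the subset's total amt lies within [amt_low, amt_high].
        "amt_total_ok": amt_low <= child_amt <= amt_high,
    }


result = check_rules(conn)
print(result)
```

This only validates a candidate child set. Finding a subset that satisfies every rule at once is a separate selection problem that the sketch does not attempt.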
------------------ Original Message ------------------
*From:* "ShaoFeng Shi"<[email protected]>;
*Date:* May 16, 2020 (Saturday), 11:24
*To:* "user"<[email protected]>;
*Subject:* Re: [subject lost to encoding]
Hi Xiang,
I'm not sure whether Kylin can help. Can Hive or Spark SQL fulfill
the requirement? If you can provide a couple of SQL queries, that
would help us see whether Kylin is a fit.
Best regards,
Shaofeng Shi
Apache Kylin PMC
Email: [email protected]
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]
On May 15, 2020 (Friday) at 1:18, [sender name lost to encoding] <[email protected]> wrote:
[Chinese version of the message below; text lost to encoding]
Hello everyone,
We have a business requirement: from a large volume of data, select
subsets that satisfy multiple rules at the same time. The rules are
complex and differ between scenarios. For example, the share of any
single city in the data source must not exceed 15% (users can adjust
this threshold on demand), the share of various calculated business
values must not exceed a specific value, and so on. Can Apache Kylin
address this requirement? If so, what approach should we take? Is
there any documentation or demo for reference? Would other tools be
needed as well? Thanks a lot.
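To make the single-city rule in the question concrete, here is a minimal sketch of a check for it. It uses Python's built-in sqlite3 module as a stand-in SQL engine, not Kylin. The table name child, its columns, the sample rows, and the helper cities_over_share are all hypothetical; only the 15% threshold comes from the message.

```python
import sqlite3

# Hypothetical candidate subset: 10 rows, 2 of them from city "A".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (city TEXT, amt REAL)")
conn.executemany(
    "INSERT INTO child VALUES (?, ?)",
    [("A", 10.0), ("A", 20.0), ("B", 30.0), ("C", 40.0), ("D", 5.0),
     ("E", 5.0), ("F", 5.0), ("G", 5.0), ("H", 5.0), ("I", 5.0)],
)


def cities_over_share(conn, max_share=0.15):
    """Return (city, share) for every city whose row share exceeds max_share."""
    return conn.execute(
        """
        SELECT city,
               COUNT(*) * 1.0 / (SELECT COUNT(*) FROM child) AS share
        FROM child
        GROUP BY city
        HAVING COUNT(*) * 1.0 / (SELECT COUNT(*) FROM child) > ?
        ORDER BY city
        """,
        (max_share,),
    ).fetchall()


violations = cities_over_share(conn)
print(violations)
```

An empty result means the candidate subset passes the rule. The threshold is a parameter, so it can be adjusted per scenario as the question describes.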