Hi Fabian,
So maybe I can not join a table that generate from a window, because
the table is getting larger and larger as the time goes, maybe the system will
crash one day.
I am working on a system that calculate the “score" of article, which
is consist of the count of article praise, the count of article response, etc
Because I can not use flink to save all the article, I decide to update
the score of the article that created in 3 days.
I have two choises,
1. join the article table and praise table, response table then window
select a.article_id, count(p.praise_id) as pCount,
count(r.response_id) as rCount
from
article a
left join
praise p on a.article_id = p.article_id
left join
response r on a.article_id = r.article_id
group by hop(updated_time, interval '1' minute,interval '3'
day) , article_id
2. window the article table, window the priase table, window the
response table ,then join them together
select aAggr.article_id, pAggr.pCount, rAggr.rCount
(select article_id from article group by hop(updated_time,
interval '1' minute,interval '3' day) , article_id) aAggr
left join
(select article_id,count(praise_id) as pCount from praise group
by hop(updated_time, interval '1' minute,interval '3' day) , article_id) pAggr
on aAggr.article_id=pAggr.article_id
left join
(select article_id,count(response_id) as rCount from response
group by hop(updated_time, interval '1' minute,interval '3' day) , article_id)
rAggr on aAggr.article_id=rAggr.article_id
Maybe I should choose 1, join then window, but not window then join.
Please correct me if I am wrong.
I have some worries when choose 1,
I do not know how Flink works internally, it seems that in the sql ,
table article ,table praise, table response is growing as the time goes by,
will it introduce performance issue?
Best,
Henry
> 在 2018年8月21日,下午9:29,Hequn Cheng <[email protected]> 写道:
>
> Hi Henry,
>
> praiseAggr is an append table, so it contains "100,101,102,100,101,103,100".
> 1. if you change your sql to s"SELECT article_id FROM praise GROUP BY
> article_id", the answer is "101,102,103"
> 2. if you change your sql to s"SELECT last_value(article_id) FROM praise",
> the answer is "100"
>
> Best, Hequn
>
> On Tue, Aug 21, 2018 at 8:52 PM, 徐涛 <[email protected]
> <mailto:[email protected]>> wrote:
> Hi Fabian,
> Thanks for your response. This question puzzles me for quite a long
> time.
> If the praiseAggr has the following value:
> window-1 100,101,102
> window-2 100,101,103
> window-3 100
>
> the last time the article table joins praiseAggr, which of the
> following value does praiseAggr table has?
> 1— 100,101,102,100,101,103,100 collect all the element
> of all the window
> 2— 100 the element
> of the latest window
> 3— 101,102,103 the distinct value
> of all the window
>
>
> Best,
> Henry
>
>
>> 在 2018年8月21日,下午8:02,Fabian Hueske <[email protected]
>> <mailto:[email protected]>> 写道:
>>
>> Hi,
>>
>> The semantics of a query do not depend on the way that it is used.
>> praiseAggr is a table that grows by one row per second and article_id. If
>> you use that table in a join, the join will fully materialize the table.
>> This is a special case because the same row is added multiple times, so the
>> state won't grow that quickly, but the performance will decrease because for
>> each row from article will join with multiple (a growing number) of rows
>> from praiseAggr.
>>
>> Best, Fabian
>>
>> 2018-08-21 12:19 GMT+02:00 徐涛 <[email protected]
>> <mailto:[email protected]>>:
>> Hi All,
>> var praiseAggr = tableEnv.sqlQuery(s"SELECT article_id FROM praise
>> GROUP BY HOP(updated_time, INTERVAL '1' SECOND,INTERVAL '3' MINUTE) ,
>> article_id" )
>> tableEnv.registerTable("praiseAggr", praiseAggr)
>> var finalTable = tableEnv.sqlQuery(s”SELECT 1 FROM article a join
>> praiseAggr p on a.article_id=p.article_id" )
>> tableEnv.registerTable("finalTable", finalTable)
>> I know that praiseAggr, if written to sink, is append mode , so if a
>> table joins praiseAggr, what the table “see”, is a table contains the latest
>> value, or a table that grows larger and larger? If it is the later, will it
>> introduce performance problem?
>> Thanks a lot.
>>
>>
>> Best,
>> Henry
>>
>
>