Hi Shammon
Are you suggesting that I use `over` with `partition by`? If so, I must
define an aggregate function on a specific column.
For example, I have a product table.
Before `partition by`:
select user,product,amount
FROM product
After `partition by`:
select user,product,amount,
Hi hjw
To rescale data for the dim join, I think you can use `partition by` in SQL
before the `dim join`, which will redistribute the data by a specific column.
In addition, you can add a cache for the `dim table` to improve performance.
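
A minimal sketch of the cache suggestion, assuming Flink 1.16 or later and the JDBC connector. The table names, columns, URL and cache sizes are hypothetical and only illustrate where the lookup cache options go:

```sql
-- Hypothetical fact stream; datagen only keeps the sketch self-contained.
CREATE TABLE orders (
  product_id INT,
  amount     INT,
  proc_time AS PROCTIME()   -- processing-time attribute required by the lookup join
) WITH (
  'connector' = 'datagen'
);

-- Hypothetical MySQL dimension table with a partial lookup cache.
-- (Releases before 1.16 use 'lookup.cache.max-rows' and 'lookup.cache.ttl' instead.)
CREATE TABLE dim_product (
  product_id   INT,
  product_name STRING,
  PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/shop',   -- assumed URL
  'table-name' = 'product',
  'lookup.cache' = 'PARTIAL',
  'lookup.partial-cache.max-rows' = '10000',
  'lookup.partial-cache.expire-after-write' = '10 min'
);

-- Lookup (dim) join: each order row is enriched from the cached dimension table.
SELECT o.product_id, o.amount, d.product_name
FROM orders AS o
JOIN dim_product FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.product_id = d.product_id;
```

With a cache like this, repeated keys are served from memory instead of hitting MySQL for every record; the expiry setting trades freshness of the dimension data against lookup load.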
Best,
Shammon FY
On Tue, Apr 4, 2023 at 10:28 AM Hang Ruan wrote:
> Hi,
Hi hjw,
IMO, a parallelism of 1 is enough for your job if we do not consider the
sink. I do not know why you need to set the lookup join operator's
parallelism to 6.
The SQL planner decides the type of the edge for us, and we cannot
change it.
Maybe you could share the execution graph.
For example, I create a Kafka source that subscribes to a topic with one
partition and set the default parallelism of the job to 6. The next operator
after the Kafka source is a lookup join against a MySQL table. However, the
relationship between the Kafka source and the lookup join operator is Forward,