Re: UniqueKey constraint is lost with multiple sources join in SQL

2021-04-08 Thread Kai Fu
As confirmed with the community, this is a bug. More information is in the
issue: https://issues.apache.org/jira/browse/FLINK-22113

On Sat, Apr 3, 2021 at 8:43 PM Kai Fu wrote:

> Hi team,
>
> We have a use case that joins multiple data sources to generate a
> continuously updated view. We defined a primary key constraint on every
> input source, and each key is a subset of the join condition. All joins
> are left joins.
>
> In our case, the first two inputs produce a *JoinKeyContainsUniqueKey*
> input spec, which is good and performant. The third input source, however,
> is joined with the intermediate output table of the first two input tables,
> and the intermediate table does not carry key constraint information
> (although the third source input table does), so it results in a
> *NoUniqueKey* input spec. Given that NoUniqueKey inputs have dramatic
> performance implications per the Force Join Unique Key email thread, we
> want to know if there is any mitigation plan for this.
>
> One solution I can come up with is to write the intermediate result to
> somewhere like Kafka with a unique constraint and then join it with the
> third source, but that requires extra resources. Any other suggestions on
> this? Thanks.
>
> --
> *Best regards,*
> *- Kai*
>


-- 
*Best wishes,*
*- Kai*


UniqueKey constraint is lost with multiple sources join in SQL

2021-04-03 Thread Kai Fu
Hi team,

We have a use case that joins multiple data sources to generate a
continuously updated view. We defined a primary key constraint on every
input source, and each key is a subset of the join condition. All joins
are left joins.

In our case, the first two inputs produce a *JoinKeyContainsUniqueKey*
input spec, which is good and performant. The third input source, however,
is joined with the intermediate output table of the first two input tables,
and the intermediate table does not carry key constraint information
(although the third source input table does), so it results in a
*NoUniqueKey* input spec. Given that NoUniqueKey inputs have dramatic
performance implications per the Force Join Unique Key email thread, we
want to know if there is any mitigation plan for this.
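
To make the distinction concrete, here is a rough Python sketch of how a
join input could be classified from its unique keys and the join key. It is
a simplified model for illustration only, not Flink's planner code: the
function name classify_join_input, the "HasUniqueKey" label for the middle
case, and the column names are all assumptions of mine.

```python
from typing import Iterable, Set


def classify_join_input(unique_keys: Iterable[Set[str]], join_key: Set[str]) -> str:
    """Toy classification of one join input side (hypothetical helper).

    unique_keys: the unique keys known for the input, each a set of columns.
    join_key:    the columns of this input used in the join condition.
    """
    join_cols = frozenset(join_key)
    # Ignore empty key sets; they carry no uniqueness information here.
    keys = [frozenset(k) for k in unique_keys if k]

    if not keys:
        # No key information reached this input at all.
        return "NoUniqueKey"
    if any(k <= join_cols for k in keys):
        # Some unique key is fully covered by the join key.
        return "JoinKeyContainsUniqueKey"
    # A unique key exists but is not covered by the join key.
    return "HasUniqueKey"


# A source whose primary key "id" is part of the join condition.
print(classify_join_input([{"id"}], {"id"}))
# The intermediate join result, assuming its key information was lost.
print(classify_join_input([], {"id"}))
```

Under this model the first call prints JoinKeyContainsUniqueKey and the
second prints NoUniqueKey, which matches what we see for the third join:
the intermediate table arrives with no unique keys.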

One solution I can come up with is to write the intermediate result to
somewhere like Kafka with a unique constraint and then join it with the
third source, but that requires extra resources. Any other suggestions on
this? Thanks.

-- 
*Best regards,*
*- Kai*