Thank you Yuval!  Will look into it first thing Monday morning, much
appreciated.  In the case where we aren't able to filter by anything else,
is there anything else we can could potentially look into to help?


Thank you


Jason Politis
Solutions Architect, Carrera Group
carrera.io
| jpoli...@carrera.io <kpatter...@carrera.io>
<http://us.linkedin.com/in/jasonpolitis>


On Fri, May 20, 2022 at 11:46 PM Yuval Itzchakov <yuva...@gmail.com> wrote:

> Hi Jason,
>
> When using interval joins, Flink can't parallelize the execution as the
> join key (semantically) is even time, thus all events must fall into the
> same partition for Flink to be able to lookup events from the two streams.
> See the IntervalJoinOperator (
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/co/IntervalJoinOperator.java)
> for the details.
>
> If you have any other key(s) that could join both streams, and then filter
> by the time of the event at a later phase, that could speed things up.
>
> On Fri, May 20, 2022, 23:53 Jason Politis <jpoli...@carrera.io> wrote:
>
>> Good evening all,
>>
>> We are working on a project where a few queries that are joining based on
>> dates from table A are between dates from table B.  Something like:
>>
>> SELECT
>> A.ID,
>> B.NAME
>> FROM
>> A,
>> B
>> WHERE
>> A.DATE BETWEEN B.START_DATE AND B.END_DATE;
>>
>> Both A and B are topics in Kafka with 5 partitions.  Doing a simple test
>> of selecting * on each one will yield tasks with a parallelism of 5, which
>> tells me we have parallelism working in flink.
>>
>> BUT, when we attempt the query I pasted above, the BETWEEN clause doesn't
>> parallelize.
>>
>> I'd like to get your expert opinion on this and get your help on how to
>> force this to parallelize.
>>
>> Thank you
>>
>>
>> Jason Politis
>> Solutions Architect, Carrera Group
>> carrera.io
>> | jpoli...@carrera.io <kpatter...@carrera.io>
>> <http://us.linkedin.com/in/jasonpolitis>
>>
>

Reply via email to