It is currently the latter: all of the data is read and the filtering then
happens within the pipeline. Note that this doesn't mean all of the data is
loaded into memory, since how the join is actually executed depends on the
Runner that is powering the pipeline.
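
To make that concrete, here is a minimal sketch (the table, class, and
connection names are made up) of what the current behaviour looks like with
JdbcIO: the whole reference table is scanned, and the predicate is applied by
a separate transform inside the pipeline rather than by the database.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.KvCoder;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    public class ReadThenFilter {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // The whole reference table is read: note there is no WHERE clause here.
        PCollection<KV<String, String>> reference =
            p.apply("ReadWholeTable", JdbcIO.<KV<String, String>>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                    "org.postgresql.Driver", "jdbc:postgresql://host/db"))
                .withQuery("SELECT key, value FROM reference_data")
                .withRowMapper((JdbcIO.RowMapper<KV<String, String>>) rs ->
                    KV.of(rs.getString(1), rs.getString(2)))
                .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));

        // The filtering happens here, inside the pipeline, after the full read.
        PCollection<KV<String, String>> filtered = reference.apply("FilterInPipeline",
            Filter.by(kv -> kv.getKey().startsWith("interesting-")));

        p.run();
      }
    }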

Kenn shared this doc[1], which starts to look at integrating Runners and IO
into the SQL shell and at defining a way to map properties from SQL onto the
IO connectors; it seems natural that the filter would get pushed down to the
IO connector as well. Please take a look and feel free to comment.

1:
https://docs.google.com/document/d/1ZFVlnldrIYhUgOfxIT2JcmTFFSWTl4HwAnQsnwiNL1g/edit#heading=h.4zubkdp87wok
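
Until something along those lines lands, one workaround (reusing the
hypothetical names from the sketch above) is to push the predicate down by
hand: put the WHERE clause into the JdbcIO query itself, so the database does
the filtering before any rows enter the pipeline. This only works for
predicates you know up front, not ones derived from the keys in a window.

    // The WHERE clause is evaluated by the database, not by the pipeline.
    PCollection<KV<String, String>> filtered =
        p.apply("ReadFiltered", JdbcIO.<KV<String, String>>read()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://host/db"))
            .withQuery("SELECT key, value FROM reference_data WHERE key LIKE 'interesting-%'")
            .withRowMapper((JdbcIO.RowMapper<KV<String, String>>) rs ->
                KV.of(rs.getString(1), rs.getString(2)))
            .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));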

On Wed, Jun 13, 2018 at 7:39 AM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:

> Hi,
>
> We are currently playing with Apache Beam’s SQL extension on top of Flink.
> One of the features that we are interested in is the SQL predicate pushdown
> feature that Spark provides. Does Beam support that?
>
> For example:
> I have an unbounded dataset that I want to join with some static reference
> data stored in a database. Will Beam perform the logic of figuring out all
> the unique keys in the window and push them down to the JDBC source, or will
> it bring all the data from the JDBC source into memory and then perform the
> join?
>
> Thanks,
> Harsh
> --
> Regards,
> Harshvardhan
>
