Hi Folks,

Question from a Newbie for both Calcite and Beam:

I understand that Calcite can build an execution-plan tree using relational
algebra and push certain operations down to a "data source", while also
allowing source-specific optimizations.

I also understand that Beam SQL can run SqlTransform.query() over one or
more PCollection<Row>s, and that Calcite is used to come up with the
execution plan.

My question is: assume I have a MySQL table, Table1, and a Kafka stream
called "Kafka".

Now I want to do a join, e.g. looking up a row based on a key in the
Kafka message:
select Table1.*, Kafka.* from Kafka join Table1 on Table1.key=Kafka.key

What's the best way to implement this with Beam SQL? (Note that we can't
hardcode the join, because each incoming Kafka message may need a different
SQL query.)
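To make the question concrete, here is roughly the wiring I have in mind. This is just a sketch: it assumes both inputs have already been read and converted to PCollection<Row> with suitable schemas (e.g. via KafkaIO and JdbcIO), and the variable names are made up.

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

// Assumption: kafkaRows and table1Rows are PCollection<Row> values already
// read from Kafka and MySQL respectively (names are hypothetical).
PCollection<Row> joined =
    PCollectionTuple
        // The tuple tag ids become the table names visible to the SQL query.
        .of(new TupleTag<Row>("Kafka"), kafkaRows)
        .and(new TupleTag<Row>("Table1"), table1Rows)
        .apply(SqlTransform.query(
            "select Table1.*, Kafka.* "
                + "from Kafka join Table1 on Table1.key = Kafka.key"));
```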

One step further: suppose we have two MySQL tables, Table1 and Table2, and
a Kafka stream "Kafka", and we want to join those two tables inside MySQL
first (maybe with aggregations like sum/count) and then join the result
with the Kafka stream. Is there a way to tap into Calcite so that the join
of the two tables is actually pushed down into MySQL?
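In other words, I would like to write something like the following and have the inner join plus aggregation execute inside MySQL instead of pulling both tables into the pipeline. Again this is only a sketch with made-up table/column names (id, amount), and whether the pushdown actually happens is exactly my question.

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

// Assumption: kafkaRows, table1Rows, and table2Rows are PCollection<Row>
// values already read from Kafka and MySQL (names are hypothetical).
PCollection<Row> result =
    PCollectionTuple
        .of(new TupleTag<Row>("Kafka"), kafkaRows)
        .and(new TupleTag<Row>("Table1"), table1Rows)
        .and(new TupleTag<Row>("Table2"), table2Rows)
        .apply(SqlTransform.query(
            // Ideally the subquery below would be pushed down into MySQL.
            "select k.key, agg.total "
                + "from Kafka k join ("
                + "  select t1.key, sum(t2.amount) as total "
                + "  from Table1 t1 join Table2 t2 on t1.id = t2.id "
                + "  group by t1.key"
                + ") agg on agg.key = k.key"));
```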

Sorry for the lengthy question, and please let me know if more
clarification is needed.

Thanks a lot in advance!

-Yushu
