Hello all,

I am using the iceberg-flink-runtime library to read an Iceberg table into a
Flink DataStream, with AWS Glue as the catalog. I use the Flink Table API
to build and query the Iceberg table, and then call toDataStream to convert
the result into a DataStream<Row>. Here is the code:

Table table = streamTableEnv.from(<table>).select(..).where(...);
DataStream<Row> stream = streamTableEnv.toDataStream(table);
stream.executeAndCollect();

I have observed that the table construction and the stream
construction (the first two lines of code above) are quite slow:
they take 6 to 7 seconds. Debugging and profiling point to two
inefficiencies. First, streamTableEnv.toDataStream does not use the
cachingCatalog that was created and attached to the streamTableEnv,
so it hits the external catalog multiple times. Second, the
toDataStream call creates a new DummyStreamExecEnv and all of its
related objects again. I think this is where the latency comes from.
Has anyone else experienced this? I would appreciate any suggestions
for overcoming the slowness.
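
For anyone trying to reproduce the numbers, here is a minimal sketch of how the two steps could be timed separately to see which one dominates. StageTimer and its time method are hypothetical helper names, not part of Flink or Iceberg, and the lambdas in main are placeholders standing in for the streamTableEnv.from(...) and streamTableEnv.toDataStream(table) calls above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Flink or Iceberg.
class StageTimer {

    // Runs one stage, records its wall-clock time in milliseconds under the
    // given label, and returns whatever the stage produced.
    static <T> T time(String label, Supplier<T> stage, Map<String, Long> timingsMs) {
        long start = System.nanoTime();
        T result = stage.get();
        timingsMs.put(label, (System.nanoTime() - start) / 1_000_000);
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the stages in the order they were measured.
        Map<String, Long> timingsMs = new LinkedHashMap<>();

        // Placeholder for: streamTableEnv.from(<table>).select(..).where(...)
        String table = time("build table", () -> "table-placeholder", timingsMs);

        // Placeholder for: streamTableEnv.toDataStream(table)
        String stream = time("toDataStream", () -> table + " -> stream-placeholder", timingsMs);

        System.out.println("result: " + stream);
        timingsMs.forEach((label, ms) -> System.out.println(label + ": " + ms + " ms"));
    }
}
```

Running each step a second time in the same JVM and comparing the timings would also show how much of the cost is one-time setup versus repeated catalog access.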

Thank you
Chetas
