High latency in reading Iceberg tables using Flink table api

Chetas Joshi Tue, 12 Mar 2024 14:58:02 -0700

Hello all,

I am using the iceberg-flink-runtime lib to read an Iceberg table into a
Flink DataStream, with Glue as the catalog. I use the Flink Table API to
build and query the Iceberg table and then call toDataStream to convert it
into a DataStream<Row>. Here is the code:


Table table = streamTableEnv.from(<table>).select(..).where(...)
DataStream<Row> stream = streamTableEnv.toDataStream(table)
stream.executeAndCollect()

I have observed that the table construction and the stream
construction (the first two lines of code above) are quite slow,
taking 6 to 7 seconds. Debugging and profiling revealed some
inefficiencies: streamTableEnv.toDataStream does not use the
cachingCatalog created and attached to the streamTableEnv, so it
hits the external catalog multiple times. The toDataStream call also
creates a new DummyStreamExecEnv and all the related objects again.
I think this is where the latency comes from. Has anyone experienced
this? I would appreciate any suggestions for overcoming the slowness.
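
For anyone who wants to reproduce the measurement, here is a minimal
sketch of how the two steps could be timed separately. StepTimer and
its time method are hypothetical helpers written for illustration; they
are not part of Flink or Iceberg, and this is not the profiling setup
used above.

```java
import java.util.function.Supplier;

// Hypothetical helper (an assumption for illustration, not a Flink or Iceberg API).
final class StepTimer {

    /** A step's result together with its wall-clock duration in milliseconds. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the step once, prints how long it took, and returns result plus duration. */
    static <T> Timed<T> time(String label, Supplier<T> step) {
        long start = System.nanoTime();
        T value = step.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(label + " took " + elapsedMillis + " ms");
        return new Timed<>(value, elapsedMillis);
    }
}
```

Wrapping each call, for example
StepTimer.time("table", () -> streamTableEnv.from(<table>).select(..).where(...))
and then StepTimer.time("toDataStream", () -> streamTableEnv.toDataStream(t.value)),
would show how the 6 to 7 seconds split between the two lines.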

Thank you
Chetas
