Hi list,

I'm trying to figure out whether Beam is intended to support the following
and, if so, what the best approach is.

I'm using Java with Beam 2.0.0 on GCP Dataflow. Note: I'm relatively new to
Java, so if there's a known solution for this, a code example would be
greatly appreciated.

I have an unbounded stream of short messages (coming from PubSub).

For each message, I want to get a number of rows from an external database
(rows within Bigtable, always the same table) based on the contents of the
message, and use those rows when producing output for the final write step.

I've tried various means of connecting out to Bigtable from within the DoFn
which is handling the PubSub inputs, but so far everything I've tried has
resulted in Beam refusing to run the job due to:

java.io.NotSerializableException: org.apache.beam.sdk.options.ProxyInvocationHandler

(methods I've tried: manually using a BigtableSession, manually using the
Bigtable HBase libs)
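
One plausible cause of this exception (an assumption; the DoFn code is not
included here) is that the DoFn holds a non-serializable object in a regular
field, for example the PipelineOptions object or a client built from it.
ProxyInvocationHandler is the class behind Beam's PipelineOptions proxies, and
Beam serializes each DoFn with Java serialization before sending it to the
workers. The sketch below reproduces that pattern using only the JDK:
FakeClient, BrokenFn, LookupFn and the table name are hypothetical stand-ins,
not Beam or Bigtable APIs. In a real DoFn, setup() and teardown() would
presumably be the methods annotated with @Setup and @Teardown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientClientSketch {

    // Hypothetical stand-in for a non-serializable resource (a database
    // session, an options proxy, ...). It does NOT implement Serializable.
    static class FakeClient {
        final String target;
        FakeClient(String target) { this.target = target; }
        String lookup(String key) { return target + ":" + key; }
    }

    // Cannot be serialized: the client lives in a regular (non-transient) field.
    static class BrokenFn implements Serializable {
        private final FakeClient client = new FakeClient("my-table");
        String process(String message) { return client.lookup(message); }
    }

    // Can be serialized: only a plain String is shipped, and the client is
    // transient and rebuilt on the worker.
    static class LookupFn implements Serializable {
        private final String tableName;      // serializable configuration
        private transient FakeClient client; // skipped by Java serialization

        LookupFn(String tableName) { this.tableName = tableName; }

        // Assumption: in a Beam DoFn this would carry the @Setup annotation,
        // so one client is created per DoFn instance rather than per element.
        void setup() { client = new FakeClient(tableName); }

        String process(String message) { return client.lookup(message); }

        // Assumption: in a Beam DoFn this would carry the @Teardown annotation.
        void teardown() { client = null; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Returns false when Java serialization rejects the object graph.
    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes and deserializes the function, as happens before a worker runs it.
    static LookupFn roundTrip(LookupFn fn) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(serialize(fn)))) {
            return (LookupFn) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new BrokenFn()));            // false
        System.out.println(canSerialize(new LookupFn("my-table")));  // true

        LookupFn copy = roundTrip(new LookupFn("my-table"));
        copy.setup();                                 // client is null until setup() runs
        System.out.println(copy.process("row-key-1")); // my-table:row-key-1
        copy.teardown();
    }
}
```

The design choice here is to pass only plain, serializable configuration
(project, instance and table identifiers as strings) into the function's
constructor, and to open the actual connection after deserialization. Whether
this gives the desired connection reuse against Bigtable depends on how many
DoFn instances the runner creates, which this sketch does not address.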

So is this something that Beam was designed to do? If so, what's the
recommended approach?

I considered dynamically constructing a PCollection, but I wasn't sure if
that would make use of connection pooling to Bigtable.

Thanks,
Gwilym
