Hi list, I'm trying to figure out if Beam is intended to do the following and, if so, what's the best approach?
I'm using Java, Beam 2.0.0 on GCP Dataflow. Note: I'm relatively new to Java, so if there's any known solution for this a code example would be greatly appreciated. I have an unbound stream of short messages (coming from PubSub). For each message, I want to get a number of rows from an external database (rows within Bigtable, always the same table) based on the contents of the message, and use the rows when producing output for the final write apply. I've tried various means of connecting out to Bigtable from within the DoFn which is handling the PubSub inputs, but so far everything I've tried has resulted in Beam refusing to run the job due to: java.io.NotSerializableException: org.apache.beam.sdk.options.ProxyInvocationHandler (methods I've tried: manually using a BigtableSession, manually using the Bigtable HBase libs) So is this something that Beam was designed to do? If so, what's the recommended approach? I considered dynamically constructing a PCollection, but I wasn't sure if that would make use of connection pooling to Bigtable. Thanks, Gwilym
