Combining PubSub + Bigtable is common.

I'd recommend the BigtableSession approach, because the HBase approach
pulls in a lot of extra dependencies (which tends to lead to dependency
conflicts). Also make sure you use the same version of the Bigtable client
libraries that Apache Beam uses (Apache Beam 2.0.0 uses Bigtable 0.9.6.2).
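The NotSerializableException on ProxyInvocationHandler usually means the DoFn is holding a reference to your PipelineOptions (or to the non-serializable client itself). The usual pattern is to copy the values you need into plain Serializable fields, keep the client in a transient field, and open it on the worker (in Beam, in a @Setup method). Here's a minimal plain-Java sketch of that pattern — LookupFn, setup(), and the placeholder "session" are illustrative names, not Beam or Bigtable API, and the real thing would extend DoFn and create a BigtableSession:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for a DoFn: hold only Serializable configuration,
// never PipelineOptions, and mark the non-serializable client transient.
class LookupFn implements Serializable {
    private final String projectId;   // plain String config, copied out of options
    private final String instanceId;
    private transient Object session; // stands in for a BigtableSession; never serialized

    LookupFn(String projectId, String instanceId) {
        this.projectId = projectId;
        this.instanceId = instanceId;
    }

    // In a real DoFn this would be annotated @Setup and run once per instance
    // on the worker, after deserialization.
    void setup() {
        if (session == null) {
            session = "session-for-" + projectId + "/" + instanceId; // placeholder client
        }
    }

    Object session() {
        return session;
    }
}

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        LookupFn fn = new LookupFn("my-project", "my-instance");

        // Round-trip through Java serialization, as Dataflow does when it
        // ships the DoFn to workers. This would throw NotSerializableException
        // if fn held a PipelineOptions proxy or a live client.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        LookupFn copy;
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (LookupFn) in.readObject();
        }

        // The client is created on the "worker" side, after deserialization.
        copy.setup();
        System.out.println(copy.session());
    }
}
```

The key point is that nothing non-serializable crosses the serialization boundary; the transient client is rebuilt per worker, which also gives you connection reuse across bundles.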

Can you provide the full stack trace for the exception you're seeing?

On Wed, May 31, 2017 at 10:51 PM, Gwilym Evans <[email protected]> wrote:

> Hi list,
>
> I'm trying to figure out if Beam is intended to do the following and, if
> so, what's the best approach?
>
> I'm using Java, Beam 2.0.0 on GCP Dataflow. Note: I'm relatively new to
> Java, so if there's any known solution for this a code example would be
> greatly appreciated.
>
> I have an unbounded stream of short messages (coming from PubSub).
>
> For each message, I want to get a number of rows from an external database
> (rows within Bigtable, always the same table) based on the contents of the
> message, and use the rows when producing output for the final write apply.
>
> I've tried various means of connecting out to Bigtable from within
> the DoFn which is handling the PubSub inputs, but so far everything I've
> tried has resulted in Beam refusing to run the job due to:
>
> java.io.NotSerializableException: org.apache.beam.sdk.options.ProxyInvocationHandler
>
> (methods I've tried: manually using a BigtableSession, manually using the
> Bigtable HBase libs)
>
> So is this something that Beam was designed to do? If so, what's the
> recommended approach?
>
> I considered dynamically constructing a PCollection, but I wasn't sure if
> that would make use of connection pooling to Bigtable.
>
> Thanks,
> Gwilym
>
>
