This is pretty surprising. Seems valuable to file separate Jiras so we can
track investigation and resolution.

 - use gRPC: https://issues.apache.org/jira/browse/BEAM-7718
 - empty message bodies: https://issues.apache.org/jira/browse/BEAM-7716
 - watermark tracking: https://issues.apache.org/jira/browse/BEAM-7717

You reproduced these with the original PubsubIO?

Kenn

On Mon, Jul 8, 2019 at 10:38 AM Steve Niemitz <[email protected]> wrote:

> I was trying to use the bundled PubsubIO.Read implementation in beam on
> dataflow (using --experiments=enable_custom_pubsub_source to prevent
> dataflow from overriding it with its own implementation) and ran into some
> interesting issues.  I was curious if people have any experience with
> these.  I'd assume anyone using PubsubIO on a runner other than dataflow
> would have run into the same things.
>
> - The default implementation uses the HTTP REST API, which seems to be
> much less performant than the gRPC implementation.  Is there a reason that
> the gRPC implementation is essentially unavailable from the public API?
> PubsubIO.Read.withClientFactory is package private.  I worked around this
> by making it public and rebuilding, which led me to...
>
> - Both the JSON and gRPC implementation return empty message bodies for
> all messages read (using readMessages).  When running with the
> dataflow-specific reader, this doesn't happen and the message bodies have
> the content as expected.  I took a pipeline that works as expected on
> dataflow using PubsubIO.Read, added the experiment flag, and then my
> pipeline broke from empty message bodies.  This obviously blocked me from
> really experimenting much more.
>
> - The watermark tracking seems off.  The dataflow UI was reporting my
> watermark as around (but not exactly) the epoch (it was ~1970-01-19), which
> makes me wonder if seconds/milliseconds got confused somewhere (ie, if you
> take the time since epoch in milliseconds now and interpret it as seconds,
> you'll get somewhere around 1970-01-18).
>

Reply via email to