Hi,

Thank you, and congratulations on graduating the Go SDK from experimental 
status; it's all very exciting. We're already using it successfully for several 
of our internal pipelines, but I've now run into an issue writing data with 
the pubsubio.Write transform that I'm unable to figure out.

In the following example I expect data published on the input topic to be 
forwarded to the output topic verbatim. However, Dataflow does not publish 
anything to the configured output topic.

The job metrics show that data is read successfully, but there are no publish 
requests on the output topic. As more data is read from the input topic, 
throughput keeps growing, and so does data freshness. I can find no logged 
errors or failures.

The SDK version is v2.34.0, but I am seeing the same issue on current master 
(go get: downgraded github.com/apache/beam/sdks/v2 v2.34.0 => 
v2.0.0-20211116020159-9de4bd93fb49).

The example below is a simplified version of my actual pipeline, which also 
does intermediate data processing across multiple stages; the full pipeline 
exhibits the same problem.

> package main
> 
> import (
>       "context"
>       "flag"
> 
>       "github.com/apache/beam/sdks/v2/go/pkg/beam"
>       "github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio"
>       "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
>       "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts"
>       _ "github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow"
> )
> 
> func main() {
>       var input, output string
>       flag.StringVar(&input, "input", "", "")
>       flag.StringVar(&output, "output", "", "")
>       flag.Parse()
>       beam.Init()
> 
>       ctx := context.Background()
>       project := gcpopts.GetProject(ctx)
> 
>       pipeline, scope := beam.NewPipelineWithRoot()
> 
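>       // Forward messages read from the input topic verbatim to the output topic.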
>       pubsubio.Write(
>               scope,
>               project,
>               output,
>               pubsubio.Read(
>                       scope,
>                       project,
>                       input,
>                       nil,
>               ),
>       )
> 
>       if _, err := beam.Run(ctx, "dataflow", pipeline); err != nil {
>               log.Exitf(ctx, "Failed to run job: %v", err)
>       }
> }

Dataflow job 2021-11-16_07_43_39-16986075748673839415 is currently executing 
the above pipeline, and I'm about to see whether I can reproduce this with the 
Python SDK. No new input is being produced at the moment, but previously 
published data should still be available to the pipeline.

Note that I'm aware pubsubio is an external transform handled internally by 
the runner, and that Dataflow does not officially support the Go SDK yet, but 
any insights would be helpful.

The answer in https://stackoverflow.com/a/69665998 proposes implementing a 
custom transform that uses the official pubsub package, which seems like a 
reasonable short-term workaround if I am unable to find a solution.
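
For reference, this is roughly what I have in mind for that workaround: a 
minimal sketch of a custom publish DoFn built on the official 
cloud.google.com/go/pubsub client, used in place of pubsubio.Write. It assumes 
the default pubsubio.Read output of raw []byte payloads and publishes each 
element synchronously; the publishFn and publish names are my own, not part of 
the SDK.

> import (
>       "context"
>       "reflect"
> 
>       "cloud.google.com/go/pubsub"
>       "github.com/apache/beam/sdks/v2/go/pkg/beam"
> )
> 
> func init() {
>       beam.RegisterType(reflect.TypeOf((*publishFn)(nil)).Elem())
> }
> 
> // publishFn publishes each element to a Pub/Sub topic using the
> // official pubsub client instead of pubsubio.Write.
> type publishFn struct {
>       Project string
>       Topic   string
> 
>       client *pubsub.Client
>       topic  *pubsub.Topic
> }
> 
> func (f *publishFn) StartBundle(ctx context.Context) error {
>       client, err := pubsub.NewClient(ctx, f.Project)
>       if err != nil {
>               return err
>       }
>       f.client = client
>       f.topic = client.Topic(f.Topic)
>       return nil
> }
> 
> func (f *publishFn) ProcessElement(ctx context.Context, msg []byte) error {
>       // Publish synchronously so errors surface as element failures.
>       _, err := f.topic.Publish(ctx, &pubsub.Message{Data: msg}).Get(ctx)
>       return err
> }
> 
> func (f *publishFn) FinishBundle(ctx context.Context) error {
>       f.topic.Stop()
>       return f.client.Close()
> }
> 
> // publish applies publishFn to col, replacing pubsubio.Write.
> func publish(s beam.Scope, project, topic string, col beam.PCollection) {
>       beam.ParDo0(s.Scope("PublishToPubSub"), &publishFn{Project: project, Topic: topic}, col)
> }

In the example above, the pubsubio.Write call would then be replaced with 
publish(scope, project, output, col), applied to the PCollection returned by 
pubsubio.Read.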


Let me know if I can provide any additional information that would be helpful.


Thanks,
--
Hannes
