Hi Beamers, I'm testing streaming writes to GCS to see whether Beam will work for our needs, but I'm having trouble getting it to work.
I tried having my pipeline run similarly to the one described in https://stackoverflow.com/questions/56960622/writetotext-is-only-writing-to-temp-files:

    (p
     | 'Read PubSub metadata' >> ReadFromPubSub(subscription=known_args.input_subscription)
     | 'Convert Message to JSON' >> beam.Map(lambda message: json.loads(message))
     | 'Extract File Names' >> beam.ParDo(ExtractFn())
     | 'Read Files' >> beam.io.ReadAllFromText()
     | 'Window' >> beam.WindowInto(
           beam.window.FixedWindows(30),
           trigger=beam.transforms.trigger.AfterProcessingTime(),
           accumulation_mode=beam.transforms.trigger.AccumulationMode.DISCARDING)
     | 'Combine' >> beam.transforms.core.CombineGlobally(CombineFn()).without_defaults()
     | 'Output' >> beam.io.WriteToText(known_args.output))

But, just like the Stack Overflow poster, my files never get written to the output location in GCS. (There are temp files with data, but that looks like it might be all of the input data, not the output data.)

There's an answer there saying: "From engaging with the Apache Beam Python guys, streaming writes to GCS (or local filesystem) is not yet supported in Python, hence why the streaming write does not occur; only unbounded targets are currently supported (e.g. Big Query tables). Apparently this will be supported in the upcoming release of Beam for Python v2.14.0."

I'm on 2.37.0. Was support for streaming file writes ever added?

Thanks!
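For what it's worth, if I'm reading the docs right, fileio.WriteToFiles is the file sink intended to work in streaming pipelines, so this is a minimal sketch of what I'd try next. The subscription, bucket path, and file naming below are placeholders rather than our real values, and I may be misunderstanding the intended usage:

    import json

    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.io.gcp.pubsub import ReadFromPubSub
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (p
         | 'Read PubSub' >> ReadFromPubSub(
               subscription='projects/my-project/subscriptions/my-sub')  # placeholder
         | 'Convert Message to JSON' >> beam.Map(lambda message: json.loads(message))
         # fileio.WriteToFiles needs non-global windows in a streaming pipeline
         | 'Window' >> beam.WindowInto(beam.window.FixedWindows(30))
         # TextSink writes each element as a utf-8 line, so re-serialize to str first
         | 'To Line' >> beam.Map(json.dumps)
         | 'Write' >> fileio.WriteToFiles(
               path='gs://my-bucket/output/',  # placeholder
               sink=lambda dest: fileio.TextSink(),
               shards=1,
               file_naming=fileio.default_file_naming(prefix='output', suffix='.txt')))

Is something along these lines the recommended approach now, or should WriteToText itself work in streaming mode on 2.37.0?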
