We use the Dataflow API [1] directly, via the Google API client wrappers (both Python and Java), pretty extensively. It works well and doesn't require a dependency on Beam.
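In case it's useful, here's a rough Java sketch of that approach. This is a sketch, not our production code: it assumes the google-api-services-dataflow and google-auth-library-oauth2-http artifacts are on the classpath and that application default credentials are available, and the application name, project, region, and job ID below are all placeholders. (The Python wrapper works the same way against the same REST surface.)

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

public class DataflowJobStatus {
  public static void main(String[] args) throws Exception {
    // Application default credentials (an assumption; any credential
    // source supported by google-auth works here).
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
        .createScoped("https://www.googleapis.com/auth/cloud-platform");

    // The generated REST client; no Beam dependency required.
    Dataflow dataflow = new Dataflow.Builder(
            GoogleNetHttpTransport.newTrustedTransport(),
            JacksonFactory.getDefaultInstance(),
            new HttpCredentialsAdapter(credentials))
        .setApplicationName("dataflow-job-monitor") // placeholder name
        .build();

    // A job is uniquely identified by project, region, and job ID;
    // all three values below are placeholders.
    Job job = dataflow.projects().locations().jobs()
        .get("my-project", "us-central1", "2020-10-12_00_00_00-1234567890")
        .execute();
    System.out.println(job.getCurrentState()); // e.g. JOB_STATE_RUNNING
  }
}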
[1] https://cloud.google.com/dataflow/docs/reference/rest

On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik <lc...@google.com> wrote:

> It is your best way to do this right now, and this hasn't changed in a
> while (region was added to project and job IDs in the past 6 years).
>
> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig <plit...@nianticlabs.com>
> wrote:
>
>> Thanks for the reply, Kyle.
>>
>> The DataflowClient::getJob method uses a Dataflow instance that's
>> provided at construction time (via
>> DataflowPipelineOptions::getDataflowClient). If that Dataflow instance
>> can be obtained from a minimal instance of the options (i.e., containing
>> only the project ID and region), then it looks like everything should
>> work.
>>
>> I suppose a secondary question here is whether or not this approach is
>> the recommended way to solve my problem (but I don't know of any
>> alternatives).
>>
>> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> > I think the answer is to use a DataflowClient in the second service,
>>> > but creating one requires DataflowPipelineOptions. Are these options
>>> > supposed to be exactly the same as those used by the first service?
>>> > Or do only some of the fields have to be the same?
>>>
>>> Most options are not necessary for retrieving a job. In general,
>>> Dataflow jobs can always be uniquely identified by the project, region,
>>> and job ID.
>>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>>>
>>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig <plit...@nianticlabs.com>
>>> wrote:
>>>
>>>> Hello, Beam users!
>>>>
>>>> Suppose I want to build two (Java) services: one that launches
>>>> (long-running) Dataflow jobs, and the other that monitors the status
>>>> of Dataflow jobs. Within a single service, I could simply track a
>>>> PipelineResult for each Dataflow run and periodically call getState.
>>>> How can I monitor job status like this from a second, independent
>>>> service?
>>>>
>>>> I think the answer is to use a DataflowClient in the second service,
>>>> but creating one requires DataflowPipelineOptions. Are these options
>>>> supposed to be exactly the same as those used by the first service?
>>>> Or do only some of the fields have to be the same?
>>>>
>>>> Or maybe there's a better alternative than DataflowClient?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Peter
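For completeness, here's what the DataflowClient route discussed above can look like. Again a sketch, not authoritative: it assumes Beam's google-cloud-dataflow-java runner artifact is on the classpath and that application default credentials are available, and the project, region, and job ID values are placeholders (project and region must match whatever the launching service used).

import com.google.api.services.dataflow.model.Job;
import org.apache.beam.runners.dataflow.DataflowClient;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DataflowJobMonitor {
  public static void main(String[] args) throws Exception {
    // Only project and region are needed to look up a job; the other
    // pipeline options the launching service used are irrelevant here.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project");  // placeholder; must match the launcher
    options.setRegion("us-central1");  // placeholder; must match the launcher

    // DataflowClient builds its Dataflow stub from the options' defaults,
    // which pick up application default credentials (an assumption here).
    DataflowClient client = DataflowClient.create(options);

    // The job ID would come from the launching service, e.g. via
    // DataflowPipelineJob.getJobId(); this value is a placeholder.
    Job job = client.getJob("2020-10-12_00_00_00-1234567890");
    System.out.println(job.getCurrentState()); // e.g. JOB_STATE_RUNNING
  }
}

Note that getJob returns the same com.google.api.services.dataflow.model.Job the raw REST client gives you, so both approaches yield the same state strings (JOB_STATE_RUNNING, JOB_STATE_DONE, and so on); the main difference is whether you want Beam on the monitoring service's classpath.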