Thanks. I will create a JIRA ticket to try to explain. I am planning a service running in Kubernetes that will submit Dataflow jobs. It will need to know the status of jobs [across service restarts]. An alternative might be to do some sort of GBK at the end of the job and post the result to Pub/Sub. That seemed complex - my last step is currently a Datastore write, which needs to finish before claiming the job is done, and DatastoreIO is a "termination", right?
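Roughly what I had in mind, for the record - a sketch with made-up topic, job, and helper names. Note it runs as a separate branch and would not wait on the Datastore write, since DatastoreIO.v1().write() returns PDone:

import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

class CompletionSignal {
  /** Publishes one "done" message when this branch of the pipeline finishes. */
  static void addCompletionSignal(PCollection<String> results, String jobName) {
    results
        // Collapse the output to a single element, like the GBK idea above.
        .apply("CountAll", Count.globally())
        .apply("ToSignal",
            MapElements.into(TypeDescriptors.strings())
                .via((Long n) -> jobName + " done, " + n + " records"))
        // Topic name is made up.
        .apply(PubsubIO.writeStrings().to("projects/my-project/topics/job-status"));
  }
}

The ordering problem is exactly the PDone issue: this signal fires when its own branch drains, not when the Datastore write commits.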
On Sun, Jul 9, 2017 at 10:04 PM Kenneth Knowles <k...@google.com> wrote:

> (Speaking for Java, but I think Python is similar)
>
> There's nothing in the Beam API right now for querying a job unless you
> have a handle on the original object returned by the runner. The nature of
> the result of run() is particular to a runner, though it is easy to
> imagine a feature whereby you can "attach" to a known running job.
>
> So I think your best option is to use runner-specific APIs for now. For
> Dataflow that would be the cloud APIs [1]. You can see how it is done by
> the Beam wrapper DataflowPipelineJob [2] as a reference.
>
> Out of curiosity - what sort of third-party app? It would be super if you
> could file a JIRA [3] describing your use case with some more details, to
> help gain visibility.
>
> Kenn
>
> [1] https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/get
> [2] https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L441
> [3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> On Sun, Jul 9, 2017 at 2:54 PM, Randal Moore <rdmoor...@gmail.com> wrote:
>
>> Is this part of the Beam API, or something I should look at the Google
>> docs for help with? Assume a job is running in Dataflow - how can an
>> interested third-party app query the status if it knows the job-id?
>>
>> rdm
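For anyone landing on this thread later, a minimal sketch of the polling approach in [1], using the generated google-api-services-dataflow Java client. The project ID, job ID, and application name are placeholders, and error handling is omitted:

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.DataflowScopes;
import com.google.api.services.dataflow.model.Job;

public class JobStatusCheck {
  public static void main(String[] args) throws Exception {
    // Build a v1b3 client with application-default credentials.
    Dataflow dataflow =
        new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                GoogleCredential.getApplicationDefault().createScoped(DataflowScopes.all()))
            .setApplicationName("job-status-check") // placeholder
            .build();

    // projects.jobs.get from [1]; both IDs are placeholders.
    Job job =
        dataflow.projects().jobs()
            .get("my-project", "2017-07-09_12_34_56-1234567890")
            .execute();

    // currentState is e.g. JOB_STATE_RUNNING, JOB_STATE_DONE, JOB_STATE_FAILED.
    System.out.println(job.getCurrentState());
  }
}

This is roughly what DataflowPipelineJob [2] does internally when it polls for a terminal state, so it should survive service restarts as long as the job ID is persisted somewhere.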