Thanks. I will create a JIRA ticket to explain. I am planning a
service running in Kubernetes that will submit Dataflow jobs. It will need
to know the status of those jobs across service restarts. An alternative
might be to do some sort of GBK at the end of the job and post the result
to Pub/Sub. That seemed complex: my last step is currently a Datastore
write, which needs to finish before I can claim the job is done, and
DatastoreIO is a terminal step, right?
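
If the direct query route works, the check my service runs after a restart
could be as simple as this sketch, given the job id it persisted at submit
time and a Dataflow v1b3 client handle built as in the snippet quoted below
(the JobTracker class and isFinished name are just illustrative):

  import java.io.IOException;

  import com.google.api.services.dataflow.Dataflow;
  import com.google.api.services.dataflow.model.Job;

  class JobTracker {
    // Poll the Dataflow service for the job id we persisted at submit
    // time and decide whether it has reached a terminal state. (Depending
    // on the use case, JOB_STATE_UPDATED and JOB_STATE_DRAINED may also
    // need handling.)
    static boolean isFinished(Dataflow dataflow, String project, String jobId)
        throws IOException {
      Job job = dataflow.projects().jobs().get(project, jobId).execute();
      String state = job.getCurrentState();  // e.g. "JOB_STATE_RUNNING"
      return "JOB_STATE_DONE".equals(state)
          || "JOB_STATE_FAILED".equals(state)
          || "JOB_STATE_CANCELLED".equals(state);
    }
  }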



On Sun, Jul 9, 2017 at 10:04 PM Kenneth Knowles <k...@google.com> wrote:

> (Speaking for Java, but I think Python is similar)
>
> There's nothing in the Beam API right now for querying a job unless you
> have a handle on the original object returned by the runner. The nature of
> the result of run() is particular to a runner, though it is easy to imagine
> a feature whereby you can "attach" to a known running job.
>
> So I think your best option is to use runner-specific APIs for now. For
> Dataflow that would be the cloud APIs [1]. You can see how it is done by
> the Beam wrapper DataflowPipelineJob [2] as a reference.
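>
> For example, a minimal sketch along those lines, calling the v1b3 Java
> client directly with application-default credentials (builder details can
> differ slightly between client versions, so treat this as illustrative
> rather than definitive):
>
>   import java.util.Collections;
>
>   import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
>   import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
>   import com.google.api.client.json.jackson2.JacksonFactory;
>   import com.google.api.services.dataflow.Dataflow;
>   import com.google.api.services.dataflow.model.Job;
>
>   public class JobStatus {
>     public static void main(String[] args) throws Exception {
>       String project = args[0];  // GCP project id
>       String jobId = args[1];    // Dataflow job id recorded at submit time
>
>       GoogleCredential credential = GoogleCredential.getApplicationDefault()
>           .createScoped(Collections.singleton(
>               "https://www.googleapis.com/auth/cloud-platform"));
>
>       Dataflow dataflow = new Dataflow.Builder(
>               GoogleNetHttpTransport.newTrustedTransport(),
>               JacksonFactory.getDefaultInstance(),
>               credential)
>           .setApplicationName("job-status-check")
>           .build();
>
>       // projects.jobs.get [1] returns the Job resource; currentState is
>       // a string like JOB_STATE_RUNNING, JOB_STATE_DONE, JOB_STATE_FAILED.
>       Job job = dataflow.projects().jobs().get(project, jobId).execute();
>       System.out.println(job.getId() + " -> " + job.getCurrentState());
>     }
>   }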
>
> Out of curiosity - what sort of third-party app? It would be super if you
> could file a JIRA [3] describing your use case with some more details, to
> help gain visibility.
>
> Kenn
>
> [1]
> https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/get
> [2]
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L441
> [3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> On Sun, Jul 9, 2017 at 2:54 PM, Randal Moore <rdmoor...@gmail.com> wrote:
>
>> Is this part of the Beam API, or something I should look at the Google
>> docs for? Assume a job is running in Dataflow - how can an interested
>> third-party app query the status if it knows the job id?
>>
>> rdm
>>
>
>
