We use the Dataflow API [1] directly, via the Google API client wrappers
(both Python and Java), pretty extensively. It works well and doesn't
require a dependency on Beam.

[1] https://cloud.google.com/dataflow/docs/reference/rest
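
For reference, here's a rough Java sketch of that approach using the
google-api-services-dataflow client (no Beam on the classpath). It's just
the REST method projects.locations.jobs.get; it assumes application
default credentials, and the project, region, and job ID are placeholders.

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;
import java.util.Collections;

public class GetJobState {
  public static void main(String[] args) throws Exception {
    // Application default credentials with the cloud-platform scope.
    GoogleCredentials credentials =
        GoogleCredentials.getApplicationDefault()
            .createScoped(Collections.singletonList(
                "https://www.googleapis.com/auth/cloud-platform"));

    // Low-level generated Dataflow client (no Beam dependency).
    Dataflow dataflow =
        new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(credentials))
            .setApplicationName("job-monitor")
            .build();

    // projects.locations.jobs.get(projectId, location, jobId)
    Job job =
        dataflow.projects().locations().jobs()
            .get("my-project", "us-central1", "your-job-id")
            .execute();
    System.out.println(job.getCurrentState());
  }
}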

On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik <lc...@google.com> wrote:

> It is the best way to do this right now, and it hasn't changed in a
> while (region was added to the project and job IDs within the past 6 years).
>
> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig <plit...@nianticlabs.com>
> wrote:
>
>> Thanks for the reply, Kyle.
>>
>> The DataflowClient::getJob method uses a Dataflow instance that's
>> provided at construction time (via
>> DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can
>> be obtained from a minimal instance of the options (i.e., containing only
>> the project ID and region) then it looks like everything should work.
>>
>> I suppose a secondary question here is whether or not this approach is
>> the recommended way to solve my problem (but I don't know of any
>> alternatives).
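>>
>> What I have in mind is something like this (untested sketch; the project
>> and region are placeholders, and credentials come from the environment):
>>
>> import com.google.api.services.dataflow.Dataflow;
>> import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
>> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>>
>> DataflowPipelineOptions options =
>>     PipelineOptionsFactory.as(DataflowPipelineOptions.class);
>> options.setProject("my-project");
>> options.setRegion("us-central1");
>>
>> // The underlying client is created lazily from these options.
>> Dataflow dataflow = options.getDataflowClient();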
>>
>> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> > I think the answer is to use a DataflowClient in the second service,
>>> but creating one requires DataflowPipelineOptions. Are these options
>>> supposed to be exactly the same as those used by the first service? Or do
>>> only some of the fields have to be the same?
>>>
>>> Most options are not necessary for retrieving a job. Dataflow jobs can
>>> always be uniquely identified by the project, region, and job ID.
>>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
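>>>
>>> So a minimal setup along these lines should be enough (sketch, not
>>> tested; the project, region, and job ID are placeholders):
>>>
>>> import com.google.api.services.dataflow.model.Job;
>>> import org.apache.beam.runners.dataflow.DataflowClient;
>>> import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
>>> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>>>
>>> DataflowPipelineOptions options =
>>>     PipelineOptionsFactory.as(DataflowPipelineOptions.class);
>>> options.setProject("my-project");
>>> options.setRegion("us-central1");
>>>
>>> DataflowClient client = DataflowClient.create(options);
>>> Job job = client.getJob("your-job-id");
>>> System.out.println(job.getCurrentState());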
>>>
>>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig <plit...@nianticlabs.com>
>>> wrote:
>>>
>>>> Hello, Beam users!
>>>>
>>>> Suppose I want to build two (Java) services: one that launches
>>>> (long-running) Dataflow jobs, and the other that monitors the status of
>>>> those jobs. Within a single service, I could simply track a
>>>> PipelineResult for each Dataflow run and periodically call getState. How
>>>> can I monitor job status like this from a second, independent service?
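>>>>
>>>> By "track a PipelineResult and call getState" I mean roughly this
>>>> sketch (the pipeline is built elsewhere, error handling omitted):
>>>>
>>>> PipelineResult result = pipeline.run();
>>>> // Poll until the job reaches a terminal state (DONE, FAILED, ...).
>>>> while (!result.getState().isTerminal()) {
>>>>   Thread.sleep(30_000);
>>>> }
>>>> System.out.println("Final state: " + result.getState());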
>>>>
>>>> I think the answer is to use a DataflowClient in the second service,
>>>> but creating one requires DataflowPipelineOptions. Are these options
>>>> supposed to be exactly the same as those used by the first service? Or do
>>>> only some of the fields have to be the same?
>>>>
>>>> Or maybe there's a better alternative than DataflowClient?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Peter
>>>>
>>>
