Hi Chris,

Dataflow does not support GPUs at the moment, but this feature is on our
radar and we are considering it when prioritizing future work. Running
Dataflow on a GKE cluster (Dataflow-on-GKE) is not supported either.

Currently, the Dataflow worker pool is homogeneous: every worker uses the
same machine type. In the future, resource annotations in the pipeline look
like the right way to go. As you noted, resource annotation support needs
to happen in the Beam SDK. The feature is not tied to a particular use case
(GPUs) or a particular runner (Dataflow), and can be implemented in the
Beam codebase.
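
To make the homogeneous-pool point concrete, here is a toy model in plain
Python (not Beam or Dataflow code): each step carries a hypothetical
`accelerator` annotation, and steps are grouped by the hardware they ask
for. With a homogeneous pool there is only one group, so a single GPU step
forces GPUs onto every worker; per-step annotations would let a runner size
separate pools. `Step`, `accelerator` and `plan_worker_pools` are
illustrative names only, not part of the Beam SDK.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    """A pipeline step with an optional, hypothetical resource annotation."""
    name: str
    accelerator: Optional[str] = None  # None means the step only needs a CPU


def plan_worker_pools(steps: List[Step], homogeneous: bool) -> Dict[str, List[str]]:
    """Group step names by the worker hardware they would run on.

    With a homogeneous pool every step shares one machine type, so if any
    step asks for an accelerator, all workers must carry it.
    """
    if homogeneous:
        needed = sorted({s.accelerator for s in steps if s.accelerator})
        # One pool for everything; its hardware is the union of all requests.
        pool = "+".join(needed) if needed else "cpu"
        return {pool: [s.name for s in steps]}
    # With per-step annotations, each kind of hardware gets its own pool.
    pools: Dict[str, List[str]] = defaultdict(list)
    for s in steps:
        pools[s.accelerator or "cpu"].append(s.name)
    return dict(pools)


if __name__ == "__main__":
    pipeline = [Step("read"), Step("infer", accelerator="gpu"), Step("write")]
    print(plan_worker_pools(pipeline, homogeneous=True))
    print(plan_worker_pools(pipeline, homogeneous=False))
```

In the homogeneous case all three steps land on GPU workers; with
annotations only "infer" does, which is the split you described doing by
hand with sub-pipelines.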

For now, you can experiment with the Direct runner on a single machine
with a GPU, or try portable runners that use standalone infrastructure,
for example the Beam Flink runner with Flink on a Dataproc cluster with
GPUs.
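
A minimal sketch of keeping the runner choice in flags, so the same
pipeline code can run on a local GPU machine or on a Flink cluster. It
assumes the Beam Python SDK flag names `--runner` and `--flink_master`
(check them against your SDK version); `runner_flags` is a hypothetical
helper and the Flink address is a placeholder. Whether SDK workers on the
Flink nodes can actually see the GPUs depends on how the cluster and the
SDK environment are set up, which is not shown here.

```python
from typing import List, Optional


def runner_flags(runner: str, flink_master: Optional[str] = None) -> List[str]:
    """Build Beam pipeline flags for the two setups suggested above.

    "direct": DirectRunner, runs the pipeline locally, so a GPU attached to
    this machine is available to your code.
    "flink": FlinkRunner submitting to an existing Flink cluster (e.g. on
    Dataproc with GPU nodes); flink_master is the JobManager as host:port.
    """
    if runner == "direct":
        return ["--runner=DirectRunner"]
    if runner == "flink":
        if not flink_master:
            raise ValueError("runner='flink' needs flink_master as host:port")
        return ["--runner=FlinkRunner", f"--flink_master={flink_master}"]
    raise ValueError(f"unknown runner: {runner!r}")


if __name__ == "__main__":
    print(runner_flags("direct"))
    # Placeholder address, not a real host.
    print(runner_flags("flink", flink_master="flink-master.example:8081"))
```

The returned list can be passed to `PipelineOptions(flags)` from
`apache_beam.options.pipeline_options` when constructing `beam.Pipeline`,
so only the flags change when moving between runners.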

Thanks,
Valentyn

On Tue, Oct 1, 2019 at 11:24 AM Chris Roat <[email protected]> wrote:

> While evaluating many tools for a project, I found Beam suits my needs
> quite well from the abstraction point of view.  Both the dead-simple way to
> scale up (and even down to single-machine for testing) and the ease of
> moving between different runners are key.  Plus, I'm familiar with the
> framework from having used Flume while at Google.
>
> One thing I'd find useful in the implementation are resource hints[1],
> particularly to use GPUs for several parts of the processing.  Forgoing
> hints and the ability to run easily on GPUs, I'd be happy to break up my
> pipeline, and just spin up all my machines with GPUs for the sub-pipelines
> that need it.
>
> Some paths I'm considering:
> - Find the easiest way to go from start-cluster-with-gpus (e.g. gcloud
> container clusters ... --accelerator=...) to run-dataflow-on-said-cluster.
> What would that be?
> - Implement --accelerator in PipelineOptions and implement for Dataflow
>
> Thanks for any advice,
> Chris
>
> [1] https://issues.apache.org/jira/browse/BEAM-2085
>
