Thanks for the explanation, as well as the tip on Dataproc+Flink. I'll give that a whirl.

Cheers,
Chris
On Wed, Oct 2, 2019 at 11:10 AM Valentyn Tymofieiev <[email protected]> wrote:
>
> Hi Chris,
>
> Dataflow does not support GPUs at the moment, but this feature is on our
> radar and we are considering it for future prioritization. Dataflow-on-GKE
> is also not supported.
>
> Currently the Dataflow worker pool is homogeneous. In the future, however,
> resource annotations in the pipeline should be the way to go. As you noted,
> resource annotation support needs to happen in the Beam SDK. This feature is
> not tied to a particular functionality (GPUs) or a particular runner
> (Dataflow), and can be implemented in the Beam codebase.
>
> At the moment, you can try experimenting with the Direct runner on a single
> machine with a GPU, or try portable runners that use stand-alone
> infrastructure, for example, the Beam Flink runner + Flink on a Dataproc
> cluster with GPUs.
>
> Thanks,
> Valentyn
>
> On Tue, Oct 1, 2019 at 11:24 AM Chris Roat <[email protected]> wrote:
>
>> While evaluating many tools for a project, I found Beam suits my needs
>> quite well from the abstraction point of view. Both the dead-simple way to
>> scale up (and even down to a single machine for testing) and the ease of
>> moving between different runners are key. Plus, I'm familiar with the
>> framework from having used Flume while at Google.
>>
>> One thing I'd find useful in the implementation is resource hints[1],
>> particularly to use GPUs for several parts of the processing. Forgoing
>> hints and the ability to run easily on GPUs, I'd be happy to break up my
>> pipeline and just spin up all my machines with GPUs for the sub-pipelines
>> that need them.
>>
>> Some paths I'm considering:
>> - Find the easiest way to go from start-cluster-with-gpus (i.e. gcloud
>> container clusters ... --accelerator=...) to run-dataflow-on-said-cluster.
>> What would that be?
>> - Implement --accelerator in PipelineOptions and implement it for Dataflow
>>
>> Thanks for any advice,
>> Chris
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-2085
>>
>
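To make the fallback Chris describes a bit more concrete (breaking a pipeline up and running only the GPU-bound sub-pipelines on GPU machines), here is a minimal, runner-agnostic Python sketch. It is not Beam or Dataflow API: `Step`, its `accelerator` field, `split_by_accelerator`, and `run_sub_pipeline` are all hypothetical names, and the "launch" step is just an in-process loop standing in for submitting a job to a suitably provisioned worker pool. The `accelerator` field plays roughly the role a per-step resource hint (as discussed in BEAM-2085) might play.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Optional


@dataclass
class Step:
    """A pipeline stage with an optional accelerator hint (hypothetical, not a Beam API)."""
    name: str
    fn: Callable[[list], list]
    accelerator: Optional[str] = None  # None means the stage runs on CPU-only workers


def split_by_accelerator(steps: List[Step]) -> List[List[Step]]:
    """Group consecutive steps that need the same accelerator into one sub-pipeline."""
    return [list(group) for _, group in groupby(steps, key=lambda s: s.accelerator)]


def run_sub_pipeline(steps: List[Step], data: list) -> list:
    # Hypothetical stand-in for submitting a job to a worker pool provisioned
    # with steps[0].accelerator; here every step simply runs in-process.
    for step in steps:
        data = step.fn(data)
    return data


def run(steps: List[Step], data: list) -> list:
    # Each sub-pipeline consumes the previous one's output, mimicking
    # intermediate results handed off between separate jobs.
    for sub_pipeline in split_by_accelerator(steps):
        data = run_sub_pipeline(sub_pipeline, data)
    return data


# Usage: only the two middle stages are marked as needing a GPU.
steps = [
    Step("parse", lambda xs: [int(x) for x in xs]),
    Step("embed", lambda xs: [x * 2 for x in xs], accelerator="gpu"),
    Step("score", lambda xs: [x + 1 for x in xs], accelerator="gpu"),
    Step("write", sorted),
]

for sub_pipeline in split_by_accelerator(steps):
    # prints: cpu ['parse'], then gpu ['embed', 'score'], then cpu ['write']
    print(sub_pipeline[0].accelerator or "cpu", [s.name for s in sub_pipeline])
print(run(steps, ["3", "1", "2"]))  # prints [3, 5, 7]
```

Grouping only consecutive steps keeps the original stage order, so each sub-pipeline's output can be handed to the next one unchanged; a real setup would more likely persist intermediate results (e.g. to files) between separately launched jobs.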
