While evaluating many tools for a project, I found that Beam suits my needs
quite well from the abstraction point of view.  Both the dead-simple way to
scale up (and even down to single-machine for testing) and the ease of
moving between different runners are key.  Plus, I'm familiar with the
framework from having used Flume while at Google.

One thing I'd find useful in the implementation is resource hints[1],
particularly the ability to use GPUs for several parts of the processing.
Without hints or another easy way to run on GPUs, I'd be happy to break up
my pipeline and just spin up all the machines with GPUs for the
sub-pipelines that need it.
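
Concretely, the workaround I have in mind looks roughly like the sketch
below (untested Python; the paths and the stand-in functions are
placeholders, and I'm assuming the second job gets submitted with whatever
options provision GPU-equipped workers):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def preprocess(line):
    # Stand-in for the CPU-side preprocessing.
    return line


class GpuInference(beam.DoFn):
    # Stand-in for the GPU-heavy step; assumes the worker actually has a GPU.
    def process(self, element):
        yield element


def run_cpu_stage(argv=None):
    # Sub-pipeline 1: ordinary CPU workers, writes intermediate results.
    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (p
         | beam.io.ReadFromText('gs://my-bucket/input/*')
         | beam.Map(preprocess)
         | beam.io.WriteToText('gs://my-bucket/intermediate/part'))


def run_gpu_stage(argv=None):
    # Sub-pipeline 2: submitted as a separate job with whatever options stand
    # up GPU-equipped workers, reading the intermediate results back.
    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (p
         | beam.io.ReadFromText('gs://my-bucket/intermediate/part*')
         | beam.ParDo(GpuInference())
         | beam.io.WriteToText('gs://my-bucket/output/part'))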

Some paths I'm considering:
- Find the easiest way to go from start-cluster-with-gpus (i.e. gcloud
container clusters ... --accelerator=...) to run-dataflow-on-said-cluster.
What would that be?
- Add an --accelerator flag to PipelineOptions and implement support for it
in the Dataflow runner (rough sketch of the flag below)
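
For the second path, the SDK-side piece looks small; something like the
Python below (the flag name and value format are made up here, and the real
work is teaching the Dataflow runner to honor it):

from apache_beam.options.pipeline_options import PipelineOptions


class AcceleratorOptions(PipelineOptions):
    # Hypothetical option; nothing consumes it yet on the runner side.
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument(
            '--accelerator',
            default=None,
            help='Accelerator to attach to workers, e.g. '
                 '"type=nvidia-tesla-k80,count=1" (format is made up).')

The runner could then pick it up via
options.view_as(AcceleratorOptions).accelerator when building the job.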

Thanks for any advice,
Chris

[1] https://issues.apache.org/jira/browse/BEAM-2085
