Hello, I am new to the Apache ecosystem and am attempting to use Beam to build a horizontally scalable pipeline for feature extraction from video data. The extraction process for certain features can be accelerated using GPUs, while other features require only a CPU to compute. I have several questions, listed in order of decreasing priority:
1. Can I run a Beam pipeline with GPUs? (As far as I can tell, Google Cloud Dataflow does not currently support this option.)
2. Is it possible to achieve this functionality using Spark or Flink as a runner?
3. Is it possible to mix hardware types in a Beam pipeline (e.g., to have certain features extracted by CPUs and others extracted by GPUs), or does this go against the Beam paradigm of abstracting away such details?
4. Do the Spark and Flink runners support auto-scaling the way Google Cloud Dataflow does?
5. What considerations are relevant when choosing between Spark and Flink as a runner?

Any guidance, resources, or tips are appreciated. Thank you in advance!

-Xander
