Hello, I am new to the Apache ecosystem and am attempting to use Beam to build a horizontally scalable pipeline for feature extraction from video data. The extraction process for certain features can be accelerated using GPUs, while other features require only a CPU to compute. I have several questions, listed in order of decreasing priority:
1. Can I run a Beam pipeline with GPUs? (As far as I can tell, Google Cloud Dataflow does not currently support this option.)
2. Is it possible to achieve this functionality using Spark or Flink as a runner?
3. Is it possible to mix hardware types in a Beam pipeline (e.g., to have certain features extracted by CPUs and others extracted by GPUs), or does this go against the Beam paradigm of abstracting away such details?
4. Do the Spark and Flink runners support auto-scaling the way Google Cloud Dataflow does?
5. What considerations are relevant when choosing between Spark and Flink as a runner?

Any guidance, resources, or tips are appreciated. Thank you in advance!

-Xander
