One avenue is to set --worker_machine_type when you start the pipeline and pass a custom machine type[1] with a small number of cores and a lot of RAM.
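For example, with the Python SDK the machine type can be passed through the pipeline options. This is only a rough sketch; the project, region, and bucket values are placeholders you would replace with your own, and the commented thread-count flag is an assumption to double-check against the WorkerOptions docs for your SDK version:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-project',                # placeholder
        '--region=us-central1',                # placeholder
        '--temp_location=gs://my-bucket/tmp',  # placeholder
        # custom-<vCPUs>-<memory in MB>: here 2 vCPUs with 13 GB of RAM.
        # (N1 custom types allow up to 6.5 GB per vCPU; beyond that the
        # -ext extended-memory suffix is needed.)
        '--worker_machine_type=custom-2-13312',
        # If your SDK version supports it, this caps threads per worker
        # (assumption -- see the WorkerOptions / Dataflow docs):
        # '--number_of_worker_harness_threads=2',
    ])

    with beam.Pipeline(options=options) as p:
        _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * x)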
[1] https://cloud.google.com/custom-machine-types

On Sat, Apr 11, 2020 at 3:20 AM Tadas Šubonis <[email protected]> wrote:

> Thanks.
>
> The program itself is heavy on memory usage but its CPU and IO usage are
> low-medium (let's say max 2 cores per task). Before, I used to use 16GB RAM
> workers (so they would fit 2-6 tasks at once), but as long as there is a way
> to limit worker-level parallelism, I think I should be fine. Could you
> point me to the right place in the docs to read about that (I am planning
> to use Dataflow)?
>
> On Sat, Apr 11, 2020 at 1:48 AM Robert Bradshaw <[email protected]> wrote:
>
>> In general, runners like to schedule more than one task per worker (to
>> take advantage of multiple cores, etc.). The mitigation to this is likely
>> to be runner-specific. E.g., for Dataflow the number of tasks/threads per
>> machine is by default chosen to be the number of cores of that VM. I think
>> Flink and Spark have flags that can be set to control this as well.
>>
>> Another option would be to control the resource usage with a global lock.
>> Your DoFn would acquire this lock before starting up the program, and other
>> workers would sit idly by for their turn.
>>
>> I think trying to run on machines with lots of memory is the easiest
>> solution, unless this is truly infeasible (depends on what your setup is).
>>
>> On Fri, Apr 10, 2020 at 4:24 PM Valentyn Tymofieiev <[email protected]> wrote:
>>
>>> I don't think there is a silver bullet solution to avoid an OOM, but
>>> there are mitigations you can employ if there is a problem, such as:
>>> - sizing the workers appropriately,
>>> - avoiding memory leaks in the user code,
>>> - limiting worker-level parallelism, if necessary.
>
> --
>
> Kind Regards,
> Tadas Šubonis
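Regarding the global-lock idea Robert mentions in the quoted thread above, a minimal sketch of what that could look like in a Python DoFn is below. The name run_heavy_program is a hypothetical stand-in for the memory-hungry step, and note that a threading.Lock only serializes work within one worker process; if the SDK harness runs multiple processes, a cross-process lock (e.g. a file lock) would be needed instead.

    import threading
    import apache_beam as beam

    # Shared by all DoFn threads in this worker process.
    _HEAVY_LOCK = threading.Lock()

    def run_heavy_program(element):
        # Placeholder for the memory-hungry program.
        return element

    class SerializedHeavyDoFn(beam.DoFn):
        def process(self, element):
            # Other threads on the same worker wait here for their turn,
            # so at most one heavy invocation runs per worker process.
            with _HEAVY_LOCK:
                yield run_heavy_program(element)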
