In streaming, a simple way to increase parallelism is to add a reshuffle. When you are bound by external calls, the extra cost of the reshuffle is negligible. See, e.g., https://stackoverflow.com/questions/46116443/dataflow-streaming-job-not-scaleing-past-1-worker

Note that by default Dataflow workers use a couple of hundred threads, as needed. This number can be increased with a pipeline option if you prefer. I am not sure about other runners.

On Thu, Mar 15, 2018 at 8:25 AM Falcon Taylor-Carter <fal...@bounceexchange.com> wrote:

> Hello Pablo,
>
> Thanks for checking up (I'm working with Josh on this problem). It seems
> there isn't a built-in process for this kind of use case currently, and
> that the best process right now is to handle our own bundling and threading
> in the DoFn. If you had any other suggestions, or anything to keep in mind
> in doing this, let us know!
>
> Falcon
>
> On Tue, Mar 13, 2018 at 4:52 PM, Pablo Estrada <pabl...@google.com> wrote:
>
>> I'd just like to close the loop. Josh, did you get an answer/guidance on
>> how to proceed with your pipeline?
>> Or maybe we'll need a new thread to figure that out : )
>> Best
>> -P.
>>
>> On Fri, Mar 9, 2018 at 1:39 PM Josh Ferge <josh.fe...@bounceexchange.com>
>> wrote:
>>
>>> Hello all:
>>>
>>> Our team has a pipeline that makes external network calls. These
>>> pipelines are currently super slow, and the hypothesis is that they are
>>> slow because we are not threading for our network calls. The GitHub issue
>>> below provides some discussion around this:
>>>
>>> https://github.com/apache/beam/pull/957
>>>
>>> In beam 1.0, there was IntraBundleParallelization, which helped with
>>> this. However, this was removed because it didn't comply with a few Beam
>>> paradigms.
>>>
>>> Questions going forward:
>>>
>>> What is advised for jobs that make blocking network calls? It seems
>>> bundling the elements into groups of size X prior to passing to the DoFn,
>>> and managing the threading within the function might work. Thoughts?
>>> Are these types of jobs even suitable for Beam?
>>> Are there any plans to develop features that help with this?
>>>
>>> Thanks
>>>
>> --
>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
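
For reference, the "bundling and threading in the DoFn" idea from the quoted messages could look roughly like the sketch below. It is plain Python with no Beam dependency, so it only shows the core pattern: group elements into batches of size X, then issue the blocking calls for each batch from a thread pool. The names batches, call_concurrently and fake_lookup, and the batch and thread counts, are made up for illustration; fake_lookup stands in for the real network call.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(elements, batch_size):
    """Yield successive lists of at most batch_size elements."""
    it = iter(elements)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def call_concurrently(batch, blocking_call, max_threads=16):
    """Run blocking_call on every element of batch using a thread pool.

    Results come back in input order. An exception raised by any call
    propagates to the caller when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(blocking_call, batch))


def fake_lookup(x):
    # Hypothetical stand-in for a blocking network call.
    return x * 2


# Process ten elements in batches of four, each batch fanned out to threads.
results = [
    r
    for batch in batches(range(10), batch_size=4)
    for r in call_concurrently(batch, fake_lookup)
]
```

Inside a pipeline, the same helper would presumably be called from the DoFn on each group of elements. Letting exceptions propagate matters there, so that a failed call fails the bundle and the runner can retry it instead of silently dropping elements.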