Thanks for the input. Storing the maps in a static variable improved performance considerably. Of course, if these "sideInputs" grow too large I might need to switch to the CoGroupByKey approach instead.
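
For readers following the thread, here is a minimal sketch of the static-variable workaround in plain Java, with no Beam dependency. The class name `WordCountCache` and the `loader` parameter are hypothetical, chosen for illustration; they are not part of Beam, and this is not necessarily how the map was stored in the pipeline discussed here. The idea is only that the map is built once per JVM and every later lookup reuses the same instance.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: holds one copy of the word-count map per JVM.
final class WordCountCache {
    // volatile so that a fully built map is visible to every thread.
    private static volatile Map<String, Integer> counts;

    private WordCountCache() {}

    // Returns the cached map, invoking the loader only on the first call.
    static Map<String, Integer> get(Supplier<Map<String, Integer>> loader) {
        Map<String, Integer> local = counts;
        if (local == null) {
            synchronized (WordCountCache.class) {
                local = counts;
                if (local == null) {
                    // Defensive read-only copy, built once and then shared.
                    local = Collections.unmodifiableMap(new HashMap<>(loader.get()));
                    counts = local;
                }
            }
        }
        return local;
    }
}
```

In a Beam `DoFn`, the loader could plausibly be something like `() -> c.sideInput(view)` on the first processed element, but that wiring is an assumption and depends on the pipeline. Note that a cache like this lives for the lifetime of the worker JVM, so it only suits a map that does not change while the job runs.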
Thanks again,
Augusto

> On 8 Apr 2019, at 20:07, Lukasz Cwik <[email protected]> wrote:
>
> Side input performance and scaling is runner dependent. Runners should
> attempt to provide support for efficient random access lookup in the maps.
> Side inputs should also be cached across elements if the map hasn't changed
> which runners should also be capable of doing.
>
> So yes, side input size can impact performance depending on which runner you
> choose to use. Some runners don't deal with side inputs at all while others
> may scale to support terabytes in size.
>
> Saving it as a static class variable may be a useful workaround if the runner
> is not performing as well as you would like.
>
> Map side inputs are usually used to produce joins. Have you tried using
> CoGroupByKey to do the join instead?
>
> On Mon, Apr 8, 2019 at 10:30 AM [email protected] <[email protected]> wrote:
> Hi,
>
> In one of my transforms I am using Map which is the result of a previous
> transform as a sideInput. This Map<String, Int> is potentially very large
> with count of all words that appeared in all documents.
>
> The step that uses the sideInput is quite slow because it seems like it is
> initialising a huge Hashmap for every element it processes (I followed this
> example https://beam.apache.org/documentation/programming-guide/#side-inputs)
>
> Is this the wrong way of using sideInputs? And by this I mean, can a
> sideInput be too big to be a sideInput? I also thought about saving the
> sideInput as a static class variable, then in principle I only have to read
> it once per "transform" initialised in the cluster.
>
> Am I going totally wrong about this, should I try other approaches?
>
> Best regards,
> Augusto
