Hi all. I'd like to start setting up my workflow in Apache Beam, but run it only on a local machine until our system administrators have the capacity to stand up an adequate (Spark or Hadoop) cluster. From the documentation, I understand that we should be mindful of the memory requirements of the data sets we use, but is there any alternative (at the sacrifice of speed, of course) for working with a larger data set on the DirectRunner? Can it be configured to spill to disk, perhaps?
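
For context, here is roughly how I'm invoking the pipeline locally today. This is just a minimal sketch using the Python SDK; the input/output paths and the transforms are placeholders, and the direct_* options are what I've pieced together from the DirectRunner docs, so please correct me if I've misread them:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Minimal local setup (placeholder paths and transforms).
    options = PipelineOptions(
        runner="DirectRunner",
        direct_num_workers=4,                     # local parallelism
        direct_running_mode="multi_processing",   # vs. "in_memory" / "multi_threading"
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("data/large_input.txt")   # placeholder input
            | "Parse" >> beam.Map(lambda line: line.split(","))
            | "Write" >> beam.io.WriteToText("output/results")         # placeholder output
        )

Is there anything beyond these options (or a different local runner) that would let a data set larger than memory be processed this way?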
Thanks, Steve
