Hi all. I'd like to start setting up my workflow in Apache Beam, but run it only on a local machine until our system administrators have the capacity to stand up an adequate (Spark or Hadoop) cluster. From the documentation, I understand that we should be mindful of the memory requirements of the data sets we use, but is there any alternative (at the sacrifice of speed, of course) for working with a larger data set on the DirectRunner? Can it be configured to spill to disk, perhaps?
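
For context, here is roughly how I'm invoking the pipeline locally today. This is just a minimal sketch using the Python SDK; the input/output paths and the transforms are placeholders, and the direct_* options are what I've pieced together from the DirectRunner docs, so please correct me if I've misread them:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Minimal local setup (placeholder paths and transforms).
    options = PipelineOptions(
        runner="DirectRunner",
        direct_num_workers=4,                     # local parallelism
        direct_running_mode="multi_processing",   # vs. "in_memory" / "multi_threading"
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("data/large_input.txt")   # placeholder input
            | "Parse" >> beam.Map(lambda line: line.split(","))
            | "Write" >> beam.io.WriteToText("output/results")         # placeholder output
        )

Is there anything beyond these options (or a different local runner) that would let a data set larger than memory be processed this way?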
Thanks, Steve
