Hey,
If I have an input data set that is super expensive to read (think
hundreds of GB of data on S3, for example), would I be able to use cache
to make sure I only do the read once and then hand the result to the jobs
that need it, as opposed to what Crunch does by default, which is to read
it once for each parallel thread that needs the data?
Thanks,
Dave
