Try setting mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize to a value lower than the default (usually 64 MB), so that the input is divided into more, smaller splits. If only a specific DoFn needs this, it is better to set it in that DoFn's configure function.
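To get a feel for the numbers, here is a rough back-of-the-envelope sketch in plain Java. It assumes that small input files are packed together into splits of at most the max split size, which is a simplification and not Crunch's exact planning logic. The class and method names (SplitEstimate, estimateSplits) are made up for illustration, and the 128 MB figure is only a guess at an effective limit that would explain the 2 splits you are seeing.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Crunch or Hadoop.
final class SplitEstimate {
    static final long MB = 1024L * 1024L;

    // Rough estimate only: assumes all input bytes are packed into splits of
    // at most maxSplitBytes each. The real planner also considers block and
    // node locality, so actual counts may differ.
    static long estimateSplits(long[] fileSizes, long maxSplitBytes) {
        if (maxSplitBytes <= 0) {
            throw new IllegalArgumentException("maxSplitBytes must be positive");
        }
        long total = Arrays.stream(fileSizes).sum();
        return (total + maxSplitBytes - 1) / maxSplitBytes; // ceiling division
    }

    public static void main(String[] args) {
        // The input described in the thread: 19 files of roughly 10 MB each.
        long[] files = new long[19];
        Arrays.fill(files, 10 * MB);

        // Candidate values for mapreduce.input.fileinputformat.split.maxsize.
        for (long maxMb : new long[] {128, 64, 16, 4}) {
            System.out.println("max split " + maxMb + " MB -> about "
                    + estimateSplits(files, maxMb * MB) + " splits");
        }
    }
}
```

Under that assumption, lowering the max split size from 128 MB to 16 MB would take the job from about 2 map tasks to about 12, so each task starts with less data before the DoFn multiplies it. The value itself is passed in bytes, for example via conf.setLong(...) on the Hadoop Configuration handed to the DoFn's configure function.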
On Tue, Nov 24, 2015 at 3:28 PM Robinson, Landon - Landon <[email protected]> wrote:
> Hi all,
>
> I have a Crunch job that tries to combine the last four tasks of my
> program into one M/R job.
> That’s normally not a problem, but my data *starts small and grows
> exponentially* in the most major of those DoFn tasks, resulting in spills
> to disk (local, not HDFS).
>
> I’ve already:
>
> - Implemented scaleFactor on the DoFn where the data will emit back
>   more records than it consumed, which is 40.0f
> - Set io.sort.mb parameter to cluster setting, which is 1792
> - Implemented map-side compression with snappy
>
> Data set I’m ingesting is from a previous map-reduce job, which comes out
> to 19 files of 10mb size (which in Crunch comes to 2 splits).
> Help?
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data/Hadoop Engineer
