Thanks for the quick replies, everyone! Setting the configuration at the pipeline level (as opposed to the DoFn level) worked.
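
For anyone reading this thread later: Micah's point in the quoted message about DoFns that land in the same job overriding each other's settings can be pictured with a small toy model. This is only a sketch in plain Java, not Crunch code. A Map stands in for the job's Hadoop Configuration and each Consumer stands in for a DoFn's configure(...) hook. The names SharedJobConfSketch and planJob, the 1024 value, and the list-order application of hooks are all assumptions made for illustration; the order in which Crunch really applies the hooks is decided by its planner and is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: a plain Map stands in for the Hadoop Configuration of one
// MapReduce job, and each Consumer stands in for a DoFn's configure(...) hook.
class SharedJobConfSketch {

    // Copies the pipeline-level settings, then lets every hook write to the
    // single configuration shared by all DoFns grouped into this job.
    static Map<String, String> planJob(Map<String, String> pipelineConf,
                                       List<Consumer<Map<String, String>>> hooks) {
        Map<String, String> jobConf = new LinkedHashMap<>(pipelineConf);
        for (Consumer<Map<String, String>> hook : hooks) {
            hook.accept(jobConf); // a later hook overwrites an earlier one
        }
        return jobConf;
    }

    public static void main(String[] args) {
        String key = "mapreduce.map.memory.mb";

        // Pipeline-level setting: exactly one writer for the key.
        Map<String, String> pipelineConf = new LinkedHashMap<>();
        pipelineConf.put(key, "4096");

        // Two hypothetical DoFns that end up in the same job and disagree.
        List<Consumer<Map<String, String>>> hooks = new ArrayList<>();
        hooks.add(conf -> conf.put(key, "4096")); // hypothetical DoFn A
        hooks.add(conf -> conf.put(key, "1024")); // hypothetical DoFn B

        Map<String, String> globalOnly = planJob(pipelineConf, new ArrayList<>());
        Map<String, String> withHooks = planJob(pipelineConf, hooks);

        System.out.println("pipeline-level only:  " + globalOnly.get(key));
        System.out.println("two DoFns in one job: " + withHooks.get(key));
    }
}
```

In this model a value set once on the pipeline's configuration has a single writer, while a value set per DoFn depends on what else is grouped into the same job.
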

On Tue, Oct 13, 2015 at 1:08 PM, Micah Whitacre <[email protected]> wrote:

> Yeah was misconstruing it with the setContext(...) method which provides
> the configuration when the job is actually running.[1] Luke, you might
> look at generating a plan of your pipeline to see what other DoFns might be
> inside the same job and causing a conflict with your settings.
>
> We typically do the global settings vs trying to tweak at each DoFn simply
> because it allows us to avoid worrying about which DoFn's get grouped into
> a single task and override each other.
>
> [1] -
> http://crunch.apache.org/apidocs/0.12.0/org/apache/crunch/DoFn.html#configure(org.apache.hadoop.conf.Configuration)
>
> On Tue, Oct 13, 2015 at 3:02 PM, Robinson, Landon - Landon <
> [email protected]> wrote:
>
>> You can do it both ways: at the DoFn level or at the pipeline level.
>>
>> For global settings, go with the pipeline level. For individual
>> jobs/tasks, go DoFn Level.
>>
>> *Pipeline Level:*
>>
>> Configuration crunchConf = getConf();
>> crunchConf.set("mapred.job.queue.name", "batch");
>> Pipeline pipeline = new MRPipeline(TransformKronosMR.class, "My Pipeline", crunchConf);
>>
>> *DoFn Level (as mentioned):*
>>
>> @Override
>> public void configure(Configuration conf) {
>>     conf.set("mapreduce.map.java.opts", "-Xmx3900m");
>>     conf.set("mapreduce.reduce.java.opts", "-Xmx3900m");
>>
>>     conf.set("mapreduce.map.memory.mb", "4096");
>>     conf.set("mapreduce.reduce.memory.mb", "4096");
>> }
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data/Hadoop Engineer
>> Lowe’s Companies Inc. | IT Business Intelligence
>> ---------------------------------------------------------------------------
>>
>> From: Micah Whitacre <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Tuesday, October 13, 2015 at 3:55 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Hadoop Configuration from DoFn
>>
>> Luke,
>> Generally that configuration should be set on the Configuration object
>> passed to Pipeline vs on the individual DoFns. The configure(...) method
>> is called when re-instantiating the DoFn on the Map/Reduce task and at that
>> point those memory settings wouldn't be honored.
>>
>> On Tue, Oct 13, 2015 at 2:52 PM, Luke Hansen <[email protected]>
>> wrote:
>>
>>> Does anyone know if this is the right way to configure Hadoop from a
>>> Crunch DoFn? This didn't seem to affect anything.
>>>
>>> Thanks!
>>>
>>> @Override
>>> public void configure(Configuration conf) {
>>>     conf.set("mapreduce.map.java.opts", "-Xmx3900m");
>>>     conf.set("mapreduce.reduce.java.opts", "-Xmx3900m");
>>>
>>>     conf.set("mapreduce.map.memory.mb", "4096");
>>>     conf.set("mapreduce.reduce.memory.mb", "4096");
>>> }
