Through a combination of a few configuration parameters, I was able to fix the
spill issue:
* Map output compression with Snappy
* Setting mapreduce.task.io.sort.mb to our cluster’s configured value
Properties File:
mapred.compress.map.output=true
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
mapreduce.task.io.sort.mb=1792
Crunch Code:
crunchConf.set("mapred.compress.map.output", mapCompress);
crunchConf.set("mapred.map.output.compression.codec", mapCompressionCodec);
crunchConf.set("mapreduce.task.io.sort.mb", mapTaskSortMB);
Pipeline pipeline = new MRPipeline(TransformMR.class, "Crunch Pipeline",
    crunchConf);
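If the three values come from the properties file, one way to wire them up is to read the file with java.util.Properties and copy each entry onto the Configuration before building the pipeline. The sketch below is a hypothetical helper (SpillSettings is not a Crunch or Hadoop class). It sticks to the JDK so it runs without Hadoop on the classpath, and the crunchConf.set(...) step is left as a comment. Note that mapred.compress.map.output and mapred.map.output.compression.codec are the older, deprecated key names; Hadoop 2 also accepts mapreduce.map.output.compress and mapreduce.map.output.compress.codec.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not part of Crunch): pulls the spill-related keys
// out of a properties source so they can be copied onto crunchConf.
class SpillSettings {
    // The keys used in the properties file above.
    static final String[] KEYS = {
        "mapred.compress.map.output",
        "mapred.map.output.compression.codec",
        "mapreduce.task.io.sort.mb"
    };

    // Returns the known keys that are present, in KEYS order.
    static Map<String, String> load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        Map<String, String> settings = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value != null) {
                settings.put(key, value.trim());
            }
        }
        // Fail fast if the sort buffer size is not a whole number of MB.
        String sortMb = settings.get("mapreduce.task.io.sort.mb");
        if (sortMb != null) {
            Integer.parseInt(sortMb); // throws NumberFormatException if malformed
        }
        return settings;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "mapred.compress.map.output=true",
            "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec",
            "mapreduce.task.io.sort.mb=1792");
        // With Hadoop on the classpath, each entry would be applied with
        // crunchConf.set(key, value) before constructing the MRPipeline.
        load(text).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Validating io.sort.mb up front means a typo in the properties file fails on the client instead of inside a map task.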
Thanks everyone for the input. We have a beefy cluster, but Crunch wasn’t
picking up some of our settings, such as io.sort.mb: the job was using 100 MB
(the Hadoop default), while our cluster value is 1792 MB.
Thanks again; I just thought I’d share what I learned.
---------------------------------------------------------------------------
Landon Robinson
---------------------------------------------------------------------------
From: Micah Whitacre <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, November 10, 2015 at 3:27 PM
To: "[email protected]" <[email protected]>
Subject: Re: Handling Spills in Crunch
In my quick search I didn't find any shortcuts, but Crunch should honor any of
the normal Hadoop configuration. If you find it doesn't, feel free to log an
issue.
I believe the general rule of thumb is to set io.sort.mb to about 25% of your
map or reduce JVM heap; that should also help cut down on the data written to
local disk.
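As a worked example of that rule of thumb: a 1792 MB sort buffer implies a heap of roughly 4 × 1792 = 7168 MB, while the default 100 MB buffer matches a heap of about 400 MB. The sketch below (SortBufferSizing is a hypothetical name) does the arithmetic in both directions. It also caps io.sort.mb at 2047 MB, which to our understanding is the largest value Hadoop 2's map-side sort buffer accepts; treat that cap as an assumption to verify against your Hadoop version.

```java
// Sketch of the "io.sort.mb is about 25% of the task heap" rule of thumb.
class SortBufferSizing {
    // Assumption: the map-side sort buffer is one byte array, and Hadoop 2's
    // MapTask rejects io.sort.mb values above 2047 MB.
    static final int MAX_SORT_MB = 2047;

    // Suggested io.sort.mb (MB) for a task JVM heap (-Xmx) given in MB.
    static int sortMbForHeap(int heapMb) {
        if (heapMb <= 0) {
            throw new IllegalArgumentException("heapMb must be positive: " + heapMb);
        }
        return Math.min(heapMb / 4, MAX_SORT_MB);
    }

    // Heap (MB) needed so that a desired io.sort.mb stays at about 25% of it.
    static int heapMbForSortMb(int sortMb) {
        if (sortMb <= 0 || sortMb > MAX_SORT_MB) {
            throw new IllegalArgumentException("sortMb out of range: " + sortMb);
        }
        return sortMb * 4;
    }

    public static void main(String[] args) {
        System.out.println(sortMbForHeap(4096));    // 1024
        System.out.println(heapMbForSortMb(1792));  // 7168
    }
}
```

Working backwards from the desired buffer size is also a quick way to answer whether the task JVM options need to grow along with io.sort.mb.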
On Tue, Nov 10, 2015 at 12:07 PM, Robinson, Landon - Landon
<[email protected]> wrote:
The specific error I’m getting is related to this:
https://support.pivotal.io/hc/en-us/articles/205647417-Map-Reduce-job-failed-with-Could-not-find-any-valid-local-directory-for-output-attempt-xxxx-xxxx-m-x-file-out
Does Crunch offer a compression shortcut in code, or am I better off
compressing the mapper output with the mapreduce.map.output.compress=true
parameter?
Thanks again.
- Landon
From: Micah Whitacre <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, November 10, 2015 at 10:19 AM
To: "[email protected]" <[email protected]>
Subject: Re: Handling Spills in Crunch
Landon,
I don't believe there is anything specific in Crunch that will help you, but you
can definitely tweak some of the normal Hadoop configuration settings to reduce
spilling. Specifically, tuning the spill percentage
(mapreduce.map.sort.spill.percent) and io.sort.mb should help.
http://stackoverflow.com/questions/27890887/why-does-hadoop-spilling-happens
http://www.slideshare.net/cloudera/mr-perf
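To see why those two settings matter, it helps to do the arithmetic: a map task starts spilling to local disk once its in-memory sort buffer is roughly io.sort.mb × spill percentage full (mapreduce.map.sort.spill.percent defaults to 0.80). The sketch below (SpillEstimate is a hypothetical name) estimates spill files per map task under simplifying assumptions: it ignores the per-record metadata that also occupies the buffer, so real counts will be somewhat higher. The 1 GiB input and the 1.4 growth factor in main are made-up illustrations of a DoFn that emits about 40% more than it reads.

```java
// Rough spill arithmetic for the map-side sort buffer.
class SpillEstimate {
    // Bytes of buffered map output at which a background spill starts:
    // io.sort.mb scaled by the spill percentage (Hadoop's default is 0.80).
    static long spillThresholdBytes(int sortMb, double spillPercent) {
        if (sortMb <= 0 || spillPercent <= 0.0 || spillPercent > 1.0) {
            throw new IllegalArgumentException("bad sortMb or spillPercent");
        }
        return Math.round(sortMb * 1024L * 1024L * spillPercent);
    }

    // Approximate spill files per map task. This is a lower bound: it ignores
    // per-record metadata that shares the buffer, and compression (which
    // shrinks the spill files on disk, not the in-memory buffer).
    static long estimateSpills(long mapOutputBytes, int sortMb, double spillPercent) {
        if (mapOutputBytes <= 0) {
            return 0;
        }
        long threshold = spillThresholdBytes(sortMb, spillPercent);
        return (mapOutputBytes + threshold - 1) / threshold; // ceiling division
    }

    public static void main(String[] args) {
        long input = 1L << 30;                  // hypothetical 1 GiB of map input
        long output = Math.round(input * 1.4);  // DoFn emits ~40% more than it reads
        System.out.println(estimateSpills(output, 100, 0.80));
        System.out.println(estimateSpills(output, 1792, 0.80));
    }
}
```

Under these assumptions, 1 GiB of map output needs about 13 spills with a 100 MB buffer but fits in a single spill with 1792 MB. Map output compression does not change the spill count, since records are compressed as they are written to the spill files, but it does shrink what lands on local disk.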
On Tue, Nov 10, 2015 at 8:57 AM, Robinson, Landon - Landon
<[email protected]> wrote:
Could use some guidance in dealing with spills. I have a data set that grows
substantially in a DoFn: it starts small, but I emit roughly 40% more data than
I take in.
I’ve tried using scaleFactor() to compensate for this, but I still get this
error at runtime with an MRPipeline:
org.apache.crunch.CrunchRuntimeException: java.io.IOException: Spill failed
Do I perhaps need to increase the Java memory options?
Best,
Landon
NOTICE: All information in and attached to the e-mails below may be
proprietary, confidential, privileged and otherwise protected from improper or
erroneous disclosure. If you are not the sender's intended recipient, you are
not authorized to intercept, read, print, retain, copy, forward, or disseminate
this message. If you have erroneously received this communication, please
notify the sender immediately by phone (704-758-1000) or
by e-mail and destroy all copies of this message electronic, paper, or
otherwise.
By transmitting documents via this email: Users, Customers, Suppliers and
Vendors collectively acknowledge and agree the transmittal of information via
email is voluntary, is offered as a convenience, and is not a secured method of
communication; Not to transmit any payment information E.G. credit card, debit
card, checking account, wire transfer information, passwords, or sensitive and
personal information E.G. Driver's license, DOB, social security, or any other
information the user wishes to remain confidential; To transmit only
non-confidential information such as plans, pictures and drawings and to assume
all risk and liability for and indemnify Lowe's from any claims, losses or
damages that may arise from the transmittal of documents or including
non-confidential information in the body of an email transmittal. Thank you.