> The short term solutions have already been discussed: decrease the
> number of reducers (and mappers, if you need them to be tied) or
> potentially turn off compression if Snappy is holding too much buffer
> space.
Just to follow up on this (sorry for the delay; I was busy/out for Thanksgiving): after chatting about it, we've moved from m1.larges to m1.xlarges and nudged our partition size up (and so the partition count down), and things are going quite well now. Where we had a 40-machine m1.large job, we're now running it on 20 m1.xlarges, and as expected it takes basically the same time and cost (since an m1.xlarge has roughly twice the resources of an m1.large), which is great.

I'm still tempted to play around with turning off compression, but this is no longer a squeaky wheel, so we probably won't actively investigate it.

Thanks for all the help, Aaron/Patrick/the list; we really appreciate it.

- Stephen
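
For anyone landing on this thread later: assuming this is a Spark shuffle (the thread doesn't name the framework, so treat the exact property names as my assumption), the knobs discussed above roughly map to settings like these, a sketch rather than a recommended configuration:

```properties
# Fewer reducers means fewer simultaneous shuffle streams,
# and so fewer Snappy buffers held open at once.
spark.default.parallelism=40

# Turning off shuffle compression avoids the per-stream
# Snappy buffer entirely, at the cost of more shuffle I/O.
spark.shuffle.compress=false

# Alternatively, keep compression but switch codecs
# (e.g. LZF) if Snappy's buffering is the specific problem.
spark.io.compression.codec=lzf
```

The actual values would depend on cluster size and partition sizing; the point is only that reducer count and compression are both configuration-level changes, not code changes.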
