No, I've just moved from Cloudera to Slack, and my data team is all of two people* right now, so a dedicated Hadoop ops person doesn't make sense yet.
* But of course, I'm hiring. :)

On Mon, Nov 23, 2015 at 6:43 PM Everett Anderson <[email protected]> wrote:

> Josh, not to steal the thread, but I'm quite curious -- did something
> drive you to using S3 instead of HDFS?
>
> For me, I've been surprised how brittle HDFS seems out of the box in the
> face of even mild load. :( We've spent a lot of time turning knobs to make
> our data nodes stay responsive.
>
>
> On Mon, Nov 23, 2015 at 5:45 PM, Josh Wills <[email protected]> wrote:
>
>> (I don't know the answer to this, but as I also now run Crunch on top of
>> S3, I'm interested in a solution.)
>>
>> On Mon, Nov 23, 2015 at 5:22 PM, Jeff Quinn <[email protected]> wrote:
>>
>>> Hey All,
>>>
>>> We have run in to a pretty frustrating inefficiency inside of
>>> the CrunchJobHooks.CompletionHook#handleMultiPaths.
>>>
>>> This method loops over all of the partial output files and moves them to
>>> their ultimate destination directories,
>>> calling org.apache.hadoop.fs.FileSystem#rename(org.apache.hadoop.fs.Path,
>>> org.apache.hadoop.fs.Path) on each partial output in a loop.
>>>
>>> This is no problem when the org.apache.hadoop.fs.FileSystem in question
>>> is HDFS where #rename is a cheap operation, but when an implementation such
>>> as S3NativeFileSystem is used it is extremely inefficient, as each
>>> iteration through the loop makes a single blocking S3 API call, and this
>>> loop can be extremely long when there are many thousands of partial output
>>> files.
>>>
>>> Has anyone dealt with this before / have any ideas to work around?
>>>
>>> Thanks!
>>>
>>> Jeff
>>>
>>>
>>>
>>> *DISCLAIMER:* The contents of this email, including any attachments,
>>> may contain information that is confidential, proprietary in nature,
>>> protected health information (PHI), or otherwise protected by law from
>>> disclosure, and is solely for the use of the intended recipient(s). If you
>>> are not the intended recipient, you are hereby notified that any use,
>>> disclosure or copying of this email, including any attachments, is
>>> unauthorized and strictly prohibited. If you have received this email in
>>> error, please notify the sender of this email. Please delete this and all
>>> copies of this email from your system. Any opinions either expressed or
>>> implied in this email and all attachments, are those of its author only,
>>> and do not necessarily reflect those of Nuna Health, Inc.
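Jeff's description of handleMultiPaths comes down to a serial loop of blocking rename calls, so total time grows with the number of partial output files. One common mitigation (not something proposed in this thread, and not how Crunch itself implements the hook) is to issue the renames concurrently from a bounded thread pool. The sketch assumes a hypothetical `Renamer` interface standing in for `org.apache.hadoop.fs.FileSystem#rename(Path, Path)`, with plain strings in place of `Path`, so it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRename {

    // Hypothetical stand-in for FileSystem#rename(Path, Path): returns true on success.
    public interface Renamer {
        boolean rename(String src, String dst) throws Exception;
    }

    // Submits every src -> dst move to a fixed-size pool so that up to `threads`
    // blocking rename calls run at once, then returns the sources that failed.
    public static List<String> renameAll(Map<String, String> moves, Renamer fs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Map.Entry<String, String>> entries = new ArrayList<>(moves.entrySet());
            List<Future<Boolean>> futures = new ArrayList<>();
            for (Map.Entry<String, String> e : entries) {
                Callable<Boolean> task = () -> fs.rename(e.getKey(), e.getValue());
                futures.add(pool.submit(task));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                boolean ok;
                try {
                    ok = futures.get(i).get(); // blocks until this rename finishes
                } catch (ExecutionException ex) {
                    ok = false; // the rename threw; count it as a failure
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    ok = false;
                }
                if (!ok) {
                    failed.add(entries.get(i).getKey());
                }
            }
            return failed;
        } finally {
            pool.shutdown(); // accept no new tasks; workers exit once idle
        }
    }
}
```

The pool size caps how many requests are in flight at once, and failures are reported in submission order rather than aborting the batch. Whether a given concurrency level is acceptable to S3, and whether running the moves in parallel is safe for a particular output layout, are assumptions to check before relying on this.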
