I see; they all need to end up in the same bucket in S3 w/different names. Then yes, the options you describe sound about right.
On Fri, Nov 13, 2015 at 9:49 AM, David Ortiz <[email protected]> wrote:

> Hey,
>
> The reason I was looking for this is because whether I write them to
> different directories, or the same directories, I have to distcp them all
> to the same s3 bucket for downstream processing to function properly, so I
> need to make sure that the file names don’t overlap. So to get this to
> work, it sounds like my options would be the following:
>
> · Have the client move the files to a common directory with names
>   I want using FileSystem calls
>
> · Write a shell script that Oozie calls to do the same thing as
>   the previous option, but with dfs calls.
>
> · Write an additional crunch job, which will load the output from
>   the previous four jobs and union the results.
>
> Does that sound about right?
>
> Thanks,
> Dave
>
> *From:* Josh Wills [mailto:[email protected]]
> *Sent:* Friday, November 13, 2015 12:41 PM
> *To:* [email protected]
> *Subject:* Re: Output file prefix
>
> Hey David,
>
> There isn't a way to muck w/the file output prefix on a per-collection
> basis. Would something like a PathPerKeyTarget work for this situation,
> where you would have four keys for the different output directories and
> could sort of union together the PTable<String, Whatever> instances that
> you needed to create on a particular run?
>
> J
>
> On Fri, Nov 13, 2015 at 7:36 AM, David Ortiz <[email protected]> wrote:
>
> Hey everyone,
>
> I thought I remembered seeing something in the docs about being able
> to set a prefix for output files from a collection, but I am having trouble
> finding it now. Does that exist?
>
> I am trying to break up a large job that had four parallel threads of
> execution on different data sets, that all fed one output set, into four
> separate jobs to make it easier to rerun only one of the input sets in the
> event something goes wrong, and this would make it a lot easier to deal
> with getting the output all into one directory.
>
> Thanks,
> Dave
>
> *This email is intended only for the use of the individual(s) to whom it
> is addressed. If you have received this communication in error, please
> immediately notify the sender and delete the original email.*
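
The first option in the thread (have the client move each job's files into a common directory under non-overlapping names) could look roughly like the sketch below. It is only an illustration, not code from the thread: it uses `java.nio.file` on a local file system as a stand-in so that it runs without a cluster, and the class name `OutputCollector`, the method `moveWithPrefix`, and the `part-*` file pattern are all assumptions. On HDFS the same shape would presumably be written with Hadoop's `FileSystem.listStatus` and `FileSystem.rename`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Crunch or Hadoop.
class OutputCollector {

    /**
     * Moves every "part-*" file from jobDir into commonDir, prepending
     * the given prefix so that outputs of different jobs cannot collide.
     * Returns the new paths of the moved files.
     */
    static List<Path> moveWithPrefix(Path jobDir, Path commonDir, String prefix)
            throws IOException {
        Files.createDirectories(commonDir);

        // Collect the matching files first, so that nothing is moved
        // while the directory is still being iterated.
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(jobDir, "part-*")) {
            for (Path part : stream) {
                parts.add(part);
            }
        }

        List<Path> moved = new ArrayList<>();
        for (Path part : parts) {
            Path target = commonDir.resolve(prefix + "-" + part.getFileName());
            // No REPLACE_EXISTING option: a name collision raises
            // FileAlreadyExistsException instead of silently overwriting.
            Files.move(part, target);
            moved.add(target);
        }
        return moved;
    }
}
```

Calling `moveWithPrefix` once per job, each with its own prefix, leaves a single directory whose file names are unique per job; that directory could then be copied to the S3 bucket in one distcp run.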
