You need to set the output path to '/Users/felix/Documents/pig/multi_store_output'
in your setStoreLocation().
Alternately, for clarity, you could modify your store UDF to be more like:

store load_log INTO '/Users/felix/Documents/pig/multi_store_output' using
MyMultiStorage('ns_{0}/site_{1}', '2,1', '1,2');

The reason FileOutputFormat needs a real path is that, at run time, Hadoop
actually uses a temporary path and then moves the output to the correct path
if the job succeeds.

Raghu.

On Thu, Nov 3, 2011 at 9:45 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Don't use FileOutputFormat? Or rather, use something that extends it and
> overrides the validation.
>
> On Wed, Nov 2, 2011 at 3:19 PM, felix gao <[email protected]> wrote:
>
> > If you don't call that function, Hadoop is going to throw an exception
> > for not having the output set for the job, something like:
> >
> > Caused by: org.apache.hadoop.mapred.InvalidJobConfException: Output
> > directory not set.
> >     at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
> >     at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:87)
> >
> > So I have to set it and then somehow delete it after Pig completes.
> >
> > On Wed, Nov 2, 2011 at 3:00 PM, Ashutosh Chauhan <[email protected]> wrote:
> >
> > > Then, don't call FileOutputFormat.setOutputPath(job, new Path(location));
> > > Looks like I am missing something here.
> > >
> > > Ashutosh
> > >
> > > On Wed, Nov 2, 2011 at 14:10, felix gao <[email protected]> wrote:
> > >
> > > > Ashutosh,
> > > >
> > > > My problem is that I don't want to use that location at all, since I
> > > > am constructing the output location based on the tuple input. The
> > > > location is just a dummy placeholder for me to substitute with the
> > > > right parameters.
> > > >
> > > > Felix
> > > >
> > > > On Wed, Nov 2, 2011 at 10:47 AM, Ashutosh Chauhan <[email protected]> wrote:
> > > >
> > > > > Hey Felix,
> > > > >
> > > > > >> The only problem is that in the setStoreLocation function we
> > > > > >> have to call
> > > > > >> FileOutputFormat.setOutputPath(job, new Path(location));
> > > > >
> > > > > Can't you massage location into the appropriate string you want?
> > > > >
> > > > > Ashutosh
> > > > >
> > > > > On Tue, Nov 1, 2011 at 18:07, felix gao <[email protected]> wrote:
> > > > >
> > > > > > I have written a custom store function that is primarily based
> > > > > > on the MultiStorage store function. The way I use it is:
> > > > > >
> > > > > > store load_log INTO
> > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}' using
> > > > > > MyMultiStorage('2,1', '1,2');
> > > > > >
> > > > > > where {0} and {1} will be substituted with the tuple fields at
> > > > > > index 0 and index 1. Everything is fine and all the data is
> > > > > > written to the correct place. The only problem is that in the
> > > > > > setStoreLocation function we have to call
> > > > > > FileOutputFormat.setOutputPath(job, new Path(location)); I have
> > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}'
> > > > > > as my output location, so there is actually a folder created in
> > > > > > my fs with ns_{0} and site_{1}. Is there a way to tell Hadoop
> > > > > > not to create those output directories?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Felix
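The thread never shows how the `{0}`/`{1}` placeholders get expanded from tuple fields. As a rough illustration of the kind of substitution a custom store function like MyMultiStorage might do when it builds per-record output paths under a fixed base directory, here is a minimal, self-contained Java sketch. The class and method names (`PathTemplate`, `buildOutputPath`) are hypothetical, not part of Pig or the poster's code:

```java
// Hypothetical sketch: expand positional placeholders like {0}, {1} in a
// path template using tuple field values, so that a fixed base directory
// can be passed to FileOutputFormat while the per-record subdirectory is
// computed at write time.
public class PathTemplate {

    // Replace each {i} in the template with fields[i], e.g.
    // buildOutputPath("ns_{0}/site_{1}", "cdn", "42") -> "ns_cdn/site_42".
    public static String buildOutputPath(String template, String... fields) {
        String path = template;
        for (int i = 0; i < fields.length; i++) {
            path = path.replace("{" + i + "}", fields[i]);
        }
        return path;
    }

    public static void main(String[] args) {
        String base = "/Users/felix/Documents/pig/multi_store_output";
        // Pretend the current tuple's first two fields are "cdn" and "42".
        String relative = buildOutputPath("ns_{0}/site_{1}", "cdn", "42");
        System.out.println(base + "/" + relative);
        // prints /Users/felix/Documents/pig/multi_store_output/ns_cdn/site_42
    }
}
```

With this split, only the real base directory is ever handed to `FileOutputFormat.setOutputPath()`, so no literal `ns_{0}` or `site_{1}` directories are created.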
