Raghu, I changed the code to what you suggested, but I got an exception when I try to store:

java.io.IOException: File already exists: file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-a/part-r-00000
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
    at com.bluekai.analytics.pig.storage.BkMultiStorage$MultiStorageOutputFormat$1.createOutputStream(BkMultiStorage.java:325)
    at com.bluekai.analytics.pig.storage.BkMultiStorage$MultiStorageOutputFormat$1.getStore(BkMultiStorage.java:304)
    at com.bluekai.analytics.pig.storage.BkMultiStorage$MultiStorageOutputFormat$1.getStore(BkMultiStorage.java:298)
    at com.bluekai.analytics.pig.storage.BkMultiStorage$MultiStorageOutputFormat$1.write(BkMultiStorage.java:285)
    at com.bluekai.analytics.pig.storage.BkMultiStorage$MultiStorageOutputFormat$1.write(BkMultiStorage.java:261)
    at com.bluekai.analytics.pig.storage.BkMultiStorage.putNext(BkMultiStorage.java:184)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:395)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:250)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
where prefix-a is dynamically generated based on my tuple. The log shows:

final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-0/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-1/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-2/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-3/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-4/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-5/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-6/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-7/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-8/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-9/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-A/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-B/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-C/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-D/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-E/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-F/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-G/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-H/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-I/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-J/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-K/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-L/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-M/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-N/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-O/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-P/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-Q/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-R/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-S/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-T/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-U/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-V/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-W/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-X/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-Y/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-Z/part-r-00000
final output stores at file:/user/bbuda/ids/_temporary/_attempt_local_0001_r_000000_0/prefix-a/part-r-00000

I am wondering if it is because the path is case-insensitive?

Thanks,
Felix

On Fri, Nov 4, 2011 at 3:31 PM, Raghu Angadi <[email protected]> wrote:
> You need to set the output path to '/Users/felix/Documents/pig/multi_store_output'
> in your setStoreLocation(). Alternately, for clarity, you could modify your store
> UDF to be more like:
>
> store load_log INTO '/Users/felix/Documents/pig/multi_store_output' using
> MyMultiStorage('ns_{0}/site_{1}', '2,1', '1,2');
>
> The reason FileOutputFormat needs a real path is that at run time Hadoop
> actually uses a temporary path and then moves the output to the correct path
> if the job succeeds.
>
> Raghu.
>
> On Thu, Nov 3, 2011 at 9:45 AM, Dmitriy Ryaboy <[email protected]> wrote:
> > Don't use FileOutputFormat? Or rather, use something that extends it and
> > overrides the validation.
> >
> > On Wed, Nov 2, 2011 at 3:19 PM, felix gao <[email protected]> wrote:
> > > If you don't call that function, Hadoop is going to throw an exception
> > > for not having an output set for the job, something like:
> > >
> > > Caused by: org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
> > >     at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
> > >     at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:87)
> > >
> > > So I have to set it and then somehow delete it after Pig completes.
> > >
> > > On Wed, Nov 2, 2011 at 3:00 PM, Ashutosh Chauhan <[email protected]> wrote:
> > > > Then don't call FileOutputFormat.setOutputPath(job, new Path(location));
> > > > Looks like I am missing something here.
> > > > Ashutosh
> > > >
> > > > On Wed, Nov 2, 2011 at 14:10, felix gao <[email protected]> wrote:
> > > > > Ashutosh,
> > > > >
> > > > > The problem is I don't want to use that location at all, since I am
> > > > > constructing the output location based on the tuple input. The
> > > > > location is just a dummy placeholder for me to substitute the right
> > > > > parameters into.
> > > > >
> > > > > Felix
> > > > >
> > > > > On Wed, Nov 2, 2011 at 10:47 AM, Ashutosh Chauhan <[email protected]> wrote:
> > > > > > Hey Felix,
> > > > > >
> > > > > > >> The only problem is that in the setStoreLocation function we have
> > > > > > >> to call FileOutputFormat.setOutputPath(job, new Path(location));
> > > > > >
> > > > > > Can't you massage the location into the appropriate string you want?
> > > > > >
> > > > > > Ashutosh
> > > > > >
> > > > > > On Tue, Nov 1, 2011 at 18:07, felix gao <[email protected]> wrote:
> > > > > > > I have written a custom store function based primarily on the
> > > > > > > MultiStorage store function. The way I use it is:
> > > > > > >
> > > > > > > store load_log INTO
> > > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}'
> > > > > > > using MyMultiStorage('2,1', '1,2');
> > > > > > >
> > > > > > > where {0} and {1} will be substituted with the tuple fields at
> > > > > > > index 0 and index 1. Everything is fine and all the data is
> > > > > > > written to the correct place. The only problem is that in the
> > > > > > > setStoreLocation function we have to call
> > > > > > > FileOutputFormat.setOutputPath(job, new Path(location)); I have
> > > > > > > '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}'
> > > > > > > as my output location, so there is actually a folder created in
> > > > > > > my fs with ns_{0} and site_{1}. Is there a way to tell Hadoop not
> > > > > > > to create those output directories?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Felix
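[Editor's note] Felix's hunch about case sensitivity is plausible: the job is running in local mode (file:/ paths, LocalJobRunner), and the default HFS+ volume on a Mac is case-insensitive, so prefix-A and prefix-a resolve to the same directory and the second create() fails. One way around this, sketched below, is to escape the dynamically generated directory names so that two keys differing only in case can never collapse to the same path. The CaseSafeNames class and the underscore-escaping scheme are illustrative assumptions for this thread, not part of BkMultiStorage or Pig's MultiStorage:

```java
// Hypothetical helper: make dynamically generated directory names safe on
// case-insensitive filesystems, where "prefix-a" and "prefix-A" would
// otherwise be the same directory.
public class CaseSafeNames {

    // Escape a literal underscore as "__" and an uppercase letter as
    // "_" + its lowercase form. Two inputs that differ only in case
    // therefore produce distinct escaped names, and the mapping can be
    // reversed by decoding "__" and "_x" pairs left to right.
    public static String escape(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (char c : name.toCharArray()) {
            if (c == '_') {
                sb.append("__");
            } else if (Character.isUpperCase(c)) {
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(CaseSafeNames.escape("prefix-a")); // prefix-a
        System.out.println(CaseSafeNames.escape("prefix-A")); // prefix-_a
    }
}
```

With a scheme like this, the key computed in getStore() would be passed through escape() before the output path is built, so the store UDF keeps one writer per distinct key even on a case-insensitive volume. The alternative is simply to run against HDFS (or a case-sensitive local volume), where the original names do not collide.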
