Thanks for your help. I'm using Elastic MapReduce, so Pig 0.6, and running:
STORE FILES INTO '/mnt/output' USING
    org.apache.pig.piggybank.storage.MultiStorage('/mnt/output', '0', 'gz', '\\t');
I'm getting an error (stack trace below) saying that it can't create a
directory. I can see that it's creating a file called /mnt/output, but not a
directory. Is this perhaps a bug in the version of Pig running on Elastic
MapReduce?
Pig Stack Trace
---------------
ERROR 2135: Received error from store function.Mkdirs failed to create /mnt/output/tmcustomer-2011-10-07-GET-200
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 699
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1004)
at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:739)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:374)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function.Mkdirs failed to create /mnt/output/tmcustomer-2011-10-07-GET-200
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:140)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:149)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:110)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:781)
at org.apache.pig.PigServer.execute(PigServer.java:774)
at org.apache.pig.PigServer.access$100(PigServer.java:90)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:952)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:999)
... 12 more
Caused by: java.io.IOException: Mkdirs failed to create /mnt/output/tmcustomer-2011-10-07-GET-200
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:367)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
at org.apache.pig.piggybank.storage.MultiStorage.createOutputStream(MultiStorage.java:205)
at org.apache.pig.piggybank.storage.MultiStorage.getStore(MultiStorage.java:225)
at org.apache.pig.piggybank.storage.MultiStorage.putNext(MultiStorage.java:246)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:127)
... 20 more
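The trace shows LocalPigLauncher, so this run went through Pig's local mode, and Hadoop's ChecksumFileSystem.create throws "Mkdirs failed to create <parent>" when it cannot create the parent directory of the file it is opening. One plausible reading, which is an assumption and not confirmed in this thread, is that /mnt/output already existed as the plain file mentioned above, and a directory cannot be created beneath a regular file. The Python sketch below uses only local filesystem calls (not Pig or Hadoop) to show that failure mode; subdir_creatable is a hypothetical helper.

```python
import os
import tempfile


def subdir_creatable(parent_is_file: bool) -> bool:
    """Try to create a per-key directory under an existing 'output' path.

    Hypothetical helper: 'output' stands in for /mnt/output and is either a
    regular file or a directory before the subdirectory is attempted.
    """
    with tempfile.TemporaryDirectory() as root:
        parent = os.path.join(root, "output")
        if parent_is_file:
            # Stand-in for the plain file observed at /mnt/output.
            with open(parent, "w") as fh:
                fh.write("")
        else:
            os.mkdir(parent)
        child = os.path.join(parent, "tmcustomer-2011-10-07-GET-200")
        try:
            os.makedirs(child)
        except OSError:
            # Local-filesystem analogue of Hadoop's "Mkdirs failed to create".
            return False
        return os.path.isdir(child)


if __name__ == "__main__":
    print("parent is a file:     ", subdir_creatable(parent_is_file=True))
    print("parent is a directory:", subdir_creatable(parent_is_file=False))
```

On a POSIX system the first call returns False (the mkdir fails with "Not a directory") and the second returns True.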
================================================================================
On Mon, Oct 10, 2011 at 8:36 PM, Norbert Burger wrote:
> In case it's not obvious, you'd also need a FLATTEN(group) in there before
> the FOREACH to break the tuple apart so that the fields could be synthesized
> into a filename.
>
> Norbert
>
> On Mon, Oct 10, 2011 at 12:57 PM, Jacob Perkins wrote:
>
> > You'll have to run a FOREACH...GENERATE over the data first and generate
> > a single key to look like the filename you want. Then you can use
> > MultiStorage() from the piggybank. See:
> >
> > org.apache.pig.piggybank.storage.MultiStorage
> >
> > in the Pig API docs.
> >
> > --jacob
> > @thedatachef
> >
> > On Mon, 2011-10-10 at 18:43 +0200, Dustin Whitney wrote:
> > > Hello all,
> > >
> > > I'm new to Hadoop and Pig, and I've got a question. I have a relation,
> > > produced by a GROUP, that looks like this:
> > >
> > > ((customer1,2011-10-07,GET,200),{....})
> > > ((customer1,2011-10-07,PUT,201),{....})
> > > ((customer1,2011-10-07,PUT,202),{....})
> > > ((customer2,2011-10-07,GET,200),{....})
> > > ((customer2,2011-10-07,PUT,201),{....})
> > > ((customer2,2011-10-07,PUT,202),{....})
> > >
> > >
> > > I'd like each group (i.e. the data in the {...}) stored separately, and
> > > I'd like to use the values in the first tuple to name my file, so the
> > > first file would be customer1-2011-10-07-GET-200, the second would be
> > > customer1-2011-10-07-PUT-201, etc. Is this possible? I can only see how
> > > to save a single full relation to a file, and I can't find any
> > > documentation that explains how I might use variables to name things.
> > >
> > > Thanks,
> > > Dustin
> >
> >
> >
>
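The approach Jacob and Norbert describe (flatten the group tuple, generate a single key field that looks like the file name, then let MultiStorage split the output on that field) can be simulated in plain Python to check the naming logic. This is a sketch, not Pig: make_key, split_by_key, the one-directory-per-key layout and the "part" file name are hypothetical choices, and the records are placeholders because the bags in the thread are elided.

```python
import os
import tempfile
from collections import defaultdict


def make_key(group):
    """Join a group tuple's fields with '-' to form a file-name key.

    Plays the role of FLATTEN(group) followed by a FOREACH ... GENERATE
    that builds one key field.
    """
    return "-".join(str(field) for field in group)


def split_by_key(grouped, out_dir):
    """Write each group's records to out_dir/<key>/part, tab-separated.

    Returns a dict mapping each key to the number of records written.
    The layout is hypothetical, not MultiStorage's exact output format.
    """
    counts = defaultdict(int)
    for group, records in grouped:
        key = make_key(group)
        key_dir = os.path.join(out_dir, key)
        os.makedirs(key_dir, exist_ok=True)  # out_dir itself must be a directory
        with open(os.path.join(key_dir, "part"), "a") as fh:
            for record in records:
                fh.write("\t".join(str(v) for v in record) + "\n")
                counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    # Group keys come from the thread; the records are placeholders.
    grouped = [
        (("customer1", "2011-10-07", "GET", 200), [("r1",), ("r2",)]),
        (("customer1", "2011-10-07", "PUT", 201), [("r3",)]),
    ]
    with tempfile.TemporaryDirectory() as out:
        print(split_by_key(grouped, out))
        print(sorted(os.listdir(out)))
```

Running it prints a record count per key and one directory per key, named like customer1-2011-10-07-GET-200, which is the naming Dustin asked for.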