Re: HAR file and path globbing

Cheolsoo Park Tue, 25 Sep 2012 19:46:21 -0700

Sounds like I was wrong. ;-)

You might get a better answer from the Hadoop user group, since I think this
is more related to HarFileSystem than to Pig.


Thanks,
Cheolsoo

On Tue, Sep 25, 2012 at 6:20 PM, Mohnish Kodnani
<[email protected]> wrote:

> Hi Cheolsoo,
> Thanks for replying. On the same system, the following works:
>
> x = load 'har:///a/b/b/22.har/00/*,har:///a/b/c/d/23.har/00/*' using
> PigStorage('\t');
>
> Two separate file paths with the har protocol work.
>
> A single path also works, but if I do the following, I get an error:
> x = LOAD 'har:///a/b/c/{d.har,e.har}/z/ab/*' using PigStorage('\t');
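
Since the comma-separated form above is reported to work, one possible workaround is to expand the brace group yourself before handing the path to LOAD. The following is only a rough sketch: the class and method names (HarGlobExpander, expandBraces, toLoadPath) are made up for illustration and are not part of Pig or Hadoop, and it assumes simple, non-nested {a,b} groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Pig or Hadoop API: rewrites a path containing
// {a,b,...} groups into the comma-separated list of plain paths.
class HarGlobExpander {

    // Expands every non-nested {a,b,...} group in the path.
    static List<String> expandBraces(String path) {
        List<String> out = new ArrayList<>();
        int open = path.indexOf('{');
        int close = (open < 0) ? -1 : path.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            out.add(path); // nothing to expand
            return out;
        }
        String prefix = path.substring(0, open);
        String suffix = path.substring(close + 1);
        for (String alt : path.substring(open + 1, close).split(",")) {
            // Recurse so that any later groups in the suffix are expanded too.
            for (String rest : expandBraces(suffix)) {
                out.add(prefix + alt + rest);
            }
        }
        return out;
    }

    // Joins the expanded paths with commas, the form shown to work in LOAD.
    static String toLoadPath(String path) {
        return String.join(",", expandBraces(path));
    }

    public static void main(String[] args) {
        System.out.println(toLoadPath("har:///a/b/c/{d.har,e.har}/z/ab/*"));
    }
}
```

The resulting string could then be passed to the script as a parameter. The trailing * is left alone, on the assumption that a glob inside a single archive keeps working as it does in the two-path example.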
>
> Thanks
> Mohnish
>
> On Tue, Sep 25, 2012 at 6:09 PM, Cheolsoo Park <[email protected]> wrote:
>
> > Hi Mohnish,
> >
> > I am not very familiar with har files, so I might be wrong here.
> >
> > Looking at the call stack, the exception is thrown from initialize(URI
> > name, Configuration conf) in HarFileSystem.java. In the source code, the
> > comment of this method says the following:
> >
> > > Initialize a Har filesystem per har archive. The
> > > archive home directory is the top level directory
> > > in the filesystem that contains the HAR archive.
> >
> >
> > It sounds to me like HarFileSystem expects a single path.
> >
> >
> > > This gives error due to the curly braces being encoded to %7B and %7D.
> >
> >
> > The encoded curly braces should be fine though. In fact, if they're not
> > encoded, that's a problem because then a URISyntaxException will be thrown
> > by Java URI class.
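
To illustrate the encoding point, here is a small standalone sketch using only java.net.URI. The class and method names (HarUriEncodingDemo, parsesAsUri, encode) are invented for the example. It shows how the JDK treats braces and does not reproduce how Pig or Hadoop actually build the URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: shows how java.net.URI handles '{' and '}'.
class HarUriEncodingDemo {

    // The single-argument constructor parses strictly, so raw braces,
    // which are not legal URI characters, make it throw.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The multi-argument constructor percent-encodes illegal characters
    // in the path instead of rejecting them.
    static URI encode(String scheme, String path) throws URISyntaxException {
        return new URI(scheme, null, path, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        String path = "/a/b/c/{d.har,e.har}/z/ab/*";
        System.out.println(parsesAsUri("har://" + path)); // raw braces
        URI encoded = encode("har", path);
        System.out.println(encoded.getRawPath()); // braces as %7B and %7D
        System.out.println(encoded.getPath());    // decoded form, braces restored
    }
}
```

So the %7B and %7D in the error message are what a valid URI would contain anyway, which is why the encoding alone is probably not the cause of the failure.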
> >
> > Hope that this helps,
> > Cheolsoo
> >
> >
> > On Tue, Sep 25, 2012 at 12:43 PM, Mohnish Kodnani <[email protected]> wrote:
> >
> > > Hi,
> > > I am trying to give multiple paths to a pig script using path globbing in
> > > HAR file format and it does not seem to work. I wanted to know if this is
> > > expected or a bug / feature request.
> > >
> > > Command :
> > > x = LOAD 'har:///a/b/c/{d.har,e.har}/z/ab/*' using PigStorage('\t');
> > >
> > > This gives error due to the curly braces being encoded to %7B and %7D.
> > > I am trying this on Pig 0.8.0
> > >
> > > ERROR 2017: Internal error creating job configuration.
> > >
> > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias blah
> > >         at org.apache.pig.PigServer.openIterator(PigServer.java:765)
> > >         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:615)
> > >         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> > >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> > >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
> > >         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> > >         at org.apache.pig.Main.run(Main.java:455)
> > >         at org.apache.pig.Main.main(Main.java:107)
> > > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias blah
> > >         at org.apache.pig.PigServer.storeEx(PigServer.java:889)
> > >         at org.apache.pig.PigServer.store(PigServer.java:827)
> > >         at org.apache.pig.PigServer.openIterator(PigServer.java:739)
> > >         ... 7 more
> > > Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
> > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:679)
> > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
> > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
> > >         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
> > >         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
> > >         at org.apache.pig.PigServer.storeEx(PigServer.java:885)
> > >         ... 9 more
> > > Caused by: java.io.IOException: Invalid path for the Har Filesystem. har:///user/cronusapp/cassini_downsample_logs/prod/2012/09/%7B22.har,23.har%7D/00/*
> > >         at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:100)
> > >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1563)
> > >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:225)
> > >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
> > >         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:348)
> > >         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:317)
> > >         at org.apache.pig.builtin.PigStorage.setLocation(PigStorage.java:219)
> > >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
> > >         ... 14 more
> > >
> >
>
