Sounds like I was wrong. ;-) You might get a better answer from the Hadoop user group, since I think this is more related to HarFileSystem than to Pig.
Thanks,
Cheolsoo

On Tue, Sep 25, 2012 at 6:20 PM, Mohnish Kodnani <[email protected]> wrote:

> Hi Cheolsoo,
> thanks for replying. On the same system the following works:
>
> x = load 'har:///a/b/b/22.har/00/*,har:///a/b/c/d/23.har/00/*' using PigStorage('\t');
>
> Two separate file paths with har protocol work.
>
> A single path works but if I do the following I get an error.
> x = LOAD 'har:///a/b/c/{d.har,e.har}/z/ab/*' using PigStorage('\t');
>
> Thanks
> Mohnish
>
> On Tue, Sep 25, 2012 at 6:09 PM, Cheolsoo Park <[email protected]> wrote:
>
> > Hi Mohnish,
> >
> > I am not very familiar with har files, so I might be wrong here.
> >
> > Looking at the call stack, the exception is thrown from initialize(URI
> > name, Configuration conf) in HarFileSystem.java. In the source code, the
> > comment of this method says the following:
> >
> > > Initialize a Har filesystem per har archive. The
> > > archive home directory is the top level directory
> > > in the filesystem that contains the HAR archive.
> >
> > This sounds to me that HarFileSystem expects a single path.
> >
> > > This gives error due to the curly braces being encoded to %7B and %7D.
> >
> > The encoded curly braces should be fine though. In fact, if they're not
> > encoded, that's a problem because then a URISyntaxException will be thrown
> > by Java URI class.
> >
> > Hope that this helps,
> > Cheolsoo
> >
> >
> > On Tue, Sep 25, 2012 at 12:43 PM, Mohnish Kodnani <[email protected]> wrote:
> >
> > > Hi,
> > > I am trying to give multiple paths to a pig script using path globbing in
> > > HAR file format and it does not seem to work. I wanted to know if this is
> > > expected or a bug / feature request.
> > >
> > > Command:
> > > x = LOAD 'har:///a/b/c/{d.har,e.har}/z/ab/*' using PigStorage('\t');
> > >
> > > This gives error due to the curly braces being encoded to %7B and %7D.
> > > I am trying this on Pig 0.8.0
> > >
> > > ERROR 2017: Internal error creating job configuration.
> > >
> > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias blah
> > >     at org.apache.pig.PigServer.openIterator(PigServer.java:765)
> > >     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:615)
> > >     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> > >     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> > >     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
> > >     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> > >     at org.apache.pig.Main.run(Main.java:455)
> > >     at org.apache.pig.Main.main(Main.java:107)
> > > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias blah
> > >     at org.apache.pig.PigServer.storeEx(PigServer.java:889)
> > >     at org.apache.pig.PigServer.store(PigServer.java:827)
> > >     at org.apache.pig.PigServer.openIterator(PigServer.java:739)
> > >     ... 7 more
> > > Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:679)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
> > >     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
> > >     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
> > >     at org.apache.pig.PigServer.storeEx(PigServer.java:885)
> > >     ... 9 more
> > > Caused by: java.io.IOException: Invalid path for the Har Filesystem. har:///user/cronusapp/cassini_downsample_logs/prod/2012/09/%7B22.har,23.har%7D/00/*
> > >     at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:100)
> > >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1563)
> > >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:225)
> > >     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
> > >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:348)
> > >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:317)
> > >     at org.apache.pig.builtin.PigStorage.setLocation(PigStorage.java:219)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
> > >     ... 14 more

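For anyone hitting the same error: the thread reports that a comma-separated list of complete har paths loads fine, while a brace pattern spanning two archives does not. One possible workaround is to expand the braces yourself before handing the location to LOAD. The Python sketch below is only an illustration of that idea. The helper names `expand_braces` and `to_pig_location` are hypothetical, it handles only non-nested `{a,b}` groups, and whether the expanded form works on your setup rests on the observation reported above, not on anything verified against Pig or Hadoop here.

```python
import re


def expand_braces(pattern):
    """Expand non-nested {a,b,...} groups into a list of explicit paths.

    Hypothetical helper for illustration; nested groups are not supported.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left, so the pattern is already an explicit path.
        return [pattern]
    head = pattern[:match.start()]
    tail = pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace group in the tail is expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def to_pig_location(pattern):
    """Join the expanded paths with commas, the form reported to work."""
    return ",".join(expand_braces(pattern))


if __name__ == "__main__":
    print(to_pig_location("har:///a/b/c/{d.har,e.har}/z/ab/*"))
```

The printed string has the same shape as the two-path LOAD statement that was reported to work, with each archive named in full, so each har URI names exactly one archive.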