That looks like a bug in org.apache.pig.piggybank.storage.avro.AvroStorage: it is not using Hadoop's wildcard/glob matching. You'll want to file a bug against that (it is part of piggybank, not Avro).

If the schema is being dynamically read from the files, AvroStorage needs to use Hadoop's globbing to resolve one of the matching files and inspect its schema, rather than taking the passed-in string literally.
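A minimal sketch of that approach (GlobSchemaReader is a made-up name, not the actual piggybank code; it assumes a Hadoop Configuration is at hand and that reading the schema from the first matching file is acceptable):

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileStream;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GlobSchemaReader {
        // Resolve the glob first, then read the schema from one matching file,
        // instead of opening the literal path (which is what fails below).
        public static Schema readSchema(String pathWithGlob, Configuration conf) throws Exception {
            Path glob = new Path(pathWithGlob);
            FileSystem fs = glob.getFileSystem(conf);
            FileStatus[] matches = fs.globStatus(glob);
            if (matches == null || matches.length == 0) {
                throw new java.io.FileNotFoundException("No files match " + pathWithGlob);
            }
            DataFileStream<Object> in = new DataFileStream<Object>(
                    fs.open(matches[0].getPath()), new GenericDatumReader<Object>());
            try {
                return in.getSchema();   // writer schema stored in the Avro container file
            } finally {
                in.close();
            }
        }
    }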
At this point, you'll need help from those that contributed that to piggybank. The Pig JIRA related to it is https://issues.apache.org/jira/browse/PIG-1748, but that may not be the best place for this usability question.

On 1/24/11 6:14 PM, "felix gao" <[email protected]> wrote:

Thanks for the info. I have not compiled a new version of Pig. It works when I load a single Avro file, but it fails when I do wildcard filename matching.

    log_load = LOAD '/user/felix/avro/access_log.test.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();   <--- works fine

but

    log_load = LOAD '/user/felix/avro/*.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

fails with:

    ERROR 1018: Problem determining schema during load
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem determining schema during load
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1342)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1286)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:460)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:738)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:414)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:752)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1336)
        ... 8 more
    Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load
        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:156)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:750)
        ... 10 more
    Caused by: java.io.FileNotFoundException: File does not exist: /user/felix/avro/*.avro
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:185)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:431)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:181)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:133)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:108)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:233)
        at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:169)
        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:150)
        ... 11 more

How do I load multiple Avro files with the LOAD function?

Felix

On Mon, Jan 24, 2011 at 4:25 PM, Scott Carey <[email protected]> wrote:

A jar that comes before the Jackson jar on the classpath contains an earlier version of Jackson inside of it. Pig's jar typically contains all its dependencies (there is 'pig-withouthadoop.jar' instead). So my guess is that one of these (look at a listing of the jar contents) has Jackson in it:

    /usr/lib/pig/bin/../pig-0.7.0+16-core.jar
    /usr/lib/pig/bin/../pig-0.7.0+16.jar
    /usr/lib/pig/bin/../build/pig-*-core.jar
    /usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar
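One way to do that listing programmatically (a hedged sketch, not something from the thread; FindBundledJackson is a made-up name, and the suspect jar paths above would be passed as arguments):

    import java.util.Enumeration;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    public class FindBundledJackson {
        public static void main(String[] args) throws Exception {
            for (String path : args) {            // e.g. /usr/lib/pig/bin/../pig-0.7.0+16-core.jar
                JarFile jar = new JarFile(path);
                try {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        String name = entries.nextElement().getName();
                        if (name.startsWith("org/codehaus/jackson/")) {   // Jackson 1.x package
                            System.out.println(path + " bundles " + name);
                            break;                // one hit is enough to flag the jar
                        }
                    }
                } finally {
                    jar.close();
                }
            }
        }
    }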
On 1/24/11 4:06 PM, "felix gao" <[email protected]> wrote:

Here is the actual process that is running the pig script; I hope this helps.

    root 20820 19838 66 18:57 pts/0 00:00:00 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dpig.root.logger=INFO,console,DRFA -classpath /usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig/bin/../pig-0.7.0+16-core.jar:/usr/lib/pig/bin/../pig-0.7.0+16.jar:/usr/lib/pig/bin/../build/pig-*-core.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0-test.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0.jar:/usr/lib/pig/bin/../lib/zookeeper-hbase-1329.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.6.jar:/usr/lib/hadoop-0.20/lib/hadoop-thriftfs-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jdiff:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/native:/usr/lib/hadoop-0.20/lib/native_libs.tar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo.0.4.4.jar:/usr/lib/hadoop-0.20/conf::/home/felix/hadoop-lzo.jar:/home/felix/elephant-bird.jar:/home/felix/elephant-bird/lib/* org.apache.pig.Main avro.pig

pig -secretDebugCmd dry run:

    /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
    -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/.. -Dpig.root.logger=INFO,console,DRFA -classpath /usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig/bin/../pig-0.7.0+16-core.jar:/usr/lib/pig/bin/../pig-0.7.0+16.jar:/usr/lib/pig/bin/../build/pig-*-core.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0-test.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0.jar:/usr/lib/pig/bin/../lib/zookeeper-hbase-1329.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.6.jar:/usr/lib/hadoop-0.20/lib/hadoop-thriftfs-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jdiff:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/native:/usr/lib/hadoop-0.20/lib/native_libs.tar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo.0.4.4.jar:/usr/lib/hadoop-0.20/conf::/home/felix/hadoop-lzo.jar:/home/felix/elephant-bird.jar:/home/felix/elephant-bird/lib/* org.apache.pig.Main

It seems Jackson 1.5.5 is on the classpath of Pig as well as the TaskTracker and the actual job.

Felix
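To see which copy of Jackson actually wins at runtime in a given JVM (for instance, one launched with the exact classpath above), one hedged option is a throwaway check like this (WhichJackson is a made-up name, not something from the thread):

    public class WhichJackson {
        public static void main(String[] args) throws Exception {
            // Ask the classloader where Jackson's ObjectMapper really came from.
            Class<?> om = Class.forName("org.codehaus.jackson.map.ObjectMapper");
            System.out.println("ObjectMapper loaded from: "
                    + om.getProtectionDomain().getCodeSource().getLocation());
            System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        }
    }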
On Mon, Jan 24, 2011 at 3:44 PM, Tatu Saloranta <[email protected]> wrote:

On Mon, Jan 24, 2011 at 3:13 PM, Scott Carey <[email protected]> wrote:
> That is confusing. Can you capture the classpath of an actual task process,
> not just the TT? They shouldn't differ much, but it is worth checking.
> Jackson 1.3 (or was it 1.2?) and above have all been backwards compatible
> with each other, I believe. And the error you are getting is definitely
> caused by accessing the enable() methods that were added after 1.0.1.
> I can change the Avro dependency on Jackson to 1.5.5, 1.7.1, or 1.3, and
> unit tests pass. If I change it to 1.2, 1.1, or 1.0.1, they break.

Just in case anyone is interested, this is due to a change in 1.3.0 that changed the return type of a configuration method from 'void' to ObjectMapper, to allow fluent-style chaining of configuration. This is source compatible, but unfortunately a binary-incompatible change. On the plus side, it is the only known such problem, which makes it easier to recognize.

-+ Tatu +-
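A hypothetical sketch of the kind of change Tatu describes (made-up classes, not Jackson's actual API), showing why the same source compiles against either version yet breaks at the bytecode level:

    // Old library version:   public void set(boolean b)
    // New library version:   public Config set(boolean b)   // fluent style, returns this
    class Config {
        public Config set(boolean b) { return this; }
    }

    class Caller {
        void run() {
            // The same source line compiles against either version, but the compiled
            // call site records the full method descriptor:
            //   Config.set(Z)V         when built against the old (void) version
            //   Config.set(Z)LConfig;  when built against the new (fluent) version
            // so bytecode built against the new jar throws NoSuchMethodError when
            // the old jar is the one actually found on the classpath at runtime.
            new Config().set(true);
        }
    }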
