That looks like a bug in org.apache.pig.piggybank.storage.avro.AvroStorage():
it is not using Hadoop's wildcard/glob matching. You'll want to file a bug
against it (it is part of piggybank, not Avro).

If the schema is being dynamically read from the files, it needs to use 
Hadoop's globbing to resolve one of the files in order to read it and inspect 
the schema, and not take the passed in string literally.
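As a rough illustration of the difference (a sketch only, using java.nio on the local filesystem as a stand-in; the loader itself would go through Hadoop's FileSystem, whose globStatus(Path) expands patterns, and the class and method names here are made up for the example):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class GlobSchemaSource {
    // Hypothetical helper: expand `pattern` inside `dir` and return one matching
    // regular file (the smallest by path ordering), instead of trying to open
    // the pattern string as if it were a file name.
    public static Optional<Path> firstMatch(Path dir, String pattern) throws IOException {
        Path best = null;
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, pattern)) {
            for (Path p : matches) {
                if (Files.isRegularFile(p) && (best == null || p.compareTo(best) < 0)) {
                    best = p;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String pattern = args.length > 1 ? args[1] : "*.avro";
        Optional<Path> file = firstMatch(dir, pattern);
        // The schema would then be read from `file`, not from the raw pattern.
        System.out.println(file.map(Path::toString).orElse("no file matches " + pattern));
    }
}
```

Picking a single file assumes every matched file shares a schema; whether that holds depends on how the data was written.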

At this point, you'll need help from those who contributed it to piggybank.
The Pig JIRA related to it is:
https://issues.apache.org/jira/browse/PIG-1748
But that may not be the best place for this usability question.



On 1/24/11 6:14 PM, "felix gao" wrote:

Thanks for the info. I have not compiled a new version of Pig, and it works when
I load a single Avro file, but it fails when I use wildcard filename matching.
log_load = LOAD '/user/felix/avro/access_log.test.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();   -- works fine
but
log_load = LOAD '/user/felix/avro/*.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

ERROR 1018: Problem determining schema during load

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem determining schema during load
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1342)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1286)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:460)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:738)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:414)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:752)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1336)
    ... 8 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load
    at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:156)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:750)
    ... 10 more
Caused by: java.io.FileNotFoundException: File does not exist: /user/felix/avro/*.avro
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:185)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:431)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:181)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:133)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:108)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:233)
    at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:169)
    at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:150)
    ... 11 more


How do I load multiple Avro files with the load function?

Felix


On Mon, Jan 24, 2011 at 4:25 PM, Scott Carey wrote:
A jar that comes before the Jackson jar on the classpath contains an earlier
version of Jackson inside of it.

Pig's jar typically bundles all of its dependencies (the alternative is
'pig-withouthadoop.jar').

So my guess is that one of these (look at a listing of jar contents) has 
jackson in it:

/usr/lib/pig/bin/../pig-0.7.0+16-core.jar
/usr/lib/pig/bin/../pig-0.7.0+16.jar
/usr/lib/pig/bin/../build/pig-*-core.jar
/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar
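
One way to check those jars, sketched under assumptions: the class name below is made up, and the only fact it relies on is that Jackson 1.x classes live under the org.codehaus.jackson package. It reports which of the jars passed on the command line carry Jackson classes inside them.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class FindBundledJackson {
    // Jackson 1.x classes are packaged under org.codehaus.jackson.
    public static final String JACKSON_PREFIX = "org/codehaus/jackson/";

    // Returns true if the jar at `jarPath` has any entry whose name starts with `prefix`.
    public static boolean containsPackage(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().anyMatch(entry -> entry.getName().startsWith(prefix));
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a jar file; wildcards are expected to be
        // expanded by the shell before they reach this program.
        for (String jarPath : args) {
            boolean bundled = containsPackage(jarPath, JACKSON_PREFIX);
            System.out.println(jarPath + ": " + (bundled ? "bundles Jackson classes" : "no Jackson classes"));
        }
    }
}
```

Any jar reported as bundling Jackson and listed earlier on the classpath than jackson-core-asl-1.5.5.jar would be a candidate for the conflict.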


On 1/24/11 4:06 PM, "felix gao" wrote:

Here is the actual process that is running the Pig script. I hope this helps.

root     20820 19838 66 18:57 pts/0    00:00:00 /usr/java/default/bin/java 
-Xmx1000m -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
 -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log 
-Dpig.home.dir=/usr/lib/pig/bin/.. -Dpig.root.logger=INFO,console,DRFA 
-classpath 
/usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig/bin/../pig-0.7.0+16-core.jar:/usr/lib/pig/bin/../pig-0.7.0+16.jar:/usr/lib/pig/bin/../build/pig-*-core.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0-test.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0.jar:/usr/lib/pig/bin/../lib/zookeeper-hbase-1329.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.6.jar:/usr/lib/hadoop-0.20/lib/hadoop-thriftfs-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jdiff:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/native:/usr/lib/hadoop-0.20/lib/native_libs.tar:/usr/lib/hadoop-0.20/
lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo.0.4.4.jar:/usr/lib/hadoop-0.20/conf::/home/felix/hadoop-lzo.jar:/home/felix/elephant-bird.jar:/home/felix/elephant-bird/lib/*
 org.apache.pig.Main avro.pig


pig -secretDebugCmd
dry run:
/usr/java/default/bin/java -Xmx1000m 
-Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
 -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log 
-Dpig.home.dir=/usr/lib/pig/bin/.. -Dpig.root.logger=INFO,console,DRFA 
-classpath 
/usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig/bin/../pig-0.7.0+16-core.jar:/usr/lib/pig/bin/../pig-0.7.0+16.jar:/usr/lib/pig/bin/../build/pig-*-core.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0-test.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0.jar:/usr/lib/pig/bin/../lib/zookeeper-hbase-1329.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.6.jar:/usr/lib/hadoop-0.20/lib/hadoop-thriftfs-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jdiff:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/native:/usr/lib/hadoop-0.20/lib/native_libs.tar:/usr/lib/hadoop-0.20/
lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo.0.4.4.jar:/usr/lib/hadoop-0.20/conf::/home/felix/hadoop-lzo.jar:/home/felix/elephant-bird.jar:/home/felix/elephant-bird/lib/*
 org.apache.pig.Main


It seems Jackson 1.5.5 is on the classpath of Pig as well as the TaskTracker
and the actual job.

Felix


On Mon, Jan 24, 2011 at 3:44 PM, Tatu Saloranta wrote:
On Mon, Jan 24, 2011 at 3:13 PM, Scott Carey wrote:
> That is confusing.  Can you capture the classpath of an actual task process,
> not just the TT?  They shouldn't differ much, but it is worth checking.
> Jackson 1.3 (or was it 1.2?) and above have all been backwards compatible
> with each other I believe.   And the error you are getting is definitely
> caused by accessing the enable() methods that were added after 1.0.1.
> I can change the Avro dependency on Jackson to 1.5.5, 1.7.1, or 1.3, and
> unit tests pass.  If I change it to 1.2, 1.1, or 1.0.1 they break.

Just in case anyone is interested, this is due to a change in 1.3.0
that changed the return type of the configuration methods from 'void' to
ObjectMapper, to allow fluent-style chaining of configuration. This is
a source-compatible but unfortunately binary-incompatible change. On
the plus side, it is the only known such problem, which makes it easier to
recognize.
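
To illustrate why such a change breaks already-compiled callers, here is a small sketch with stand-in classes (MapperBefore and MapperAfter are invented for the example and are not Jackson's code). The JVM links a call by the method's name plus its descriptor, and the descriptor includes the return type, so code compiled against one shape cannot find the other at run time even though the same source would recompile cleanly.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class ReturnTypeInDescriptor {
    // Illustrative stand-in: configuration method returning void.
    public static class MapperBefore {
        public void configure(String feature, boolean on) { }
    }

    // Illustrative stand-in: same method, now returning the mapper for chaining.
    public static class MapperAfter {
        public MapperAfter configure(String feature, boolean on) { return this; }
    }

    // Builds the JVM method descriptor of owner.configure(String, boolean).
    public static String descriptorOf(Class<?> owner) throws NoSuchMethodException {
        Method m = owner.getDeclaredMethod("configure", String.class, boolean.class);
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // Same name and parameters, but the descriptors differ in the return part:
        // the first ends in "V" (void), the second names the MapperAfter class.
        System.out.println(descriptorOf(MapperBefore.class));
        System.out.println(descriptorOf(MapperAfter.class));
    }
}
```

A caller compiled against one shape and run against the other fails with a NoSuchMethodError at the call site.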

-+ Tatu +-

