To answer my own question: this is because the schemas differ. The schema in the working case, loaded via AvroStorage, has a named tuple inside the bag. Storing to Mongo works once I name the tuple myself, as in the script below.
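As far as I can tell from the trace in my earlier mail, MongoStorage.prepareToWrite() re-parses the output schema from its string form via Utils.getSchemaFromString(). When the bag's inner tuple has no alias, that string contains "null" in the alias position, and the schema parser's identifier rule rejects it (that is the failed predicate {!input.LT(1).getText().equalsIgnoreCase("NULL")}? in the trace), presumably leaving the schema unset and producing the later NullPointerException in putNext(). A rough sketch of the difference (the first line is the schema MongoStorage logged; the second is what the AS clause in the script below should produce):

  pairs: {null: (subject: chararray)}      -- inner tuple unnamed; "null" trips the schema parser
  pairs: {column: (subject: chararray)}    -- inner tuple named via AS pairs:bag {column:tuple (subject:chararray)}

With both the bag and its tuple named, the store succeeds: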
...
sent_topics = FOREACH froms GENERATE FLATTEN(group) AS (from, to),
    pairs.subject AS pairs:bag {column:tuple (subject:chararray)};
STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING MongoStorage();

I will stop cross-posting to myself now.

On Sun, Feb 5, 2012 at 12:47 AM, Russell Jurney <russell.jur...@gmail.com> wrote:

> sent_topics = LOAD '/tmp/pair_titles.avro' USING AvroStorage();
> STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING MongoStorage();
>
> That works. Why is it the case that MongoStorage only works if the
> intermediate processing doesn't happen? Strangeness.
>
> On Sun, Feb 5, 2012 at 12:31 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
>
>> MongoStorage is failing for me now, on a script that was working before.
>> Is anyone else using it? The schema is [from:chararray, to:chararray,
>> pairs:{null:(subject:chararray)}], which worked before.
>>
>> 2012-02-05 00:27:54,991 [Thread-15] INFO com.mongodb.hadoop.pig.MongoStorage - Store Location Config:
>>   Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>>   /tmp/hadoop-rjurney/mapred/local/localRunner/job_local_0001.xml For URI: mongodb://localhost/test.pigola
>> 2012-02-05 00:27:54,993 [Thread-15] INFO com.mongodb.hadoop.pig.MongoStorage - OutputFormat... com.mongodb.hadoop.MongoOutputFormat@4eb7cd92
>> 2012-02-05 00:27:55,291 [Thread-15] INFO com.mongodb.hadoop.pig.MongoStorage - Preparing to write to com.mongodb.hadoop.output.MongoRecordWriter@333ec758
>> Failed to parse: <line 1, column 35> rule identifier failed predicate: {!input.LT(1).getText().equalsIgnoreCase("NULL")}?
>>         at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:79)
>>         at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:93)
>>         at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:175)
>>         at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:166)
>>         at com.mongodb.hadoop.pig.MongoStorage.prepareToWrite(MongoStorage.java:186)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
>>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>> 2012-02-05 00:27:55,320 [Thread-15] INFO com.mongodb.hadoop.pig.MongoStorage - Stored Schema: [from:chararray, to:chararray, pairs:{null:(subject:chararray)}]
>> 2012-02-05 00:27:55,323 [Thread-15] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
>> java.io.IOException: java.lang.NullPointerException
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>> Caused by: java.lang.NullPointerException
>>         at com.mongodb.hadoop.pig.MongoStorage.putNext(MongoStorage.java:68)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
>>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
>>         ... 7 more

--
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com