To answer my own question: the schemas differ.  In the working case, the
schema coming out of AvroStorage has a named tuple inside the bag.  Storing
to Mongo works when I name the tuple myself:

...
sent_topics = FOREACH froms GENERATE FLATTEN(group) AS (from, to),
pairs.subject AS pairs:bag {column:tuple (subject:chararray)};

STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING
MongoStorage();
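
For anyone who hits this later, the difference shows up in DESCRIBE.  The
output below is approximate (reconstructed, not copied from my grunt
session), but the shape is the point:

DESCRIBE sent_topics;

-- Without the AS clause, the bag's inner tuple is unnamed, so the schema
-- MongoStorage stores contains "null:", which Pig's schema parser rejects:
-- sent_topics: {from: chararray, to: chararray, pairs: {null: (subject: chararray)}}

-- With the AS clause above, the inner tuple gets an identifier and the
-- stored schema parses cleanly:
-- sent_topics: {from: chararray, to: chararray, pairs: {column: (subject: chararray)}}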


I will stop cross-posting to myself now.

On Sun, Feb 5, 2012 at 12:47 AM, Russell Jurney <russell.jur...@gmail.com> wrote:

> sent_topics = LOAD '/tmp/pair_titles.avro' USING AvroStorage();
> STORE sent_topics INTO 'mongodb://localhost/test.pigola' USING
> MongoStorage();
>
> That works.  Why is it the case that MongoStorage only works if the
> intermediate processing doesn't happen?  Strangeness.
>
> On Sun, Feb 5, 2012 at 12:31 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
>
>> MongoStorage is failing for me now, on a script that was working before.
>>  Is anyone else using it? The schema is [from:chararray, to:chararray,
>> pairs:{null:(subject:chararray)}], which worked before.
>>
>> 2012-02-05 00:27:54,991 [Thread-15] INFO
>>  com.mongodb.hadoop.pig.MongoStorage - Store Location Config:
>> Configuration: core-default.xml, core-site.xml, mapred-default.xml,
>> mapred-site.xml,
>> /tmp/hadoop-rjurney/mapred/local/localRunner/job_local_0001.xml For URI:
>> mongodb://localhost/test.pigola
>> 2012-02-05 00:27:54,993 [Thread-15] INFO
>>  com.mongodb.hadoop.pig.MongoStorage - OutputFormat...
>> com.mongodb.hadoop.MongoOutputFormat@4eb7cd92
>> 2012-02-05 00:27:55,291 [Thread-15] INFO
>>  com.mongodb.hadoop.pig.MongoStorage - Preparing to write to
>> com.mongodb.hadoop.output.MongoRecordWriter@333ec758
>> Failed to parse: <line 1, column 35>  rule identifier failed predicate:
>> {!input.LT(1).getText().equalsIgnoreCase("NULL")}?
>> at
>> org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:79)
>>  at
>> org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:93)
>> at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:175)
>>  at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:166)
>> at
>> com.mongodb.hadoop.pig.MongoStorage.prepareToWrite(MongoStorage.java:186)
>>  at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
>>  at
>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>  at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>> 2012-02-05 00:27:55,320 [Thread-15] INFO
>>  com.mongodb.hadoop.pig.MongoStorage - Stored Schema: [from:chararray,
>> to:chararray, pairs:{null:(subject:chararray)}]
>> 2012-02-05 00:27:55,323 [Thread-15] WARN
>>  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
>> java.io.IOException: java.lang.NullPointerException
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
>>  at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>>  at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>>  at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>> Caused by: java.lang.NullPointerException
>> at com.mongodb.hadoop.pig.MongoStorage.putNext(MongoStorage.java:68)
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>  at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>> at
>> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
>>  at
>> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
>>  ... 7 more
>>
>>
>>
>
>
>
>



-- 
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com
