Here is the Pig script (I hope the formatting is kept).
I think I could reduce the script to a simple LOAD/STORE and still see the
same problem, but I didn't have time to check it (I would need to rewrite
the StoreFunc).
FYI, my StoreFunc tries to write a SequenceFile<NullWritable,
BytesWritable>:

    @Override
    public OutputFormat<NullWritable, BytesWritable> getOutputFormat() throws IOException {
        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
    }
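As an aside, the payload this StoreFunc packs into each BytesWritable (see the putNext quoted further down the thread) is a performer string, a trace count, and then one (action, time) pair per trace. Here is a dependency-free mimic of that layout using only plain java.io; note that writeUTF/writeInt/writeLong stand in for Hadoop's writeCompressedString/writeVLong/writeVInt, so the exact bytes differ, but the record structure is the same:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Mimics the putNext record layout: performer string, trace count,
// then one (action:int, time:long) pair per trace.
public class TraceRecord {

    static byte[] serialize(String performer, int[] actions, long[] times)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(performer);       // writeCompressedString in the original
        out.writeInt(actions.length);  // writeVLong in the original
        for (int i = 0; i < actions.length; i++) {
            out.writeInt(actions[i]);  // writeVInt in the original
            out.writeLong(times[i]);   // dob.writeLong in the original
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize("alice", new int[]{3, 7}, new long[]{1000L, 2000L});
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readUTF());  // alice
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            System.out.println(in.readInt() + " @ " + in.readLong());
        }
    }
}
```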


rawtraces = LOAD '$log' AS (follower:chararray, action:int, time:long);
groupedtraces = GROUP rawtraces BY follower;
traces = FOREACH groupedtraces GENERATE group AS performer, rawtraces.(action, time) AS t;

rawsn = LOAD '$network' AS (parent:chararray, child:chararray);
groupedsn = GROUP rawsn BY parent;
sn = FOREACH groupedsn GENERATE group AS parent, rawsn.(child) AS children;

join1 = JOIN traces BY performer, sn BY parent;
cleanJ1 = FOREACH join1 GENERATE traces::performer AS parent, traces::t AS parentTraces, FLATTEN(sn::children) AS child;
groupedJ1 = GROUP cleanJ1 BY child;
intermediate = FOREACH groupedJ1 GENERATE group AS child, cleanJ1.(parent, parentTraces) AS legacy;

join2 = JOIN traces BY performer, intermediate BY child;
result = FOREACH join2 GENERATE traces::performer AS child, traces::t AS childTraces, intermediate::legacy AS legacy;

STORE result INTO '$output' USING mypackage.pig.BinStorage();

And here is the stack trace:

java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is not class org.apache.pig.impl.io.NullableText
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)
Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is not class org.apache.pig.impl.io.NullableText
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:985)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
        at mypackage.pig.BinStorage.putNext(BinStorage.java:75)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
        ... 11 more
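For what it's worth, the fix described in the quoted replies below (declaring the OutputFormat over Pig's wrapper types and wrapping the payload in a Tuple inside a NullableTuple) would look roughly like this. This is an untested sketch assembled from that description; the Pig classes (NullableText, NullableTuple, TupleFactory, DataByteArray) are assumed to behave as in the Pig 0.9-era API discussed in this thread:

```java
// Hypothetical corrected StoreFunc methods: Pig's reducer emits
// <PigNullableWritable, Writable>, so the SequenceFile must be declared
// over Pig's wrapper classes rather than raw Hadoop Writables.
@Override
public OutputFormat<NullableText, NullableTuple> getOutputFormat() throws IOException {
    return new SequenceFileOutputFormat<NullableText, NullableTuple>();
}

@Override
public void putNext(Tuple tuple) throws IOException {
    // ... serialize into dob exactly as before ...
    // Wrap the raw bytes in a Tuple, then in a NullableTuple, to match
    // the value class Pig expects.
    Tuple wrapper = TupleFactory.getInstance().newTuple(
            new DataByteArray(dob.getData(), 0, dob.getLength()));
    try {
        writer.write(new NullableText(), new NullableTuple(wrapper));
    } catch (InterruptedException e) {
        throw new IOException(e);
    }
}
```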


Cheers,
--
Gianmarco



On Tue, Nov 1, 2011 at 01:55, Ashutosh Chauhan <[email protected]> wrote:

> Actually, what I said was not entirely correct. Per Daniel, Pig's
> load/store funcs are designed to work with InputFormat/OutputFormat, which
> work on <WritableComparable, Writable>, so what you are seeing is not
> expected. Can you paste the Pig script you are using and the detailed
> stack trace? You can find that in the JobTracker log.
>
> Hope it helps,
> Ashutosh
>
> On Mon, Oct 31, 2011 at 04:28, Gianmarco De Francisci Morales <
> [email protected]> wrote:
>
> > Thanks Ashutosh,
> >
> > your suggestion helped.
> > Actually, I am loading data using PigStorage, so my output <key, value>
> > pairs are declared as <NullableText, NullableTuple>.
> >
> > By declaring my getOutputFormat() to return
> > a SequenceFileOutputFormat<NullableText, NullableTuple>() I managed to
> > make it work.
> >
> > The downside is that now I need to wrap my bytes in a Tuple and wrap the
> > Tuple in a NullableTuple.
> > Is this the intended way it should work?
> > Why not let the user use any <WritableComparable, Writable> pair instead?
> > It should be possible for Pig to use the classes defined by the user in
> > the StoreFunc in order to define the OutputKeyClass and OutputValueClass.
> >
> > Cheers,
> > --
> > Gianmarco
> >
> >
> > On Fri, Oct 28, 2011 at 19:15, Ashutosh Chauhan <[email protected]> wrote:
> >
> > > Hey Gianmarco,
> > >
> > > How are you loading data in your Pig script? Are you using your own
> > > LoadFunc? Pig declares the following types to the MR framework:
> > >
> > > Map:
> > >   KeyIn: Text, ValueIn: Tuple
> > > Reducer:
> > >   KeyOut: PigNullableWritable, ValueOut: Writable
> > >
> > > So your LoadFunc/StoreFunc key and value types must extend from these.
> > >
> > > Hope it helps,
> > > Ashutosh
> > >
> > > On Fri, Oct 28, 2011 at 09:37, Gianmarco De Francisci Morales <
> > > [email protected]> wrote:
> > >
> > > > Hi pig users,
> > > > I implemented a custom StoreFunc to write some data in a binary format
> > > > to a SequenceFile.
> > > >
> > > >    private RecordWriter<NullWritable, BytesWritable> writer;
> > > >    private BytesWritable bytes;
> > > >    private DataOutputBuffer dob;
> > > >
> > > >    @SuppressWarnings("rawtypes")
> > > >    @Override
> > > >    public OutputFormat getOutputFormat() throws IOException {
> > > >        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
> > > >    }
> > > >
> > > >    @SuppressWarnings({ "rawtypes", "unchecked" })
> > > >    @Override
> > > >    public void prepareToWrite(RecordWriter writer) throws IOException {
> > > >        this.writer = writer;
> > > >        this.bytes = new BytesWritable();
> > > >        this.dob = new DataOutputBuffer();
> > > >    }
> > > >
> > > >    @Override
> > > >    public void putNext(Tuple tuple) throws IOException {
> > > >        dob.reset();
> > > >        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
> > > >        DataBag childTracesBag = (DataBag) tuple.get(1);
> > > >        WritableUtils.writeVLong(dob, childTracesBag.size());
> > > >        for (Tuple t : childTracesBag) {
> > > >            WritableUtils.writeVInt(dob, (Integer) t.get(0));
> > > >            dob.writeLong((Long) t.get(1));
> > > >        }
> > > >        try {
> > > >            bytes.set(dob.getData(), 0, dob.getLength());
> > > >            writer.write(NullWritable.get(), bytes);
> > > >        } catch (InterruptedException e) {
> > > >            e.printStackTrace();
> > > >        }
> > > >    }
> > > >
> > > >
> > > > But I get this exception:
> > > >
> > > >
> > > > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > > > recreate exception from backed error: java.io.IOException:
> > > > java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable
> > > > is not class org.apache.pig.impl.io.NullableText
> > > >
> > > > And if I use a NullableText instead of a NullWritable, I get this
> > > > other exception:
> > > >
> > > > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > > > recreate exception from backed error: java.io.IOException:
> > > > java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
> > > > is not class org.apache.pig.impl.io.NullableTuple
> > > >
> > > > There must be something I am doing wrong in telling Pig the types of
> > > > the sequence file. It must be a stupid problem, but I don't see it.
> > > >
> > > > Does anybody have a clue?
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > Gianmarco
> > > >
> > >
> >
>
