Below are the Pig script and the stack trace (I hope the formatting survives).
I think I could reduce the script to a simple load/store and still hit the
same problem, but I haven't had time to check (I would need to rewrite
the StoreFunc).

FYI, my StoreFunc tries to write a SequenceFile<NullWritable, BytesWritable>:

    @Override
    public OutputFormat<NullWritable, BytesWritable> getOutputFormat()
            throws IOException {
        return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
    }

The Pig script:

    rawtraces = LOAD '$log' AS (follower:chararray, action:int, time:long);
    groupedtraces = GROUP rawtraces BY follower;
    traces = FOREACH groupedtraces GENERATE group AS performer,
        rawtraces.(action, time) AS t;

    rawsn = LOAD '$network' AS (parent:chararray, child:chararray);
    groupedsn = GROUP rawsn BY parent;
    sn = FOREACH groupedsn GENERATE group AS parent, rawsn.(child) AS children;

    join1 = JOIN traces BY performer, sn BY parent;
    cleanJ1 = FOREACH join1 GENERATE traces::performer AS parent,
        traces::t AS parentTraces, FLATTEN(sn::children) AS child;
    groupedJ1 = GROUP cleanJ1 BY child;
    intermediate = FOREACH groupedJ1 GENERATE group AS child,
        cleanJ1.(parent, parentTraces) AS legacy;

    join2 = JOIN traces BY performer, intermediate BY child;
    result = FOREACH join2 GENERATE traces::performer AS child,
        traces::t AS childTraces, intermediate::legacy AS legacy;

    STORE result INTO '$output' USING mypackage.pig.BinStorage();

And here is the stack trace:
java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is not class org.apache.pig.impl.io.NullableText
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:255)
Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable is not class org.apache.pig.impl.io.NullableText
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:985)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
    at mypackage.pig.BinStorage.putNext(BinStorage.java:75)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:462)
    ... 11 more
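
My reading of the "Caused by" frames, which may well be wrong: the exception
is raised inside SequenceFile$Writer.append, so the writer seems to compare
the runtime class of each key against a key class that was fixed when the
writer was created, presumably from the job configuration rather than from
the generic parameters I put in getOutputFormat(). Below is a minimal
pure-Java sketch of that kind of check. Everything in it is a hypothetical
stand-in (FakeNullWritable, FakeNullableText, CheckedWriter, tryAppend); it
is not Hadoop's or Pig's code, only an illustration of the behaviour I
suspect.

```java
import java.io.IOException;

class KeyClassCheckSketch {

    // Hypothetical stand-ins for org.apache.hadoop.io.NullWritable and
    // org.apache.pig.impl.io.NullableText; they carry no behaviour.
    static final class FakeNullWritable {}
    static final class FakeNullableText {}

    // Simplified writer (an assumption, not the real SequenceFile.Writer):
    // the accepted key class is fixed at construction time, independently
    // of any generic type parameters used at the call site.
    static final class CheckedWriter {
        private final Class<?> keyClass;

        CheckedWriter(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        // Rejects any key whose runtime class differs from the declared one.
        void append(Object key) throws IOException {
            if (key.getClass() != keyClass) {
                throw new IOException("wrong key class: "
                        + key.getClass().getName() + " is not " + keyClass);
            }
        }
    }

    // Returns the error message produced when a writer declared for
    // declaredKeyClass is handed the given key, or null if it is accepted.
    static String tryAppend(Class<?> declaredKeyClass, Object key) {
        try {
            new CheckedWriter(declaredKeyClass).append(key);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Writer declared for the "Pig" key type, fed the "Hadoop" key type.
        System.out.println(tryAppend(FakeNullableText.class, new FakeNullWritable()));
        // Declared and actual key types agree, so the key is accepted.
        System.out.println(tryAppend(FakeNullWritable.class, new FakeNullWritable()));
    }
}
```

If this is indeed what happens, the type parameters in getOutputFormat()
would have no effect on the check, and what matters is which key and value
classes the job declares for its output.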
Cheers,
--
Gianmarco
On Tue, Nov 1, 2011 at 01:55, Ashutosh Chauhan <[email protected]> wrote:
> Actually what I said was not entirely correct. Per Daniel, Pig's load/store
> funcs are designed to work with any InputFormat/OutputFormat that operates on
> <WritableComparable, Writable> pairs, so what you are seeing is not expected.
> Can you paste the Pig script you are using and the detailed stack trace? You
> can find the latter in the JobTracker log.
>
> Hope it helps,
> Ashutosh
>
> On Mon, Oct 31, 2011 at 04:28, Gianmarco De Francisci Morales <
> [email protected]> wrote:
>
> > Thanks Ashutosh,
> >
> > your suggestion helped.
> > Actually, I am loading data using PigStorage, so my output <key, value>
> > pairs are declared as <NullableText, NullableTuple>.
> >
> > By declaring my getOutputFormat() to return a
> > SequenceFileOutputFormat<NullableText, NullableTuple>() I managed to make
> > it work.
> >
> > The downside is that now I need to wrap my bytes in a Tuple and wrap the
> > Tuple in a NullableTuple.
> > Is this the intended way it should work?
> > Why not let the user use any <WritableComparable, Writable> pair instead?
> > It should be possible for Pig to use the classes defined by the user in
> > the StoreFunc in order to define the OutputKeyClass and OutputValueClass.
> >
> > Cheers,
> > --
> > Gianmarco
> >
> >
> > On Fri, Oct 28, 2011 at 19:15, Ashutosh Chauhan <[email protected]> wrote:
> >
> > > Hey Gianmarco,
> > >
> > > How are you loading data in the Pig script? Are you using your own
> > > LoadFunc? Pig declares the following types to the MR framework:
> > > Map:
> > >   KeyIn: Text, ValueIn: Tuple
> > > Reducer:
> > >   KeyOut: PigNullableWritable, ValueOut: Writable
> > >
> > > So, your LoadFunc/StoreFunc key and value types must extend these.
> > >
> > > Hope it helps,
> > > Ashutosh
> > >
> > > On Fri, Oct 28, 2011 at 09:37, Gianmarco De Francisci Morales <
> > > [email protected]> wrote:
> > >
> > > > Hi pig users,
> > > > I implemented a custom StoreFunc to write some data in a binary format
> > > > to a Sequence File.
> > > >
> > > >     private RecordWriter<NullWritable, BytesWritable> writer;
> > > >     private BytesWritable bytes;
> > > >     private DataOutputBuffer dob;
> > > >
> > > >     @SuppressWarnings("rawtypes")
> > > >     @Override
> > > >     public OutputFormat getOutputFormat() throws IOException {
> > > >         return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
> > > >     }
> > > >
> > > >     @SuppressWarnings({ "rawtypes", "unchecked" })
> > > >     @Override
> > > >     public void prepareToWrite(RecordWriter writer) throws IOException {
> > > >         this.writer = writer;
> > > >         this.bytes = new BytesWritable();
> > > >         this.dob = new DataOutputBuffer();
> > > >     }
> > > >
> > > >     @Override
> > > >     public void putNext(Tuple tuple) throws IOException {
> > > >         dob.reset();
> > > >         WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
> > > >         DataBag childTracesBag = (DataBag) tuple.get(1);
> > > >         WritableUtils.writeVLong(dob, childTracesBag.size());
> > > >         for (Tuple t : childTracesBag) {
> > > >             WritableUtils.writeVInt(dob, (Integer) t.get(0));
> > > >             dob.writeLong((Long) t.get(1));
> > > >         }
> > > >         try {
> > > >             bytes.set(dob.getData(), 0, dob.getLength());
> > > >             writer.write(NullWritable.get(), bytes);
> > > >         } catch (InterruptedException e) {
> > > >             e.printStackTrace();
> > > >         }
> > > >     }
> > > >
> > > >
> > > > But I get this exception:
> > > >
> > > >
> > > > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > > > recreate exception from backed error: java.io.IOException:
> > > > java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable
> > > > is not class org.apache.pig.impl.io.NullableText
> > > >
> > > >
> > > >
> > > > And if I use a NullableText instead of a NullWritable, I get this other
> > > > exception:
> > > >
> > > > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > > > recreate exception from backed error: java.io.IOException:
> > > > java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
> > > > is not class org.apache.pig.impl.io.NullableTuple
> > > >
> > > >
> > > >
> > > > There must be something I am doing wrong in telling Pig the types of the
> > > > sequence file.
> > > >
> > > > It must be a stupid problem but I don't see it.
> > > >
> > > > Does anybody have a clue?
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > Gianmarco
> > > >
> > >
> >
>