Thanks Ashutosh, your suggestion helped. I am actually loading data using PigStorage, so my output <key, value> pair is declared as <NullableText, NullableTuple>.
By declaring my getOutputFormat() to return a new SequenceFileOutputFormat<NullableText, NullableTuple>() I managed to make it work.
The downside is that now I need to wrap my bytes in a Tuple and wrap that Tuple in a NullableTuple. Is this the intended way it should
work? Why not let the user use any <WritableComparable, Writable> pair instead? It should be possible for Pig to use the classes
declared by the user's StoreFunc to set the OutputKeyClass and OutputValueClass.

Cheers,
--
Gianmarco


On Fri, Oct 28, 2011 at 19:15, Ashutosh Chauhan <[email protected]> wrote:

> Hey Gianmarco,
>
> How are you loading data in your Pig script? Using your own LoadFunc? Pig
> declares the following types to the MR framework:
>
> Map:
>   KeyIn: Text, ValueIn: Tuple
> Reducer:
>   KeyOut: PigNullableWritable, ValueOut: Writable
>
> So your LoadFunc/StoreFunc key and value types must extend these.
>
> Hope it helps,
> Ashutosh
>
> On Fri, Oct 28, 2011 at 09:37, Gianmarco De Francisci Morales <
> [email protected]> wrote:
>
> > Hi pig users,
> > I implemented a custom StoreFunc to write some data in a binary format
> > to a sequence file.
> >
> >     private RecordWriter<NullWritable, BytesWritable> writer;
> >     private BytesWritable bytes;
> >     private DataOutputBuffer dob;
> >
> >     @SuppressWarnings("rawtypes")
> >     @Override
> >     public OutputFormat getOutputFormat() throws IOException {
> >         return new SequenceFileOutputFormat<NullWritable, BytesWritable>();
> >     }
> >
> >     @SuppressWarnings({ "rawtypes", "unchecked" })
> >     @Override
> >     public void prepareToWrite(RecordWriter writer) throws IOException {
> >         this.writer = writer;
> >         this.bytes = new BytesWritable();
> >         this.dob = new DataOutputBuffer();
> >     }
> >
> >     @Override
> >     public void putNext(Tuple tuple) throws IOException {
> >         dob.reset();
> >         WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
> >         DataBag childTracesBag = (DataBag) tuple.get(1);
> >         WritableUtils.writeVLong(dob, childTracesBag.size());
> >         for (Tuple t : childTracesBag) {
> >             WritableUtils.writeVInt(dob, (Integer) t.get(0));
> >             dob.writeLong((Long) t.get(1));
> >         }
> >         try {
> >             bytes.set(dob.getData(), 0, dob.getLength());
> >             writer.write(NullWritable.get(), bytes);
> >         } catch (InterruptedException e) {
> >             e.printStackTrace();
> >         }
> >     }
> >
> > But I get this exception:
> >
> > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > recreate exception from backed error: java.io.IOException:
> > java.io.IOException: wrong key class: org.apache.hadoop.io.NullWritable
> > is not class org.apache.pig.impl.io.NullableText
> >
> > And if I use a NullableText instead of a NullWritable, I get this other
> > exception:
> >
> > ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to
> > recreate exception from backed error: java.io.IOException:
> > java.io.IOException: wrong value class: org.apache.hadoop.io.BytesWritable
> > is not class org.apache.pig.impl.io.NullableTuple
> >
> > There must be something I am doing wrong in telling Pig the types of the
> > sequence file. It must be a stupid problem but I don't see it.
> > Does anybody have a clue?
> >
> > Thanks,
> > --
> > Gianmarco
> >
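
For readers finding this thread later, here is a minimal sketch of the workaround described above: declare
SequenceFileOutputFormat<NullableText, NullableTuple> in getOutputFormat() and wrap the serialized bytes in a Tuple and then a
NullableTuple before writing. The class name BinarySequenceFileStorage, the empty-string key, and the exact
NullableText/NullableTuple/DataByteArray constructors are illustrative assumptions against the Pig 0.9-era API, not the poster's
actual code.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.io.NullableText;
import org.apache.pig.impl.io.NullableTuple;

// Hypothetical class name; illustrates only the key/value wrapping workaround.
public class BinarySequenceFileStorage extends StoreFunc {

    private RecordWriter<NullableText, NullableTuple> writer;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();
    private DataOutputBuffer dob;

    @SuppressWarnings("rawtypes")
    @Override
    public OutputFormat getOutputFormat() throws IOException {
        // Key/value type parameters match the classes Pig configures on the job.
        return new SequenceFileOutputFormat<NullableText, NullableTuple>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
        this.dob = new DataOutputBuffer();
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        // Serialize the tuple fields into a byte buffer, as in the code quoted above.
        dob.reset();
        WritableUtils.writeCompressedString(dob, (String) tuple.get(0));
        DataBag childTracesBag = (DataBag) tuple.get(1);
        WritableUtils.writeVLong(dob, childTracesBag.size());
        for (Tuple t : childTracesBag) {
            WritableUtils.writeVInt(dob, (Integer) t.get(0));
            dob.writeLong((Long) t.get(1));
        }
        // Wrap the bytes in a Tuple, then in a NullableTuple, so the value class
        // matches what the SequenceFile writer expects.
        Tuple wrapper = tupleFactory.newTuple(1);
        wrapper.set(0, new DataByteArray(dob.getData(), 0, dob.getLength()));
        try {
            // The key is unused here; an empty NullableText stands in for NullWritable.
            writer.write(new NullableText(""), new NullableTuple(wrapper));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}

The only functional changes from the code quoted in the thread are the type parameters of the output format and the wrapping of
the serialized bytes in putNext().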
