You could consider using the experimental JavaSerialization [1] support to skip converting your objects to Writables or another serialization format. It may be slower, but it sounds like you are mainly looking for a way to avoid transforming the objects yourself.
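For what it's worth, this route depends on the object implementing java.io.Serializable, which java.util.BitSet does. Below is a minimal, JDK-only sketch of that round trip. The class name BitSetRoundTrip and its helper methods are made up for illustration, and this is not Hadoop's own code; it only shows that a BitSet survives plain Java serialization and can still be OR-ed afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical helper class, for illustration only.
public class BitSetRoundTrip {

    // Serialize a BitSet with Java's built-in object serialization.
    static byte[] serialize(BitSet bits) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes its internal buffer
        // into bos before the bytes are read back out.
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(bits);
        }
        return bos.toByteArray();
    }

    // Rebuild a BitSet from bytes produced by serialize().
    static BitSet deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (BitSet) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BitSet a = new BitSet();
        a.set(3);
        a.set(1000);

        BitSet b = new BitSet();
        b.set(7);

        // Round-trip both sets, then bitwise-OR the copies.
        BitSet merged = deserialize(serialize(a));
        merged.or(deserialize(serialize(b)));

        System.out.println("round trip preserved a: " + a.equals(deserialize(serialize(a))));
        System.out.println("merged bits: " + merged);
    }
}
```

Note that the stream is closed before calling toByteArray(), so nothing is left sitting in the ObjectOutputStream's buffer.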
Enable JavaSerialization by adding the class org.apache.hadoop.io.serializer.JavaSerialization to the list of io.serializations in your client configuration, like so:

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>

You should then be able to rely on Java's built-in serialization to serialize your BitSet object directly.

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/serializer/JavaSerialization.html

On Sun, May 12, 2013 at 11:54 PM, Jim Twensky <[email protected]> wrote:
> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize the
> byte array on disk.
>
> Converting them to byte arrays is a bit inefficient but I could not find a
> work around to write them directly to the DataOutput. Is there a way to skip
> this and serialize the object directly?
> Here is what my current implementation looks like:
>
> public class BitSetWritable implements Writable {
>
>   private BitSet bs;
>
>   public BitSetWritable() {
>     this.bs = new BitSet();
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>
>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>     oos.writeObject(bs);
>     byte[] bytes = bos.toByteArray();
>     oos.close();
>     out.writeInt(bytes.length);
>     out.write(bytes);
>
>   }
>
>   @Override
>   public void readFields(DataInput in) throws IOException {
>
>     int len = in.readInt();
>     byte[] bytes = new byte[len];
>     in.readFully(bytes);
>
>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>     ObjectInputStream ois = new ObjectInputStream(bis);
>     try {
>       bs = (BitSet) ois.readObject();
>     } catch (ClassNotFoundException e) {
>       throw new IOException(e);
>     }
>
>     ois.close();
>   }
>
> }

--
Harsh J
