I have large java.util.BitSet objects that I want to combine with a
bitwise OR in a MapReduce job. To do this, I wrap each BitSet in a class
that implements Hadoop's Writable interface. Right now I convert each
BitSet to a byte array using Java serialization and write that byte array out.

Building the intermediate byte array is inefficient, but I could not find a
workaround that writes the BitSet directly to the DataOutput. Is there a way
to skip this step and serialize the object directly? Here is my current
implementation:
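import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

import org.apache.hadoop.io.Writable;

public class BitSetWritable implements Writable {

    private BitSet bs;

    public BitSetWritable() {
        this.bs = new BitSet();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the BitSet into an in-memory buffer first.
        ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size() / 8);
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(bs);
        // Close (and thereby flush) the stream before copying the buffer.
        oos.close();
        byte[] bytes = bos.toByteArray();
        // Length-prefix the payload so readFields knows how much to read.
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
        ObjectInputStream ois = new ObjectInputStream(bis);
        try {
            bs = (BitSet) ois.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        } finally {
            ois.close();
        }
    }
}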
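
To make "serialize the object directly" concrete, here is a minimal sketch of the kind of thing that might work, assuming Java 7 or later, where BitSet.toLongArray() and BitSet.valueOf(long[]) are available. The BitSetIO class and its method names are hypothetical, not part of Hadoop or the JDK. It writes the BitSet's 64-bit words straight to the DataOutput with a length prefix, so no ObjectOutputStream or byte buffer is involved. It still copies the bits into a long[] once, so it is not a zero-copy solution.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper for illustration only.
final class BitSetIO {

    private BitSetIO() {
    }

    // Writes the number of 64-bit words, followed by the words themselves.
    static void write(BitSet bs, DataOutput out) throws IOException {
        long[] words = bs.toLongArray(); // one copy of the set bits (Java 7+)
        out.writeInt(words.length);
        for (long word : words) {
            out.writeLong(word);
        }
    }

    // Reads back what write() produced and rebuilds the BitSet.
    static BitSet read(DataInput in) throws IOException {
        int count = in.readInt();
        long[] words = new long[count];
        for (int i = 0; i < count; i++) {
            words[i] = in.readLong();
        }
        return BitSet.valueOf(words);
    }
}
```

With a helper like this, write(DataOutput out) in BitSetWritable would reduce to BitSetIO.write(bs, out), and readFields(DataInput in) to bs = BitSetIO.read(in). I have not measured whether this is actually faster inside a job.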