Thanks for the suggestions. I ended up switching to JDK 1.7+ just to make the code more readable. I will take a look at the EWAH implementation as well.

Jim

On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux <[email protected]> wrote:

> You can disregard my links as they are only valid for Java 1.7+.
> The JavaSerialization might clean your code but shouldn't bring a
> significant boost in performance.
> The EWAH implementation has, at least, the methods you are looking for:
> serialize / deserialize.
>
> Regards
>
> Bertrand
>
> Note to myself: I have to remember this one.
>
>
> On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <[email protected]> wrote:
>
>> Another interesting alternative is the EWAH implementation of java
>> bitsets that allow efficient compressed bitsets with very fast OR
>> operations.
>>
>> https://github.com/lemire/javaewah
>>
>> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>>
>>
>> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <[email protected]> wrote:
>>
>>> In order to make the code more readable, you could start by using the
>>> methods toByteArray() and valueOf(bytes)
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <[email protected]> wrote:
>>>
>>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>>> MapReduce job. I decided to wrap around each object using the Writable
>>>> interface. Right now I convert each BitSet to a byte array and serialize
>>>> the byte array on disk.
>>>>
>>>> Converting them to byte arrays is a bit inefficient but I could not
>>>> find a workaround to write them directly to the DataOutput. Is there a way
>>>> to skip this and serialize the object directly? Here is what my current
>>>> implementation looks like:
>>>>
>>>> public class BitSetWritable implements Writable {
>>>>
>>>>     private BitSet bs;
>>>>
>>>>     public BitSetWritable() {
>>>>         this.bs = new BitSet();
>>>>     }
>>>>
>>>>     @Override
>>>>     public void write(DataOutput out) throws IOException {
>>>>
>>>>         ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>>>         ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>>         oos.writeObject(bs);
>>>>         byte[] bytes = bos.toByteArray();
>>>>         oos.close();
>>>>         out.writeInt(bytes.length);
>>>>         out.write(bytes);
>>>>
>>>>     }
>>>>
>>>>     @Override
>>>>     public void readFields(DataInput in) throws IOException {
>>>>
>>>>         int len = in.readInt();
>>>>         byte[] bytes = new byte[len];
>>>>         in.readFully(bytes);
>>>>
>>>>         ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>>         ObjectInputStream ois = new ObjectInputStream(bis);
>>>>         try {
>>>>             bs = (BitSet) ois.readObject();
>>>>         } catch (ClassNotFoundException e) {
>>>>             throw new IOException(e);
>>>>         }
>>>>
>>>>         ois.close();
>>>>     }
>>>>
>>>> }
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>
>
> --
> Bertrand Dechoux
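
For anyone reading this thread later, here is a rough sketch of what the Java 7 suggestion above (toByteArray() and valueOf(bytes)) might look like. It is not the code that was actually used in the job. The class name BitSetIO and its two static methods are hypothetical, chosen only for illustration, and the sketch depends on nothing beyond java.io and java.util, so it leaves out the Hadoop Writable interface itself.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper (name and methods are illustrative, not from the thread).
// It replaces the ObjectOutputStream round trip with the Java 7 BitSet methods.
final class BitSetIO {

    private BitSetIO() {
    }

    // Writes a 4-byte length prefix followed by the bit set's bytes.
    static void write(BitSet bs, DataOutput out) throws IOException {
        byte[] bytes = bs.toByteArray(); // requires Java 7 or later
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads back a bit set that was written by write(...) above.
    static BitSet read(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return BitSet.valueOf(bytes); // requires Java 7 or later
    }
}
```

In a Writable along the lines of the one quoted above, write(DataOutput) would presumably delegate to BitSetIO.write(bs, out) and readFields(DataInput) would assign bs = BitSetIO.read(in). This still goes through a byte array, but it avoids the Java serialization stream header and class metadata.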
